DataStage tricky questions

1) Aggregator - Is sorting needed?
Only when the method is set to Sort, in which case the input must be sorted on the grouping keys. With the Hash method no sort is needed.
Hash builds an in-memory hash table of the groups, but it doesn't produce
any output until the aggregations for all groups are complete.
So it should only be used when the number of groups is small enough to fit in memory.
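
The difference can be illustrated with a minimal Python sketch. This is a conceptual analogy, not DataStage's implementation, and the names `sort_aggregate` and `hash_aggregate` are hypothetical. The sort-style version emits each group as soon as the key changes, so it needs sorted input but holds only one group at a time; the hash-style version accepts input in any order but can only emit results after every row has been read.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, Tuple

Row = Tuple[str, int]  # hypothetical (group_key, amount) rows


def sort_aggregate(rows: Iterable[Row]) -> Iterator[Tuple[str, int]]:
    """Sort-method analogue: input must already be sorted on the key.

    groupby only merges *adjacent* rows with the same key, so each group is
    emitted as soon as the key changes and only one group is held at a time.
    """
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(amount for _, amount in group)


def hash_aggregate(rows: Iterable[Row]) -> Iterator[Tuple[str, int]]:
    """Hash-method analogue: input can arrive in any order.

    Every group stays in an in-memory dict until the input is exhausted,
    so nothing is emitted until all rows have been read.
    """
    totals: Dict[str, int] = {}
    for key, amount in rows:
        totals[key] = totals.get(key, 0) + amount
    yield from totals.items()


if __name__ == "__main__":
    unsorted_rows = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]
    print(list(hash_aggregate(unsorted_rows)))          # [('b', 5), ('a', 5)]
    print(list(sort_aggregate(sorted(unsorted_rows))))  # [('a', 5), ('b', 5)]
    # Without sorting, the sort-style version splits groups incorrectly:
    print(list(sort_aggregate(unsorted_rows)))          # [('b', 2), ('a', 1), ('b', 3), ('a', 4)]
```

The memory cost of the hash version grows with the number of distinct keys, which is why it suits a small number of groups.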

2) Lookup - Entire - When to use it? When is Lookup faster than Join/Merge?

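- Lookup holds the reference data in memory, so it is faster than Join/Merge when the reference data is small enough to fit in memory. For large reference datasets, Join or Merge, which work on sorted, key-partitioned inputs, is the better choice.
- Lookup with Entire partitioning on the reference link is faster on SMP systems because SMPs have shared memory.
Entire partitioning makes the whole reference dataset available to every partition, but since the memory is shared across CPUs, the dataset is not copied to each
partition; it is held once in shared memory and all partitions read it.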

On MPP systems this advantage is lost, because every server has its own
memory, so the entire reference dataset has to be copied to each node.
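
A rough Python analogy, under stated assumptions: the small reference table is loaded once into a dict (an in-memory hash table), the stream rows are spread round-robin across hypothetical partitions, and on the "SMP" side every partition probes the very same dict object, mirroring the single shared-memory copy. On the "MPP" side each node first gets its own private copy. The names `partition_rows` and `lookup_partition` are illustrative only and are not DataStage APIs.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, str]  # hypothetical (customer_id, order_id) stream row
Ref = Dict[int, str]   # hypothetical customer_id -> customer_name reference


def partition_rows(rows: List[Row], n: int) -> List[List[Row]]:
    """Spread the stream rows round-robin across n partitions."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def lookup_partition(part: List[Row], ref: Ref) -> List[Tuple[int, str, Optional[str]]]:
    """Probe the in-memory reference table for each row; no match gives None."""
    return [(cid, oid, ref.get(cid)) for cid, oid in part]


if __name__ == "__main__":
    # Small reference data, loaded into memory exactly once.
    ref_table: Ref = {1: "Alice", 2: "Bob"}
    stream = [(1, "o1"), (2, "o2"), (3, "o3"), (1, "o4")]
    partitions = partition_rows(stream, 2)

    # SMP analogue: every partition probes the *same* dict object (one copy).
    smp_results = [lookup_partition(p, ref_table) for p in partitions]

    # MPP analogue: each node needs its own private copy of the reference data.
    node_copies = [dict(ref_table) for _ in partitions]
    mpp_results = [lookup_partition(p, c) for p, c in zip(partitions, node_copies)]

    print(smp_results == mpp_results)  # True: same answer, but N copies of the data
```

Both sides return the same rows; the difference is only how many copies of the reference data have to exist.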

3) Xmer (Transformer) or Filter for filtering?
Xmer is better when filtering on constant values, because the Transformer's logic is compiled into C++
when the job is compiled. Filter is best used when the filter values are dynamic, since its where clause is
interpreted at run time and evaluated row by row.
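
A Python analogy for the difference, not a description of how DataStage actually generates or evaluates code: the fixed condition is written as ordinary code, standing in for logic compiled ahead of time, while the dynamic condition arrives as a string (for example, a value passed in as a job parameter) and is parsed and evaluated for each row by a tiny interpreter. The `column operator value` condition syntax and the function names are assumptions made for this sketch.

```python
import operator
from typing import Callable, Dict, List

Row = Dict[str, int]

# Comparison operators understood by the toy interpreter (assumed syntax).
OPS: Dict[str, Callable[[int, int], bool]] = {
    "=": operator.eq,
    "<>": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


def compiled_filter(rows: List[Row]) -> List[Row]:
    """Xmer analogue: the constant condition is fixed in the code itself."""
    return [r for r in rows if r["amount"] > 100]


def interpreted_filter(rows: List[Row], condition: str) -> List[Row]:
    """Filter analogue: the condition text is parsed and evaluated per row.

    `condition` is assumed to look like 'amount > 100' (column, operator,
    integer literal, separated by spaces).
    """
    kept = []
    for r in rows:
        column, op, literal = condition.split()  # re-parsed on every row
        if OPS[op](r[column], int(literal)):
            kept.append(r)
    return kept


if __name__ == "__main__":
    rows = [{"amount": 50}, {"amount": 150}, {"amount": 300}]
    print(compiled_filter(rows))                      # [{'amount': 150}, {'amount': 300}]
    print(interpreted_filter(rows, "amount > 100"))   # [{'amount': 150}, {'amount': 300}]
    print(interpreted_filter(rows, "amount <= 150"))  # [{'amount': 50}, {'amount': 150}]
```

The interpreted version pays a parsing cost on every row, but its condition can change without rebuilding anything; the compiled version is the reverse.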

4) How do you identify whether a system is SMP or MPP?
By the configuration file. Since an SMP system has shared memory and disks and all its nodes run on one server,
the fastname will be the same for all the nodes, whereas an MPP system will have
different fastnames for different nodes.
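
A minimal Python sketch of that check, assuming the usual parallel configuration file layout (the file named by `APT_CONFIG_FILE`) in which each `node` block contains a `fastname` entry. The sample config, host names and resource paths are made up for illustration, and counting distinct fastnames is only the heuristic described above.

```python
import re
from typing import List

# Hypothetical two-node configuration file (host names and paths are made up).
SAMPLE_CONFIG = """
{
  node "node1"
  {
    fastname "etlhost1"
    pools ""
    resource disk "/data/node1" {pools ""}
    resource scratchdisk "/scratch/node1" {pools ""}
  }
  node "node2"
  {
    fastname "etlhost1"
    pools ""
    resource disk "/data/node2" {pools ""}
    resource scratchdisk "/scratch/node2" {pools ""}
  }
}
"""


def fastnames(config_text: str) -> List[str]:
    """Return the fastname of every node block, in file order."""
    return re.findall(r'fastname\s+"([^"]+)"', config_text)


def classify(config_text: str) -> str:
    """One distinct fastname across all nodes -> SMP; more than one -> MPP."""
    names = set(fastnames(config_text))
    if not names:
        raise ValueError("no fastname entries found in config text")
    return "SMP" if len(names) == 1 else "MPP"


if __name__ == "__main__":
    print(fastnames(SAMPLE_CONFIG))  # ['etlhost1', 'etlhost1']
    print(classify(SAMPLE_CONFIG))   # SMP
    print(classify('node "a" { fastname "h1" } node "b" { fastname "h2" }'))  # MPP
```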
