DataStage Interview Questions

1) Describe the DataStage 8.1 runtime architecture.
2) Define Conductor Node, Section Leader and Players.
3) How do Section Leaders and Players communicate?
4) What kind of memory does a Lookup use, and what kind does a Join use?



Answers coming soon !!!


Answers are provided in the comments below.

8 Comments

  1. The DS 8.1 run-time architecture consists of generating the OSH script for a DS job, which is then used together with the configuration file to compose the job score.

    The job score is used to fork processes with communication interconnects for
    data, message and control. Processing begins after the job score and
    processes are created. Job processing ends when either the last row of data is
    processed by the final operator, a fatal error is encountered by any operator, or
    the job is halted by DataStage Job Control or human intervention such as
    DataStage Director STOP.

    This run-time architecture is explained very well in Section 1.3.2 of the IBM Redbook below:

    http://www.redbooks.ibm.com/abstracts/sg247576.html

  2. Conductor is the initial framework process. It creates the Section Leader (SL)
    processes (one per node), consolidates messages to the DataStage log, and
    manages orderly shutdown. The start-up process runs on the Conductor node.
    The Conductor communicates with the players through the Section Leaders.

  3. A Section Leader is a process that forks player processes (one per stage) and
    manages up/down communications. SLs communicate with the
    conductor and player processes only. For a given parallel configuration file,
    one section leader is started for each logical node.

  4. Players are the actual processes associated with the stages. They send stderr
    and stdout to the SL, establish connections to other players for data flow,
    and clean up on completion. Each player has to be able to communicate
    with every other player. There are separate communication channels
    (pathways) for control, errors, messages and data. The data channel does
    not go through the section leader/conductor as this would limit scalability.
    Data flows directly from upstream operator to downstream operator.

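As a rough illustration of the process hierarchy described in the comments above (one conductor, one section leader per logical node, one player per stage on each node), the sketch below counts the processes a job would start. It is a simplification under stated assumptions: the function name `count_processes` is hypothetical, every stage is assumed to run on every node, and any optimization that merges several stages into one process is ignored.

```python
def count_processes(num_nodes: int, num_stages: int) -> dict:
    """Estimate process counts for a parallel job.

    Assumptions (not taken from DataStage itself): each stage runs on
    every logical node and no stages are combined into a single process.
    """
    if num_nodes < 1 or num_stages < 0:
        raise ValueError("need at least one node and a non-negative stage count")

    conductor = 1                      # one initial framework process per job
    section_leaders = num_nodes        # one per logical node in the config file
    players = num_nodes * num_stages   # each SL forks one player per stage

    return {
        "conductor": conductor,
        "section_leaders": section_leaders,
        "players": players,
        "total": conductor + section_leaders + players,
    }


if __name__ == "__main__":
    # A 3-node configuration file and a job with 4 stages.
    print(count_processes(3, 4))
```

Under these assumptions, a 3-node configuration starts three section leaders in addition to the single conductor, since a section leader is started for each logical node.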
  5. Using data sorted on the join keys is particularly fast and is also light on memory usage since only the data for a key grouping must be kept in memory during processing.

    Contrast that with the method needed for a lookup, where the complete lookup reference data is held in memory. I won't go into sparse lookups, which are inherently slow because they require a lookup against the source data for each row.

    When the reference data is small and/or the source data volume is large and (relatively) unsorted, a lookup stage can be significantly faster than a join. The actual point at which a join becomes more efficient than a lookup is not easy to define, as it depends on a number of factors, including row byte size, number of key columns, relative size of the lookup data, available memory, and system load type (I/O, disk, CPU, memory), to name just a few.

    Sorting data where the key is already in some non-random order (partially sorted) is much faster than sorting randomly spread keys. A sort like this will be a blocking stage in processing, causing the whole job pipeline to come to a halt while the sort is performed. The lookup stage doesn't perform blocking of the main data stream.

    This information is taken from the following DSXchange post:

    http://www.dsxchange.com/viewtopic.php?t=134495&sid=b383302f7ab05afcc10bfa3fde71199d

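The memory difference described in the comment above can be sketched in plain Python. This is only an analogy, not DataStage code: `lookup_join` and `sorted_merge_join` are hypothetical names, rows are assumed to be `(key, value)` tuples, and both functions perform an inner join. The lookup version loads the whole reference set into a dictionary before streaming, while the merge version assumes both inputs are already sorted on the key and buffers only the rows of the current key group.

```python
from itertools import groupby
from operator import itemgetter


def lookup_join(stream, reference):
    """Lookup-style: the complete reference data is held in memory."""
    table = {}
    for key, value in reference:
        table.setdefault(key, []).append(value)
    # The main stream is not blocked or sorted; each row is probed in turn.
    for key, value in stream:
        for ref_value in table.get(key, []):
            yield key, value, ref_value


def sorted_merge_join(left, right):
    """Join-style: inputs must be sorted on the key; one key group is buffered."""
    left_groups = groupby(left, key=itemgetter(0))
    right_groups = groupby(right, key=itemgetter(0))
    left_item = next(left_groups, None)
    right_item = next(right_groups, None)
    while left_item is not None and right_item is not None:
        lkey, lrows = left_item
        rkey, rrows = right_item
        if lkey < rkey:
            left_item = next(left_groups, None)
        elif lkey > rkey:
            right_item = next(right_groups, None)
        else:
            # Only the right-hand rows for this one key are kept in memory.
            buffered = [value for _, value in rrows]
            for _, lvalue in lrows:
                for rvalue in buffered:
                    yield lkey, lvalue, rvalue
            left_item = next(left_groups, None)
            right_item = next(right_groups, None)


if __name__ == "__main__":
    left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
    right = [(2, "x"), (3, "y"), (4, "z")]
    print(list(sorted_merge_join(left, right)))
    print(list(lookup_join(left, right)))
```

Both functions return the same matches for sorted input; the difference is that the dictionary in `lookup_join` grows with the reference data, while `buffered` in `sorted_merge_join` grows only with the largest single key group.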
  6. Thanks for providing the information on  DataStage Online training. Online training have the benefits of being convenient, flexible and on your own ti

  7. Good, but I have one query about the conductor: if I have 3 nodes, is 1 used for the conductor and the other 2 for SLs, or does the conductor node also have an SL?

    Regards
    vaibhav B

  8. Thanks for sharing the good updates on DataStage over here. Keep updating more updates on interview questions here.
