DataStage Job Execution | Runtime Architecture

 Runtime Architecture

 

The generated OSH and the configuration file are used to “compose” a job SCORE, similar to the way an RDBMS builds a query optimization plan. The job SCORE:

1.       Identifies degree of parallelism and node assignment for each operator

2.       Inserts sorts and partitioners as needed to ensure correct results

3.       Defines connection topology (datasets) between adjacent operators

4.       Inserts buffer operators to prevent deadlocks (e.g. in fork-joins)

5.       Defines the number of actual UNIX processes. Where possible, multiple operators are combined within a single UNIX process to improve performance and optimize resource requirements.

6.       Is used to fork UNIX processes with communication interconnects for data, message, and control. Set the $APT_PM_SHOW_PIDS environment variable to show the UNIX process IDs in the DataStage log.

Processing begins only after these steps are complete. This is the “startup overhead” of an Enterprise Edition job.
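To make step 5 concrete, here is a minimal Python sketch of how combining operators reduces the process count. It is a toy model, not the engine's actual algorithm: the `Operator` class, the `combinable` flag, the grouping rule, and the example flow are all hypothetical, and it counts only the processes that run operators.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Operator:
    """Hypothetical stand-in for one operator in the job flow."""
    name: str
    parallel: bool    # True: one instance per node; False: a single instance
    combinable: bool  # True: may share a process with an adjacent combinable operator


def group_operators(flow: List[Operator]) -> List[List[Operator]]:
    """Merge adjacent combinable operators that share an execution mode."""
    groups: List[List[Operator]] = []
    for op in flow:
        last = groups[-1] if groups else None
        if (last is not None and op.combinable
                and last[-1].combinable
                and last[-1].parallel == op.parallel):
            last.append(op)      # joins the previous group's process
        else:
            groups.append([op])  # starts a new process group
    return groups


def count_processes(flow: List[Operator], node_count: int) -> int:
    """Each group needs one process per node if parallel, otherwise one."""
    return sum(node_count if group[0].parallel else 1
               for group in group_operators(flow))


# Illustrative flow (assumed, not taken from a real job).
flow = [
    Operator("read_file", parallel=False, combinable=False),
    Operator("transform", parallel=True, combinable=True),
    Operator("filter", parallel=True, combinable=True),
    Operator("sort", parallel=True, combinable=False),
    Operator("write_db", parallel=True, combinable=False),
]

if __name__ == "__main__":
    print(count_processes(flow, node_count=4))
```

With a 4-node configuration, this toy flow needs 13 operator processes instead of the 17 it would need if `transform` and `filter` each ran in their own process on every node.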

 

Job processing ends when one of the following occurs:

1.       The last row (end of data) is processed by the final operator in the flow

2.       A fatal error is encountered by any operator

3.       The job is halted (SIGINT) by DataStage Job Control or human intervention (e.g. DataStage Director STOP)
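The three ways a job can end can be sketched as a single processing loop. This is a simplified, single-process illustration only: the real engine coordinates many UNIX processes and delivers the halt as a signal, whereas here `halt_requested` is a hypothetical callback standing in for that signal, and `EndReason` and `run_final_operator` are names invented for this sketch.

```python
from enum import Enum
from typing import Any, Callable, Iterable


class EndReason(Enum):
    END_OF_DATA = "last row processed by the final operator"
    FATAL_ERROR = "fatal error raised by an operator"
    HALTED = "job halted by job control or an operator STOP"


def run_final_operator(rows: Iterable[Any],
                       process_row: Callable[[Any], None],
                       halt_requested: Callable[[], bool]) -> EndReason:
    """Process rows until one of the three end conditions is met."""
    for row in rows:
        if halt_requested():          # stands in for an external SIGINT
            return EndReason.HALTED
        try:
            process_row(row)
        except Exception:             # any operator failure is treated as fatal here
            return EndReason.FATAL_ERROR
    return EndReason.END_OF_DATA      # input exhausted: normal completion


def fail_on_negative(row: int) -> None:
    """Hypothetical operator logic that rejects negative values."""
    if row < 0:
        raise ValueError(f"bad row: {row}")


if __name__ == "__main__":
    print(run_final_operator([1, 2, 3], fail_on_negative, lambda: False))
    print(run_final_operator([1, -2, 3], fail_on_negative, lambda: False))
    print(run_final_operator([1, 2, 3], fail_on_negative, lambda: True))
```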
