Runtime Architecture
The generated OSH and the configuration file are used to “compose” a job SCORE,
similar to the way an RDBMS builds a query optimization plan. This composition
step:
1. Identifies the degree of parallelism and node assignment for each operator.
2. Inserts sorts and partitioners as needed to ensure correct results.
3. Defines the connection topology (datasets) between adjacent operators.
4. Inserts buffer operators to prevent deadlocks (e.g. fork-joins).
5. Defines the number of actual UNIX processes. Where possible, multiple
operators are combined within a single UNIX process to improve performance and
optimize resource requirements.
6. The job SCORE is then used to fork UNIX processes with communication
interconnects for data, message, and control. Set $APT_PM_SHOW_PIDS to show
UNIX process IDs in the DataStage log.
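The effect of combining operators on the process count can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not the engine's actual algorithm: the Operator class, its parallel and combinable flags, the compose_score function, and the node names are all hypothetical, the flow is assumed to be linear, and only operator processes are counted.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    name: str
    parallel: bool = True    # False: the operator runs on a single node
    combinable: bool = True  # hypothetical flag: may share a process with its neighbour


def compose_score(operators, nodes):
    """Build a toy 'score' for a linear flow: a list of process groups.

    Adjacent operators are merged into one group when both are combinable
    and they run on the same set of nodes.
    """
    groups = []
    for op in operators:
        # Degree of parallelism: every node for parallel operators, one otherwise.
        op_nodes = list(nodes) if op.parallel else list(nodes[:1])
        last = groups[-1] if groups else None
        if (last is not None and op.combinable and last["combinable"]
                and last["nodes"] == op_nodes):
            last["operators"].append(op.name)
        else:
            groups.append({"operators": [op.name],
                           "nodes": op_nodes,
                           "combinable": op.combinable})
    return groups


def process_count(score):
    """One process per group per node in this simplified model."""
    return sum(len(group["nodes"]) for group in score)


flow = [
    Operator("import", parallel=False),
    Operator("transform"),
    Operator("filter"),
    Operator("sort", combinable=False),
    Operator("export", parallel=False),
]
score = compose_score(flow, ["node1", "node2"])
for group in score:
    print(group["operators"], "on", group["nodes"])
print("processes:", process_count(score))
```

In this model the transform and filter operators share a group, so the two-node flow needs 6 operator processes instead of the 8 it would need with one process per operator per node.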
Processing begins only after these steps are complete. This is the “startup
overhead” of an Enterprise Edition job.
Job processing ends when the last row (end of data) is processed by the final
operator in the flow, when a fatal error is encountered by any operator, or
when the job is halted (SIGINT) by DataStage Job Control or human intervention
(e.g. DataStage Director STOP).
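The three ways a job can end map onto a simple control structure. The Python sketch below is only an analogy, not DataStage code: run_flow, FatalOperatorError, and the sample operators are hypothetical names, and the halt case relies on Python's default behaviour of turning SIGINT into KeyboardInterrupt in the main thread.

```python
class FatalOperatorError(Exception):
    """Hypothetical error type standing in for a fatal operator failure."""


def run_flow(rows, operators):
    """Push each row through every operator; return (end_reason, output)."""
    output = []
    try:
        for row in rows:
            for op in operators:
                row = op(row)
            output.append(row)
    except FatalOperatorError:
        return "fatal error", output
    except KeyboardInterrupt:
        # Python raises KeyboardInterrupt when the process receives SIGINT.
        return "halted", output
    return "end of data", output


def double(row):
    return row * 2


def reject_negative(row):
    if row < 0:
        raise FatalOperatorError(f"negative value: {row}")
    return row


print(run_flow([1, 2, 3], [double, reject_negative]))
print(run_flow([1, -2, 3], [double, reject_negative]))
```

The first call runs to the end of the data; the second stops at the first row that triggers the fatal error and returns the rows that were already processed.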