DataStage | Parallel Job Score

At runtime, the job score can be examined to identify:

1. The number of UNIX processes generated for a given job and $APT_CONFIG_FILE
2. Operator combination
3. Partitioning methods between operators
4. Framework-inserted components, including sorts, partitioners, and buffer operators

Set $APT_DUMP_SCORE=1 to write the score to the DataStage job log.

For each job run, two separate score dumps are written to the log:

1. The first score is actually from the license operator.
2. The second score entry is the actual job score.

Job scores are divided into two sections:

1. Data sets: partitioning and collecting
2. Operators: node/operator mapping

Example score dump

The following score dump shows a flow with a single data set, which has a hash partitioner that partitions on key "a". It shows three operators: generator, tsort, and peek. Tsort and peek are "combined", indicating that they have been optimized into the same process. All the operators in this flow run on one node.

The DataStage Parallel Framework implements a producer-consumer data flow model.

Upstream stages (operators or persistent data sets) produce rows that are consumed by downstream stages (operators or data sets).

The partitioning method is associated with the producer, and the collector method is associated with the consumer. "eCollectAny" is specified for parallel consumers, although no collection actually occurs.
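To make the producer-side role of partitioning concrete, here is a minimal Python sketch (not DataStage code) of a producer whose hash partitioner routes each row to one of N consumer partitions by key "a", the key used in the example flow. The names produce_rows, hash_partition, and consume are hypothetical, and Python's built-in hash() stands in for whatever hash function the framework actually uses.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

Row = Dict[str, int]


def produce_rows(n: int) -> Iterable[Row]:
    """Hypothetical producer: yields rows with an integer key column "a"."""
    for i in range(n):
        yield {"a": i % 5, "value": i}


def hash_partition(rows: Iterable[Row], key: str, num_partitions: int) -> Dict[int, List[Row]]:
    """Producer-side partitioner: each row goes to exactly one partition,
    chosen from its key value, so rows with equal keys always land together."""
    partitions: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        # For small non-negative ints, hash(n) == n, so this is deterministic.
        partitions[hash(row[key]) % num_partitions].append(row)
    return dict(partitions)


def consume(partition: List[Row]) -> int:
    """Hypothetical consumer: processes only the rows in its own partition."""
    return sum(row["value"] for row in partition)


if __name__ == "__main__":
    parts = hash_partition(produce_rows(20), key="a", num_partitions=2)
    for p in sorted(parts):
        # Prints the partition number, the key values it holds, and its total.
        print(p, sorted({r["a"] for r in parts[p]}), consume(parts[p]))
```

The consumers never coordinate with each other: because the producer decided where each key goes, every consumer can work on its partition independently.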

The producer and consumer are separated by one of the following indicators:

->  Sequential to Sequential
<>  Sequential to Parallel
=>  Parallel to Parallel (SAME)
#>  Parallel to Parallel (not SAME)
>>  Parallel to Sequential
>   No producer or no consumer

The score may also include [pp] notation when the Preserve Partitioning flag is set.
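The indicator list can be read as the answer to two questions: does each side run sequentially or in parallel, and, for parallel-to-parallel links, is the partitioning SAME? The Python sketch below encodes that reading of the list. The function names are hypothetical, it models the list rather than the framework's own logic, and the [pp] notation is not modeled.

```python
from typing import Optional

# Link indicators between producer and consumer, taken from the list above.
INDICATORS = {
    "->": "Sequential to Sequential",
    "<>": "Sequential to Parallel",
    "=>": "Parallel to Parallel (SAME)",
    "#>": "Parallel to Parallel (not SAME)",
    ">>": "Parallel to Sequential",
    ">": "No producer or no consumer",
}


def link_indicator(producer_parallel: Optional[bool],
                   consumer_parallel: Optional[bool],
                   same_partitioning: bool = False) -> str:
    """Return the indicator for a producer/consumer pair.

    None for either side means that side does not exist.
    """
    if producer_parallel is None or consumer_parallel is None:
        return ">"
    if not producer_parallel and not consumer_parallel:
        return "->"
    if not producer_parallel:
        return "<>"
    if not consumer_parallel:
        return ">>"
    return "=>" if same_partitioning else "#>"


def describe(indicator: str) -> str:
    """Look up the meaning of an indicator."""
    return INDICATORS[indicator]


if __name__ == "__main__":
    print(describe(link_indicator(True, True)))  # Parallel to Parallel (not SAME)
```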

At runtime, the DataStage Parallel Framework can only combine stages (operators) that:

1. Use the same partitioning method
   Repartitioning prevents operator combination between the corresponding producer and consumer stages.
   Implicit repartitioning (e.g. sequential operators, node maps) also prevents combination.
2. Are combinable
   This is set automatically within the stage/operator definition.
   It can also be set in DataStage Designer, in the stage's Advanced properties.
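The two rules can be checked against the example flow described earlier (generator, a hash partitioner on "a", then tsort and peek). The Python sketch below is a simplified model, not the framework's actual combination algorithm. It assumes a purely linear flow, and the Operator and Link classes, the combinable and repartitions flags, and the combine function are all hypothetical. Applied to the example, it reproduces the outcome the score reports: the hash repartitioning keeps generator separate, while tsort and peek are combined.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Operator:
    name: str
    # Hypothetical flag standing in for the combinability setting that comes
    # from the operator definition or the Designer's Advanced properties.
    combinable: bool = True


@dataclass
class Link:
    producer: Operator
    consumer: Operator
    # True when the link repartitions the data, explicitly (e.g. a hash
    # partitioner) or implicitly (e.g. a sequential operator or node map).
    repartitions: bool


def can_combine(link: Link) -> bool:
    """Simplified model of the two rules: no repartitioning, both sides combinable."""
    return (not link.repartitions) and link.producer.combinable and link.consumer.combinable


def combine(operators: List[Operator], links: List[Link]) -> List[List[str]]:
    """Group a linear flow into combined process groups.

    Assumes links[i] connects operators[i] to operators[i + 1].
    """
    groups: List[List[str]] = [[operators[0].name]]
    for link in links:
        if can_combine(link):
            groups[-1].append(link.consumer.name)
        else:
            groups.append([link.consumer.name])
    return groups


if __name__ == "__main__":
    # The example flow: generator -> (hash on "a") -> tsort -> peek
    gen, tsort, peek = Operator("generator"), Operator("tsort"), Operator("peek")
    links = [Link(gen, tsort, repartitions=True), Link(tsort, peek, repartitions=False)]
    print(combine([gen, tsort, peek], links))  # [['generator'], ['tsort', 'peek']]
```

Marking peek as not combinable in this model splits tsort and peek into separate groups, mirroring the second rule.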


The Lookup stage is a composite operator. Internally it contains more than one component, but to the user it appears as a single stage:

1. LUTCreateImpl: reads the reference data into memory
2. LUTProcessImpl: performs the actual lookup processing once the reference data has been loaded

At runtime, each internal component is assigned to operators independently.
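The split between the two components follows a familiar build-then-probe pattern. The Python sketch below is only an analogy, not the Lookup stage's implementation: lut_create and lut_process are hypothetical names mirroring LUTCreateImpl and LUTProcessImpl, the in-memory table is a plain dict, and unmatched rows are simply dropped, whereas the real stage lets you choose what happens on a lookup failure.

```python
from typing import Dict, Iterable, Iterator

Row = Dict[str, object]


def lut_create(reference: Iterable[Row], key: str) -> Dict[object, Row]:
    """Phase 1 (analogous to LUTCreateImpl): load reference rows into memory."""
    table: Dict[object, Row] = {}
    for row in reference:
        # In this sketch a later duplicate key overwrites an earlier one.
        table[row[key]] = row
    return table


def lut_process(stream: Iterable[Row], table: Dict[object, Row], key: str) -> Iterator[Row]:
    """Phase 2 (analogous to LUTProcessImpl): probe the loaded table per input row."""
    for row in stream:
        match = table.get(row[key])
        if match is not None:
            # Merge the input row with the matching reference columns.
            yield {**row, **match}


if __name__ == "__main__":
    table = lut_create([{"id": 1, "name": "x"}, {"id": 2, "name": "y"}], key="id")
    print(list(lut_process([{"id": 1, "qty": 5}, {"id": 3, "qty": 7}], table, key="id")))
```

Because phase 2 cannot start until phase 1 has finished loading, it makes sense that the two pieces show up as separate entries in the score rather than as one monolithic operator.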
