DataStage - Important Environment Variables

Here is a list of some important DataStage environment variables:

APT_BUFFER_FREE_RUN

This environment variable is available in the DataStage Administrator, under the Parallel category. It specifies how much of the available in-memory buffer to consume before the buffer resists accepting new data. The value is a decimal fraction of the Maximum memory buffer size (for example, 0.5 is 50%). While the amount of data in the buffer is below this value, new data is accepted automatically. Once the data exceeds it, the buffer first tries to write out some of the data it holds before accepting more. The default value is 0.5 (50% of the Maximum memory buffer size). You can set it above 1.0 (greater than 100%), in which case the buffer continues to store data up to the indicated multiple of the Maximum memory buffer size before writing to disk.
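
To make the arithmetic concrete, here is a small Python sketch that converts an APT_BUFFER_FREE_RUN fraction into a byte threshold. The function name is hypothetical, and the default buffer size is the 3145728-byte default of APT_BUFFER_MAXIMUM_MEMORY. It only reproduces the calculation and says nothing about how DataStage implements buffering internally.

```python
DEFAULT_MAX_MEMORY = 3145728  # bytes; default of APT_BUFFER_MAXIMUM_MEMORY (3 MB)


def free_run_threshold_bytes(free_run, max_memory=DEFAULT_MAX_MEMORY):
    """Return the fill level, in bytes, at which the buffer starts to resist.

    free_run is the APT_BUFFER_FREE_RUN value as a decimal fraction (0.5 = 50%).
    Hypothetical helper: it illustrates the arithmetic only.
    """
    if free_run < 0:
        raise ValueError("free_run must not be negative")
    return int(free_run * max_memory)


print(free_run_threshold_bytes(0.5))  # default setting: half of the 3 MB buffer
print(free_run_threshold_bytes(2.0))  # above 1.0: a multiple of the buffer size
```

With the defaults, the buffer accepts data freely up to 1572864 bytes; a setting of 2.0 raises that to twice the buffer size.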

 

APT_BUFFER_MAXIMUM_MEMORY

Sets the default value of Maximum memory buffer size, that is, the maximum amount of virtual memory, in bytes, used per buffer. The default value is 3145728 (3 MB).

 

APT_BUFFER_MAXIMUM_TIMEOUT

DataStage buffering is self-tuning, which can theoretically lead to long delays between retries. This environment variable specifies the maximum wait, in seconds, before a retry. The default is 1.

 

APT_BUFFERING_POLICY

This environment variable is available in the DataStage Administrator, under the Parallel category. Controls the buffering policy for all virtual data sets in all steps. The variable has the following settings:

- AUTOMATIC_BUFFERING (default). Buffer a data set only if necessary to prevent a data flow deadlock.

- FORCE_BUFFERING. Unconditionally buffer all virtual data sets. Note that this can slow down processing considerably.

- NO_BUFFERING. Do not buffer data sets. This setting can cause data flow deadlock if used inappropriately.

 

APT_DECIMAL_INTERM_PRECISION

Specifies the default maximum precision value for any decimal intermediate variables required in calculations. Default value is 38.

 

APT_DECIMAL_INTERM_SCALE

Specifies the default scale value for any decimal intermediate variables required in calculations. Default value is 10.
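
As a rough illustration of what precision and scale mean for an intermediate result, the sketch below uses Python's standard decimal module with the two default values. This is an analogy only: the helper name is hypothetical, and DataStage's own decimal arithmetic and rounding rules may differ.

```python
from decimal import Context, Decimal

PRECISION = 38  # APT_DECIMAL_INTERM_PRECISION default
SCALE = 10      # APT_DECIMAL_INTERM_SCALE default


def intermediate_divide(a, b, precision=PRECISION, scale=SCALE):
    """Divide a by b, keeping at most `precision` digits in total
    and exactly `scale` digits after the decimal point."""
    ctx = Context(prec=precision)
    raw = ctx.divide(Decimal(a), Decimal(b))
    # Decimal(1).scaleb(-scale) is 1E-10 for scale=10; quantize rounds to that exponent.
    return raw.quantize(Decimal(1).scaleb(-scale), context=ctx)


print(intermediate_divide("1", "3"))
```

Here the division is carried out with up to 38 significant digits, and the result is then rounded to 10 fractional digits.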

 

APT_CONFIG_FILE

Sets the path name of the configuration file. You may want to include this as a job parameter so that you can specify the configuration file at job run time.
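
One way to supply the configuration file at run time is through the dsjob command line. The Python sketch below only builds the argument list and does not run anything. It assumes the job already defines $APT_CONFIG_FILE as an environment-variable job parameter; the project name, job name, and file path are hypothetical.

```python
def build_run_command(project, job, config_file):
    """Build a dsjob argument list that passes APT_CONFIG_FILE as a job parameter.

    Assumption: the job defines $APT_CONFIG_FILE as an environment-variable
    job parameter, so it can be overridden with -param at run time.
    """
    return ["dsjob", "-run", "-param", f"$APT_CONFIG_FILE={config_file}", project, job]


# Hypothetical project, job, and path names, for illustration only.
cmd = build_run_command("my_project", "my_job", "/opt/configs/four_node.apt")
print(cmd)
```

Passing the list to subprocess.run (rather than a shell string) avoids the shell trying to expand the leading $ in the parameter name.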

 

APT_DISABLE_COMBINATION

Globally disables operator combining. Operator combining is DataStage’s default behavior, in which two or more (in fact any number of) operators within a step are combined into one process where possible. You may need to disable combining to facilitate debugging. Note that disabling combining generates more UNIX processes, and hence requires more system resources and memory. It also disables internal optimizations for job efficiency and run times.

 

APT_EXECUTION_MODE

By default, the execution mode is parallel, with multiple processes. Set this variable to one of the following values to run an application in sequential execution mode:

- ONE_PROCESS: one-process mode

- MANY_PROCESS: many-process mode

- NO_SERIALIZE: many-process mode, without serialization

 


APT_ORCHHOME

Must be set by all DataStage Enterprise Edition users to point to the top-level directory of the DataStage Enterprise Edition installation.


APT_STARTUP_SCRIPT

As part of running an application, DataStage creates a remote shell on all DataStage processing nodes on which the job runs. By default, the remote shell is given the same environment as the shell from which DataStage is invoked. However, you can write an optional startup shell script to modify the shell configuration of one or more processing nodes. If a startup script exists, DataStage runs it on remote shells before running your application. APT_STARTUP_SCRIPT specifies the script to be run. If it is not defined, DataStage searches for ./startup.apt, $APT_ORCHHOME/etc/startup.apt and $APT_ORCHHOME/etc/startup, in that order. APT_NO_STARTUP_SCRIPT disables running the startup script.
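
The lookup order described above can be sketched in Python as follows. The function name and the injectable exists check are hypothetical, and the sketch assumes that APT_NO_STARTUP_SCRIPT takes effect whenever it is present in the environment, whatever its value.

```python
import os


def resolve_startup_script(env, exists=os.path.isfile):
    """Return the startup script path that would be selected, or None.

    env is a mapping of environment variables; exists can be replaced in tests.
    """
    if "APT_NO_STARTUP_SCRIPT" in env:
        return None  # startup scripts are disabled entirely
    explicit = env.get("APT_STARTUP_SCRIPT")
    if explicit:
        return explicit  # an explicitly named script wins over the search
    candidates = ["./startup.apt"]
    orchhome = env.get("APT_ORCHHOME")
    if orchhome:
        candidates += [f"{orchhome}/etc/startup.apt", f"{orchhome}/etc/startup"]
    for path in candidates:
        if exists(path):
            return path  # first match in the documented order
    return None


print(resolve_startup_script(dict(os.environ)))
```

Running it against the current environment shows which script, if any, the documented order would select on this machine.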

 

APT_NO_STARTUP_SCRIPT

Prevents DataStage from executing a startup script. By default, this variable is not set, and DataStage runs the startup script. If this variable is set, DataStage ignores the startup script. This may be useful when debugging a startup script. See also APT_STARTUP_SCRIPT.

 

APT_STARTUP_STATUS

Set this to cause messages to be generated as parallel job startup moves from phase to phase. This can be useful as a diagnostic if parallel job startup is failing.

 

APT_MONITOR_SIZE

This environment variable is available in the DataStage Administrator under the Parallel branch. Determines the minimum number of records the DataStage Job Monitor reports. The default is 5000 records.

 

APT_MONITOR_TIME

This environment variable is available in the DataStage Administrator under the Parallel branch. Determines the minimum time interval in seconds for generating monitor information at runtime. The default is 5 seconds. This variable takes precedence over APT_MONITOR_SIZE.

 

APT_NO_JOBMON

Turns off job monitoring entirely.

 

APT_PM_NO_SHARED_MEMORY

By default, shared memory is used for local connections. If this variable is set, named pipes rather than shared memory are used for local connections. If both APT_PM_NO_NAMED_PIPES and APT_PM_NO_SHARED_MEMORY are set, then TCP sockets are used for local connections.
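
The selection rule for local connections can be written as a small decision function. This Python sketch is illustrative only: the function name and return labels are hypothetical, and it assumes that setting APT_PM_NO_NAMED_PIPES alone leaves the shared memory default in place.

```python
def local_transport(env):
    """Return which transport local connections would use, given the environment.

    A variable counts as set if it is present in env, whatever its value.
    """
    no_shared_memory = "APT_PM_NO_SHARED_MEMORY" in env
    no_named_pipes = "APT_PM_NO_NAMED_PIPES" in env
    if no_shared_memory and no_named_pipes:
        return "tcp_sockets"
    if no_shared_memory:
        return "named_pipes"
    # Assumption: with only APT_PM_NO_NAMED_PIPES set, shared memory is still used.
    return "shared_memory"


print(local_transport({"APT_PM_NO_SHARED_MEMORY": "1"}))
```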

 

APT_PM_NO_NAMED_PIPES

Specifies not to use named pipes for local connections. Named pipes will still be used in other areas of DataStage, including subprocs and setting up of the shared memory transport protocol in the process manager.

 

APT_RECORD_COUNTS

Causes DataStage to print, for each operator Player, the number of records consumed by getRecord() and produced by putRecord(). Abandoned input records are not necessarily accounted for. Buffer operators do not print this information.

 

APT_NO_PART_INSERTION

DataStage automatically inserts partition components in your application to optimize the performance of the stages in your job. Set this variable to prevent this automatic insertion.

 

APT_NO_SORT_INSERTION

DataStage automatically inserts sort components in your job to optimize the performance of the operators in your data flow. Set this variable to prevent this automatic insertion.


APT_SORT_INSERTION_CHECK_ONLY

When sorts are inserted automatically by DataStage and this variable is set, the sorts only check that the order is correct; they do not actually sort. This is a better alternative to turning off partition and sort insertion with APT_NO_PART_INSERTION and APT_NO_SORT_INSERTION.

 

 

APT_DUMP_SCORE

Configures DataStage to print a report showing the operators, processes, and data sets in a running job.

 

APT_PM_PLAYER_MEMORY

Setting this variable causes each player process to report the process heap memory allocation in the job log when returning.

 

APT_PM_PLAYER_TIMING

Setting this variable causes each player process to report its call and return in the job log. The message with the return is annotated with CPU times for the player process.

 

OSH_DUMP

If set, it causes DataStage to put a verbose description of a job in the job log before attempting to execute it.

 

OSH_ECHO

If set, it causes DataStage to echo its job specification to the job log after the shell has expanded all arguments.

 

OSH_EXPLAIN

If set, it causes DataStage to place a terse description of the job in the job log before attempting to run it.

 

OSH_PRINT_SCHEMAS

If set, it causes DataStage to print the record schema of all data sets and the interface schema of all operators in the job log.
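
When debugging, it can be handy to switch several of these reporting variables on together. The Python sketch below builds such an environment without modifying the current process. The helper name is hypothetical, and using "1" as the value is an assumption based on the "if set" wording above; check how your installation expects these variables to be set.

```python
import os

# Reporting variables described in this list.
DIAGNOSTIC_VARS = (
    "APT_DUMP_SCORE",
    "APT_PM_PLAYER_MEMORY",
    "APT_PM_PLAYER_TIMING",
    "OSH_DUMP",
    "OSH_ECHO",
    "OSH_EXPLAIN",
    "OSH_PRINT_SCHEMAS",
)


def diagnostic_env(base=None, enabled=DIAGNOSTIC_VARS):
    """Return a copy of base (default: os.environ) with the chosen variables set."""
    env = dict(os.environ if base is None else base)
    for name in enabled:
        env[name] = "1"  # assumption: any value makes the variable count as set
    return env


env = diagnostic_env({"PATH": "/usr/bin"}, enabled=["OSH_DUMP", "APT_DUMP_SCORE"])
print(sorted(env))
```

The returned dictionary can be passed as the env argument of subprocess.run when launching a job from a script.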

 

APT_STRING_PADCHAR

Overrides the pad character of 0x0 (ASCII null), used by default when DataStage extends, or pads, a string field to a fixed length.
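
To show what the pad character changes, here is a minimal Python sketch of padding a string to a fixed length. The function is hypothetical and only mimics the effect; the default of "\x00" mirrors the ASCII null default described above.

```python
def pad_fixed(value, length, padchar="\x00"):
    """Right-pad value to length characters using padchar (default: ASCII null)."""
    if len(padchar) != 1:
        raise ValueError("padchar must be a single character")
    return value.ljust(length, padchar)


print(repr(pad_fixed("ab", 4)))       # padded with the default null character
print(repr(pad_fixed("ab", 4, " ")))  # padded with spaces instead
```

Null padding is often invisible in logs and can surprise downstream targets that expect spaces, which is a common reason to override the default.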

 
