Datastage parallel job slower than Datastage server job

Recently, while migrating some old Datastage server jobs to parallel jobs, I found during testing that a parallel job was actually running slower than the server job it replaced.

I was confused: what, then, was the point of converting to parallel?

Well, the answer itself lies in the very basics of Datastage processing logic.

A parallel job takes longer to start up, because multiple processes have to be spawned to support the parallelism.
For jobs that process low volumes of data, this start-up overhead dominates the runtime, so they can actually run slower than the corresponding server job.

But this in no way should stop you from converting these jobs to parallel. Server jobs are now outdated and Datastage provides little support for them, so it is always better to progress to the wonderful world of parallel jobs.

If you have identified jobs that will run on low volumes in production, one way to cut their runtime is to reduce the number of nodes they run on. Fewer nodes means fewer processes to start, which reduces the start-up time and improves the overall performance of the job.

For jobs that process large volumes of data, parallel jobs are generally faster in spite of the higher start-up time, because the parallelism speeds up the overall processing of the job.
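
To make the trade-off concrete, here is a rough toy model in Python. It is only a sketch, not a measurement of Datastage: the function names, the 2-second start-up cost per node, and the 50,000 rows per second per node are all made-up assumptions, and real start-up time need not grow linearly with the node count.

```python
# Toy model of the start-up vs. parallelism trade-off.
# All numbers below are assumptions for illustration, not Datastage measurements.

def estimated_runtime(rows, nodes, startup_per_node=2.0, rows_per_sec_per_node=50_000):
    """Estimated runtime in seconds for a job running on `nodes` nodes.

    Start-up cost is assumed to grow with the number of nodes,
    while processing time is assumed to shrink as work is split across them.
    """
    startup = startup_per_node * nodes
    processing = rows / (rows_per_sec_per_node * nodes)
    return startup + processing


def best_node_count(rows, candidates=(1, 2, 4, 8)):
    """Pick the candidate node count with the lowest estimated runtime."""
    return min(candidates, key=lambda n: estimated_runtime(rows, n))


if __name__ == "__main__":
    for rows in (10_000, 100_000_000):
        nodes = best_node_count(rows)
        print(rows, "rows ->", nodes, "node(s),",
              round(estimated_runtime(rows, nodes), 1), "seconds (estimated)")
```

Under these assumed numbers, the 10,000-row job is fastest on a single node, while the 100-million-row job is fastest on eight nodes, which matches the behaviour described above.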
