Recently, I was migrating some old DataStage server jobs to parallel jobs, and while testing I found that a parallel job was actually running slower than the server job it replaced.
I was confused: what, then, was the point of converting to parallel?
Well, the answer lies in the very basics of DataStage processing logic.
A parallel job takes more start-up time, as multiple processes are generated to support the parallelism.
This hurts the performance of jobs that process low volumes of data, and these jobs can actually run slower than the corresponding server job.
But this should in no way stop you from converting them to parallel. Server jobs are now outdated and get little support in DataStage, and it is always better to progress to the wonderful world of parallel jobs.
If you have identified the jobs that will run on low volumes in production, one way to decrease their runtime is to decrease the number of nodes that each job runs on. Fewer nodes means fewer processes to start, which reduces the start-up time and improves the overall performance of the job.
For jobs that process large volumes of data, parallel jobs are generally faster in spite of the high start-up time, because the parallelism speeds up the overall processing enough to outweigh it.
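To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It is not DataStage code and the numbers are not measurements: the function names, the 2 seconds of start-up per node, and the 50,000 rows per second per node are all assumptions I picked for illustration. It only assumes that start-up cost grows with the number of nodes while processing time shrinks with it.

```python
def estimated_runtime(rows, nodes, startup_per_node=2.0, rows_per_sec_per_node=50_000):
    """Toy model of a parallel job's runtime in seconds.

    All defaults are made-up illustrative values, not DataStage measurements.
    """
    startup = startup_per_node * nodes                    # more nodes, more processes to start
    processing = rows / (rows_per_sec_per_node * nodes)   # work is split across the nodes
    return startup + processing


def best_node_count(rows, candidates=(1, 2, 4, 8)):
    """Return the candidate node count with the lowest estimated runtime."""
    return min(candidates, key=lambda n: estimated_runtime(rows, n))


if __name__ == "__main__":
    for rows in (10_000, 1_000_000, 100_000_000):
        n = best_node_count(rows)
        print(f"{rows:>11,} rows: best on {n} node(s), "
              f"about {estimated_runtime(rows, n):.1f}s")
```

With these assumed numbers, the small job comes out best on a single node because start-up dominates, while the large job comes out best on the highest node count because processing dominates. Your own break-even point will depend on your environment, so measure before you settle on a node count.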
Tags
Datastage
This is correct. Parallel jobs are slower than server jobs for small volumes of data.