Posted: Sat Aug 28, 2010 10:08 am Post subject: Batch pipes and DFSORT
Hi Experts,
We have a job with 4 DFSORT steps, where each step feeds the one after it. We are trying to speed up the entire process by using batch pipes.
Current job:
DFSORT1
DFSORT2
DFSORT3
DFSORT4
To do so, we have split the 4 steps into 4 separate jobs, each containing a single step.
Job#1
DFSORT1
Job#2
DFSORT2
Job#3
DFSORT3
Job#4
DFSORT4
We observe that the jobs do not all start at the same time: there is a certain wait period or lag after Job#1 starts. I would like to understand why this is happening.
Also, can someone please advise whether SORT steps are good candidates for the use of batch pipes?
It is not appropriate to post the same question in multiple forums.
As you were told before (in another forum), this is not only NOT a good candidate for batch pipes, but batch pipes won't do what you want anyway.
Someone needs to look at why there are 4 steps that "feed the next". Is it because no one knew how to design the process so that one read would generate the actually needed output. . .?
If you explain what the 4 steps accomplish, it is most likely that there is a more efficient way to get to the answer that is actually needed. . . _________________ All the best,
When batch pipes are used, a receiving job must wait until data starts being WRITTEN to the batch pipe before it can start READING data from the batch pipe. Hence, Job#2 cannot start READING until Job#1 starts WRITING, etc.
Your Job #1 is a sort. If a MERGE operation is being done, then WRITING will begin almost immediately. Same with a COPY operation.
But, if SORTING/SUMMING/BUILDING, etc. is required, then there can be a substantial delay between READING and WRITING while the actual SORTING/SUMMING, etc. occurs. SORTing may take several passes and involve a great deal of I/O to SORTWK datasets. The actual WRITING to SORTOUT will not occur until the FINAL sort phase. The amount of the delay will depend on the amount of data being sorted, and the complexity of the sort.
Note: Batch pipe delays are not just a SORT issue - an extensive delay might occur for a batch COBOL/DB2 program, for example, where a long-running DB2 Query must complete before the program can begin to write output data. _________________ A computer once beat me at chess, but it was no match for me at kick boxing.
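To illustrate the timing described above, here is a minimal Python sketch. It is not DFSORT or BatchPipes code, and the function names are made up for illustration. It models each job as a generator: a COPY-like stage hands a record downstream as soon as it has read it, while a SORT-like stage has to read its entire input before it can write the first record.

```python
def source(records, log):
    """Upstream reader: logs each record as it is read."""
    for rec in records:
        log.append(("read", rec))
        yield rec


def copy_stage(upstream, log):
    """COPY-like stage: each record is written as soon as it is read."""
    for rec in upstream:
        log.append(("write", rec))
        yield rec


def sort_stage(upstream, log):
    """SORT-like stage: nothing is written until all input has been read."""
    for rec in sorted(upstream):
        log.append(("write", rec))
        yield rec


def reads_before_first_write(stage, records):
    """Count the input records consumed before the first output record."""
    log = []
    pipeline = stage(source(records, log), log)
    next(pipeline)  # the downstream job receives its first record
    # Index of the first "write" event equals the number of reads before it.
    return next(i for i, (kind, _) in enumerate(log) if kind == "write")


data = ["C", "A", "D", "B"]
copy_lag = reads_before_first_write(copy_stage, data)
sort_lag = reads_before_first_write(sort_stage, data)
print(copy_lag, sort_lag)
```

In this model the reader behind the copy stage gets its first record after a single read, while the reader behind the sort stage waits for every input record to be read. With four sorts chained together, each downstream job waits in the same way for the job feeding it.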
Thanks so much for all the information shared. Here are the descriptions about the SORT steps in more detail:
1. DFSORT1 - The input is a concatenation of 9 files (containing more than a million records). The output is sorted on two character fields: ascending on one field and descending on the other.
2. DFSORT2 - Removes duplicates, based on the first 30 characters, from the output dataset of the previous step.
3. DFSORT3 - Sorts the output of the previous step on two fields, both in ascending order. One of the fields is character and the other is BI type.
4. DFSORT4 - Attaches a header and trailer to the output file from the previous step. Here MODS E35 = (BAC060,16384,MOD LR) is used in the control card to do so.
It would help if you provided:
1) The input RECFM and LRECL of the SORTIN file to the DFSORT1 job
2) The SYSIN Sort Statements to the DFSORT1 job
3) The SYSIN Sort Statements to the DFSORT2 job
4) The SYSIN Sort Statements to the DFSORT3 job
5) The SYSIN Sort Statements to the DFSORT4 job
6) A general description of what the E35 mod does - e.g. is it just adding a fixed header and a trailer to the file or is it doing a lot of other work. Is the header/trailer information provided by program constants/computations, or via an input file that the E35 module reads/accepts? Are you computing trailer counts/totals? etc. _________________ A computer once beat me at chess, but it was no match for me at kick boxing.
Currently, it looks like the entire input will be processed 4 times (minus whatever dups). I believe you can accomplish what you want with a single pass of the input file and a single write of the sorted, de-duplicated data. . .
Why was this 4-step approach chosen. . . How long does each of the 4 processes currently run?
My "smaller" files are 8-10 million records of more than 14k bytes and this type of process typically takes a few minutes (once migrated data is recalled) . . . _________________ All the best,
di
Last edited by papadi on Tue Aug 31, 2010 4:23 pm; edited 1 time in total
That was my thinking as well, papadi. That's why I asked to see the sort control statements from each of the four runs - in order to see if "consolidation" might alleviate the need for at least some of those steps, and perhaps an opportunity to "tune" the steps that do need to be run. _________________ A computer once beat me at chess, but it was no match for me at kick boxing.
This sounds very much like someone heard of a "solution" and went looking for a requirement that would use it.
This type of requirement is quite common (I've seen a few hundred) and it does not require several steps. Many of the systems I've been involved with would not accept this job for promotion because of the high amount of resources it would waste making the redundant copies of the data.
Maybe someone in the IT department also owns the hardware concession for the facility 8) _________________ All the best,
Unless you will NEED the output of JOB#1, JOB#2, and JOB#3 at some later date/time, I believe that you can combine all 4 jobs into 1 job by using ICETOOL (note: the following is NOT tested, and only represents what I THINK would work for this situation):
The first SELECT combines JOB#1 and JOB#2 into one pass of the input file to do both the SORT on two fields, and the elimination of duplicates based on just the first of those two fields.
The second SELECT combines JOB#3 and JOB#4 into one pass of the intermediate file (originally the output of JOB#2) to do both the SORT and the E35 MODS exit routine.
You will have to add whatever other DDNAMES are required ( e.g. any output DDNAME from your E35 exit). _________________ A computer once beat me at chess, but it was no match for me at kick boxing.
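As a rough illustration of the logic behind combining JOB#1 and JOB#2 (sort on two fields, then keep one record per value of the first field), here is a small Python sketch. It is not ICETOOL syntax and has not been run against real data. The field positions 1,30 and 397,26 are the ones mentioned in this thread; `make_record` is a hypothetical helper for building test records. It also assumes that "first" means the first record of each group after the two-field sort, and note that Python compares characters in Unicode order rather than EBCDIC.

```python
from itertools import groupby


# Field positions from the thread (1-based start, length): 1,30 and 397,26.
def key1(rec):
    return rec[0:30]


def key2(rec):
    return rec[396:422]


def sort_and_dedup(records):
    """Sort on key1 ascending and key2 descending, keep first record per key1."""
    # Python's sort is stable, so sort on the minor key first, then the major key.
    ordered = sorted(records, key=key2, reverse=True)
    ordered = sorted(ordered, key=key1)
    # Within each key1 group, the first record now has the highest key2.
    return [next(group) for _, group in groupby(ordered, key=key1)]


def make_record(k1, k2, payload=""):
    """Hypothetical helper: build a record with the fields at the right offsets."""
    return k1.ljust(30) + payload.ljust(366) + k2.ljust(26)


records = [
    make_record("BBB", "2010-01-01", "x"),
    make_record("AAA", "2010-03-01", "y"),
    make_record("AAA", "2010-08-01", "z"),
    make_record("BBB", "2010-02-01", "w"),
]
result = sort_and_dedup(records)
for rec in result:
    print(key1(rec).strip(), key2(rec).strip())
```

The point of the sketch is that the sort and the duplicate removal work on the same ordered data, so the de-duplicated output can be written once instead of writing the full sorted file and reading it back in a second step.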
kolusu,
Yes, the second statement should have been SORT, not SELECT. Thanks for the correction.
Just out of curiosity, why did you add the EQUALS parameter to the last sort statement, and remove it from the first sort statement? My impression was that there could be duplicates on 1,30,A AND on 397,26,D, but that it was important to only keep the FIRST record of any such set ( for the sake of the OTHER fields in the record ).
But there was no such requirement to keep the EQUALS order in the third sort (input to the last step ). _________________ A computer once beat me at chess, but it was no match for me at kick boxing.