- I'm not getting the header record in the output files (as I would expect given the HEADER(1) parm)
- the file is sorted 3 times (if I understand the output in DFSMSG correctly).
I appreciate I could sort the input file first and then use the output from that with ICETOOL copy, but is there any other better way that I don't know about ???
My question is therefore, would it be better (from a performance point of view) to sort the input file first, and then use something like
You're not getting the header in your output, because it does not match your INCLUDE= condition on the OUTFILs.
Yes, it will be sorting the data three times. You've asked it to do that.
I would, if deciding to use DATASORT:
Have one DATASORT, therefore only one USING.
Put the three OUTFILs into the one USING file. Amend as you have suggested to be specific about the header on the INCLUDE, or use OUTFIL reporting featers, and create the header using HEADER1. Also, since GE blank gives all that you want for all the OUTFILs, I'd have OMIT COND= for LT C' '.
If you go the HEADER1 route, just forget the DATASORT altogether. OMIT any existing header along with the LT blanks. Then you don't need DATASORT because there is no header and you generate one new one per file in the OUTFILs with HEADER1 (and REMOVECC).
If there is data on the existing header, rather than just "literals", then the DATASORT is a convenient way to do it. Or you make a dummy low-key for the header on INREC and take it off on OUTREC or OUTFIL.
Joined: 26 Nov 2002 Posts: 12378 Topics: 75 Location: San Jose
Posted: Fri Sep 11, 2015 10:01 am Post subject: Re: Sort and split files
misi01 wrote:
My input file needs to be sorted and then split based on values in a specific column. The JCL I have at the moment that seems to work is the following
This works fine(ish) apart from the fact that:-
- I'm not getting the header record in the output files (as I would expect given the HEADER(1) parm)
- the file is sorted 3 times (if I understand the output in DFSMSG correctly).
I appreciate I could sort the input file first and then use the output from that with ICETOOL copy, but is there any other better way that I don't know about ???
misi01,
1. Your understanding of the DATASORT Operator is wrong. Datasort just ensures the header/trailers stay in place and ONLY the data is sorted. It has got nothing to do with INCLUDE/OMIT parameters and splitting into different files.
2. You coded the 3 DATASORT operators and the input file is read thrice and sorted thrice which is a mere waste of resources.
As for alternatives there are a couple of alternatives.
1. Using traditional SORT Have a sequence number at the end of the file in case of an FB input file or right after RDW if the input is VB and using INREC validate the sequence number and append a character 'd' for all records other than the first record.
Now sort the file key + append character and use the same sequence number again to write out the header record to ALL output files using OUTFIL INCLUDE.
2. If you insist on using DATASORT then you need to PUSH the header record to all data records and then write it out as HEADER1 on OUTFIL split. _________________ Kolusu
www.linkedin.com/in/kolusu
Since your header is uniquely identifiable, extending the record by one byte to contain something "low" for the header and high for the data, then SORT on that before your actual key will cause the header to be sorted with the data, but it will always appear first in the output.
If the header only has "literal" data, you can simply create it with HEADER1.
A further possibility would be to have a small step, with STOPAFT=1, which generated a symbol(s)/SYNAMES from the header, and used that on HEADER1 in the processing step.
With any extension, you have to return the record to its original content and size on OUTREC or on each OUTFIL. To use HEADER1 you'll need the REMOVECC on OUTFIL.
Joined: 02 Dec 2002 Posts: 629 Topics: 176 Location: Stockholm, Sweden
Posted: Sat Sep 12, 2015 3:34 am Post subject: Thank you both
Kolusu. You're quite right, my knowledge of DATASORT is, to put it mildly, sketchy. You have to remember that I "code" sort JCL every 2-3 months or so, so I'm never going to be an expert.
Like many others (?), I tinker with various variations until I get something that hopefully works. When an expert comes along with"why didn't you use this?" the answer is simple; I didn't know I could.
William. Unfortunately, your assumption was wrong (I probably wasn't clear enough) but I do need to ensure all 3 output files are sorted. Thanks again for pointing me in the right direction nonetheless. _________________ Michael
No need to sort data you don't have on the output.
Your three OUTFILs only contain data where 151,1,CH,GE,C' ' (plus the header, which you can add to the INCLUDE COND= because I neglected it, still wondering if it just contains As or something else non-fixed as well).
If your OUTFILs only contain that data, then there is no point in SORTing the data they don't contain, so you can get rid of that data before the SORT, using INCLUDE COND=/OMIT COND=.
The code I suggested includes a SORT statement, so I wasn't suggesting not SORTing your output, just not sorting the stuff which doesn't appear anywhere in your output.
Alternatively, if 151,1,CH,GE,C' ' selects all the records anyway, get rid of the INCLUDE= on the OUTFIL as it is superfluous.
An example. 100,000,000 records have X'00' in that position. Without removing those early, they all get sorted, and all get passed to each of the three OUTFILs where they are dutifully ignored.
Joined: 02 Dec 2002 Posts: 629 Topics: 176 Location: Stockholm, Sweden
Posted: Sun Sep 13, 2015 4:10 am Post subject:
William. Trying to understand your last append, I'm beginning to wonder if we're talking at cross purposes.
The input file contains a header record.the 3 output files should all be sorted on the specified positions.
File 1 will contain all the sorted records (including header)
File 2 well contain the header plus any records containing A in position 151
File 3 is same as 2 but based on position 151 containing I
Am I misunderstanding you? (Or we each other) _________________ Michael
The data from your first OUTFIL will contain the header plus all the records with A at 151.
The data from your second OUTFIL will contain the header plus all the records with I at 151.
The data from your third OUTFIL will contain the header plus all the records with space or greater at 151.
No OUTFIL dataset will get all records, unless the third OUTFIL contains all records because 151 being GE space gets all records. In which case, you don't need INCLUDE=.
If the third OUTFIL is selecting records (contains LT blank) then it would be best to use INCLUDE COND= which operates before the SORT to not select any records there, before they are sorted, which aren't going to appear on any of the output datasets.
In short, either use INCLUDE COND= for the value on the third OUTFIL (if there are records to ignore) and remove the INCLUDE= on the third OUTFIL, or just remove the INCLUDE= on the third OUTFIL if there are no records to ignore.
If you want all records on an OUTFIL dataset, there is no need to select them all. The default is for all records.
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum