MVSFORUMS.com Forum Index MVSFORUMS.com
A Community of and for MVS Professionals
 
 FAQFAQ   SearchSearch   Quick Manuals   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

Datasort and split files while retaining the Header record

 
Post new topic   Reply to topic   printer-friendly view    MVSFORUMS.com Forum Index -> Utilities
View previous topic :: View next topic  
Author Message
misi01
Advanced


Joined: 02 Dec 2002
Posts: 629
Topics: 176
Location: Stockholm, Sweden

PostPosted: Fri Sep 11, 2015 4:35 am    Post subject: Datasort and split files while retaining the Header record Reply with quote

My input file needs to be sorted and then split based on values in a specific column. The JCL I have at the moment that seems to work is the following
Code:

//OUT1     DD DSN=out1 ...
//*                                                     
//OUT2     DD DSN=out2 ...
//*                                                     
//OUT3     DD DSN=out3 ...
//*
//TOOLIN   DD *                                         
 DATASORT FROM(Q264901) TO(OUT1) HEADER(1) USING(ICE1) 
 DATASORT FROM(Q264901) TO(OUT2) HEADER(1) USING(ICE2) 
 DATASORT FROM(Q264901) TO(OUT3) HEADER(1) USING(ICE3) 
//ICE1CNTL DD *                                         
  SORT FIELDS=(1,68,A),FORMAT=BI                       
  OUTFIL FNAMES=OUT1,INCLUDE=(151,1,CH,EQ,C'A')         
//ICE2CNTL DD *                                         
  SORT FIELDS=(1,68,A),FORMAT=BI                       
  OUTFIL FNAMES=OUT2,INCLUDE=(151,1,CH,EQ,C'I')         
//ICE3CNTL DD *                                         
  SORT FIELDS=(1,68,A),FORMAT=BI                       
  OUTFIL FNAMES=OUT3,INCLUDE=(151,1,CH,GE,C' ')         
/*                                                     


This works fine(ish) apart from the fact that:-

- I'm not getting the header record in the output files (as I would expect given the HEADER(1) parm)
- the file is sorted 3 times (if I understand the output in DFSMSG correctly).

I appreciate I could sort the input file first and then use the output from that with ICETOOL copy, but is there any other better way that I don't know about ???

My question is therefore, would it be better (from a performance point of view) to sort the input file first, and then use something like
Code:

//TOOLIN   DD *                                         
 COPY  FROM(Q264901) TO(OUT1) USING(ICE1)               
 COPY  FROM(Q264901) TO(OUT2) USING(ICE2)               
 COPY  FROM(Q264901) TO(OUT3) USING(ICE3)               
//ICE1CNTL DD *                                         
  OUTFIL FNAMES=OUT1,INCLUDE=(151,1,CH,EQ,C'A',OR,       
                              1,10,CH,EQ,C'AAAAAAAAAA') 
//ICE2CNTL DD *                                         
  OUTFIL FNAMES=OUT2,INCLUDE=(151,1,CH,EQ,C'I',OR,       
                              1,10,CH,EQ,C'AAAAAAAAAA') 
//ICE3CNTL DD *                                         
  OUTFIL FNAMES=OUT3,INCLUDE=(151,1,CH,GE,C' ',OR,       
                              1,10,CH,EQ,C'AAAAAAAAAA') 

(the header record contains loads of character A for some strange reason)
_________________
Michael
Back to top
View user's profile Send private message Send e-mail
William Collins
Supermod


Joined: 03 Jun 2012
Posts: 437
Topics: 0

PostPosted: Fri Sep 11, 2015 5:07 am    Post subject: Reply with quote

You're not getting the header in your output, because it does not match your INCLUDE= condition on the OUTFILs.

Yes, it will be sorting the data three times. You've asked it to do that.

I would, if deciding to use DATASORT:

Have one DATASORT, therefore only one USING.

Put the three OUTFILs into the one USING file. Amend as you have suggested to be specific about the header on the INCLUDE, or use OUTFIL reporting featers, and create the header using HEADER1. Also, since GE blank gives all that you want for all the OUTFILs, I'd have OMIT COND= for LT C' '.

If you go the HEADER1 route, just forget the DATASORT altogether. OMIT any existing header along with the LT blanks. Then you don't need DATASORT because there is no header and you generate one new one per file in the OUTFILs with HEADER1 (and REMOVECC).

If there is data on the existing header, rather than just "literals", then the DATASORT is a convenient way to do it. Or you make a dummy low-key for the header on INREC and take it off on OUTREC or OUTFIL.
Back to top
View user's profile Send private message
misi01
Advanced


Joined: 02 Dec 2002
Posts: 629
Topics: 176
Location: Stockholm, Sweden

PostPosted: Fri Sep 11, 2015 9:07 am    Post subject: Brilliant !!! Reply with quote

I'm assuming this is what you meant (at least the output files LOOK correct)
Code:

//TOOLIN   DD *                                           
 DATASORT FROM(Q264901) TO(OUT1) HEADER(1) USING(ICE1)   
//ICE1CNTL DD *                                           
  SORT FIELDS=(1,68,A),FORMAT=BI                         
  OUTFIL FNAMES=OUT1,INCLUDE=(151,1,CH,EQ,C'A',OR,       
                              1,10,CH,EQ,C'AAAAAAAAAA')   
  OUTFIL FNAMES=OUT2,INCLUDE=(151,1,CH,EQ,C'I',OR,       
                              1,10,CH,EQ,C'AAAAAAAAAA')   
  OUTFIL FNAMES=OUT3,INCLUDE=(151,1,CH,GE,C' ',OR,       
                              1,10,CH,EQ,C'AAAAAAAAAA')   
/*                                                       


Originally, I tried using TO(OUT1,OUT2,OUT3) which I've seen as examples in other scenarios which I though was logical but that failed.

Anyway, thanks again
_________________
Michael
Back to top
View user's profile Send private message Send e-mail
kolusu
Site Admin
Site Admin


Joined: 26 Nov 2002
Posts: 12378
Topics: 75
Location: San Jose

PostPosted: Fri Sep 11, 2015 10:01 am    Post subject: Re: Sort and split files Reply with quote

misi01 wrote:
My input file needs to be sorted and then split based on values in a specific column. The JCL I have at the moment that seems to work is the following
This works fine(ish) apart from the fact that:-

- I'm not getting the header record in the output files (as I would expect given the HEADER(1) parm)
- the file is sorted 3 times (if I understand the output in DFSMSG correctly).

I appreciate I could sort the input file first and then use the output from that with ICETOOL copy, but is there any other better way that I don't know about ???


misi01,

1. Your understanding of the DATASORT Operator is wrong. Datasort just ensures the header/trailers stay in place and ONLY the data is sorted. It has got nothing to do with INCLUDE/OMIT parameters and splitting into different files.

2. You coded the 3 DATASORT operators and the input file is read thrice and sorted thrice which is a mere waste of resources.

As for alternatives there are a couple of alternatives.

1. Using traditional SORT Have a sequence number at the end of the file in case of an FB input file or right after RDW if the input is VB and using INREC validate the sequence number and append a character 'd' for all records other than the first record.

Now sort the file key + append character and use the same sequence number again to write out the header record to ALL output files using OUTFIL INCLUDE.

2. If you insist on using DATASORT then you need to PUSH the header record to all data records and then write it out as HEADER1 on OUTFIL split.
_________________
Kolusu
www.linkedin.com/in/kolusu
Back to top
View user's profile Send private message Send e-mail Visit poster's website
William Collins
Supermod


Joined: 03 Jun 2012
Posts: 437
Topics: 0

PostPosted: Fri Sep 11, 2015 10:39 am    Post subject: Reply with quote

Code:
//TOOLIN   DD *                                           
 DATASORT FROM(Q264901) TO(OUT1) HEADER(1) USING(ICE1)   
//ICE1CNTL DD *                                           
  SORT FIELDS=(1,68,A),FORMAT=BI                         
  OUTFIL FNAMES=OUT1,INCLUDE=(151,1,CH,EQ,C'A',OR,       
                              1,10,CH,EQ,C'AAAAAAAAAA')   
  OUTFIL FNAMES=OUT2,INCLUDE=(151,1,CH,EQ,C'I',OR,       
                              1,10,CH,EQ,C'AAAAAAAAAA')   
  OUTFIL FNAMES=OUT3,INCLUDE=(151,1,CH,GE,C' ',OR,       
                              1,10,CH,EQ,C'AAAAAAAAAA')   
/*


Assuming the GE,C' ' actually selects records, I'd do it on an INCLUDE COND= up front. No need to sort data you don't have on the output.

Code:
  INCLUDE COND=(151,1,CH,GE,C' ')
  SORT FIELDS=(1,68,A),FORMAT=BI                         
  OUTFIL FNAMES=OUT1,INCLUDE=(151,1,CH,EQ,C'A',OR,       
                              1,10,CH,EQ,C'AAAAAAAAAA')   
  OUTFIL FNAMES=OUT2,INCLUDE=(151,1,CH,EQ,C'I',OR,       
                              1,10,CH,EQ,C'AAAAAAAAAA')   
  OUTFIL FNAMES=OUT3


Since your header is uniquely identifiable, extending the record by one byte to contain something "low" for the header and high for the data, then SORT on that before your actual key will cause the header to be sorted with the data, but it will always appear first in the output.

If the header only has "literal" data, you can simply create it with HEADER1.

A further possibility would be to have a small step, with STOPAFT=1, which generated a symbol(s)/SYNAMES from the header, and used that on HEADER1 in the processing step.

With any extension, you have to return the record to its original content and size on OUTREC or on each OUTFIL. To use HEADER1 you'll need the REMOVECC on OUTFIL.
Back to top
View user's profile Send private message
misi01
Advanced


Joined: 02 Dec 2002
Posts: 629
Topics: 176
Location: Stockholm, Sweden

PostPosted: Sat Sep 12, 2015 3:34 am    Post subject: Thank you both Reply with quote

Kolusu. You're quite right, my knowledge of DATASORT is, to put it mildly, sketchy. You have to remember that I "code" sort JCL every 2-3 months or so, so I'm never going to be an expert.
Like many others (?), I tinker with various variations until I get something that hopefully works. When an expert comes along with"why didn't you use this?" the answer is simple; I didn't know I could.

William. Unfortunately, your assumption was wrong (I probably wasn't clear enough) but I do need to ensure all 3 output files are sorted. Thanks again for pointing me in the right direction nonetheless.
_________________
Michael
Back to top
View user's profile Send private message Send e-mail
William Collins
Supermod


Joined: 03 Jun 2012
Posts: 437
Topics: 0

PostPosted: Sat Sep 12, 2015 5:13 am    Post subject: Reply with quote

Quote:
No need to sort data you don't have on the output.


Your three OUTFILs only contain data where 151,1,CH,GE,C' ' (plus the header, which you can add to the INCLUDE COND= because I neglected it, still wondering if it just contains As or something else non-fixed as well).

If your OUTFILs only contain that data, then there is no point in SORTing the data they don't contain, so you can get rid of that data before the SORT, using INCLUDE COND=/OMIT COND=.

The code I suggested includes a SORT statement, so I wasn't suggesting not SORTing your output, just not sorting the stuff which doesn't appear anywhere in your output.

Alternatively, if 151,1,CH,GE,C' ' selects all the records anyway, get rid of the INCLUDE= on the OUTFIL as it is superfluous.

An example. 100,000,000 records have X'00' in that position. Without removing those early, they all get sorted, and all get passed to each of the three OUTFILs where they are dutifully ignored.
Back to top
View user's profile Send private message
misi01
Advanced


Joined: 02 Dec 2002
Posts: 629
Topics: 176
Location: Stockholm, Sweden

PostPosted: Sun Sep 13, 2015 4:10 am    Post subject: Reply with quote

William. Trying to understand your last append, I'm beginning to wonder if we're talking at cross purposes.

The input file contains a header record.the 3 output files should all be sorted on the specified positions.

File 1 will contain all the sorted records (including header)
File 2 well contain the header plus any records containing A in position 151
File 3 is same as 2 but based on position 151 containing I

Am I misunderstanding you? (Or we each other)
_________________
Michael
Back to top
View user's profile Send private message Send e-mail
William Collins
Supermod


Joined: 03 Jun 2012
Posts: 437
Topics: 0

PostPosted: Sun Sep 13, 2015 8:47 am    Post subject: Reply with quote

SORT is sorting your input file.

The sorted output is presented to each OUTFIL.

The data from your first OUTFIL will contain the header plus all the records with A at 151.

The data from your second OUTFIL will contain the header plus all the records with I at 151.

The data from your third OUTFIL will contain the header plus all the records with space or greater at 151.

No OUTFIL dataset will get all records, unless the third OUTFIL contains all records because 151 being GE space gets all records. In which case, you don't need INCLUDE=.

If the third OUTFIL is selecting records (contains LT blank) then it would be best to use INCLUDE COND= which operates before the SORT to not select any records there, before they are sorted, which aren't going to appear on any of the output datasets.

In short, either use INCLUDE COND= for the value on the third OUTFIL (if there are records to ignore) and remove the INCLUDE= on the third OUTFIL, or just remove the INCLUDE= on the third OUTFIL if there are no records to ignore.

If you want all records on an OUTFIL dataset, there is no need to select them all. The default is for all records.
Back to top
View user's profile Send private message
misi01
Advanced


Joined: 02 Dec 2002
Posts: 629
Topics: 176
Location: Stockholm, Sweden

PostPosted: Mon Sep 14, 2015 1:02 am    Post subject: Reply with quote

Aha, now I'm with you. Thanks
_________________
Michael
Back to top
View user's profile Send private message Send e-mail
Display posts from previous:   
Post new topic   Reply to topic   printer-friendly view    MVSFORUMS.com Forum Index -> Utilities All times are GMT - 5 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


MVSFORUMS
Powered by phpBB © 2001, 2005 phpBB Group