MVSFORUMS.com

Gerd Hofmans · Beginner Joined: 15 Jan 2016 Posts: 20 Topics: 7

What would be the optimal code to count 2 different values in a file, like:
Input

kolusu · Posted: Fri Apr 15, 2016 10:13 am

Gerd Hofmans,

You cannot have multiple key-break counts with SECTIONS. So you need to use the JOINKEYS trick of reading the same file as both INA and INB, using the second file to generate the count for your primary key. Once you have the primary key count, you can use SECTIONS to generate the remaining key counters.
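To illustrate the logic only (this is Python, not DFSORT syntax), here is a minimal sketch of the two-pass idea. It assumes the primary key is in columns 1-5 and the secondary key in columns 6-80, taken from the SORT FIELDS mentioned in this thread; the function name and output layout are made up for the example.

```python
from collections import Counter

# Assumed key positions (from SORT FIELDS=(1,5,CH,A,6,75,CH,A)), 0-based slices.
PRIMARY = slice(0, 5)
SECONDARY = slice(5, 80)

def two_level_counts(records):
    """Return (primary, secondary, primary_count, pair_count) rows."""
    records = list(records)
    # "Second file" pass: one count per primary key.
    primary_counts = Counter(rec[PRIMARY] for rec in records)
    # "Main" pass: one count per (primary, secondary) combination.
    pair_counts = Counter((rec[PRIMARY], rec[SECONDARY]) for rec in records)
    # "Join": attach the primary-key count to every pair row.
    return [
        (k1, k2, primary_counts[k1], n)
        for (k1, k2), n in sorted(pair_counts.items())
    ]

if __name__ == "__main__":
    sample = ["AAAAAx", "AAAAAx", "AAAAAy", "BBBBBz"]
    for row in two_level_counts(sample):
        print(row)
```

The point of the second pass is that the primary-key total is known before the detail rows are written, which is what a single key-break pass cannot give you.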

Is your data already sorted? If so you don't need to use SORT FIELDS=(1,5,CH,A,6,75,CH,A) on the main task.

Use the following JCL which will give you the desired results.

Gerd Hofmans · Beginner Joined: 15 Jan 2016 Posts: 20 Topics: 7

Thanks Kolusu!
After some testing, I decided to split the operation into two steps, because in one step it was taking too much CPU (it's a very large file, and I start by splitting it into 30 subfiles of more than 90 million records each). I did this:

William Collins · Supermod Joined: 03 Jun 2012 Posts: 437 Topics: 0

You still have the SORT in the main task. All your data is already in that order. Since you have a later BUILD, you could consider F1:1,38,F2:6,4 on the REFORMAT and changing the locations on the BUILD.

It is a pity that your original data is not in sequence. Your sample data shows it in sequence. Is it "somewhat in sequence", i.e. all the data within a key contiguous and in sequence, just the main keys not in key sequence?

kolusu · Posted: Tue Apr 19, 2016 7:48 pm

Gerd Hofmans,

As William pointed out, you do NOT need a SORT on the JOINKEYS main task, as the data is already sorted. Also, I do not see the point of having the count in PD format. I used PD format because I was performing a sort, and you don't have to.

Since your sort fields are contiguous fields, there is no point in splitting them. Simply sort it as a single field. Also use INREC to reduce the sortwk/memory required to just the data you need.
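A small sketch of why the single-field sort is equivalent, again in Python purely as illustration. It assumes fixed-length records of at least 80 bytes with the keys in columns 1-5 and 6-80; note that Python compares by code point rather than EBCDIC, so only the equivalence carries over, not the exact collating order.

```python
def sort_two_fields(records):
    # Two adjacent keys: columns 1-5, then columns 6-80.
    return sorted(records, key=lambda r: (r[0:5], r[5:80]))

def sort_one_field(records):
    # The same bytes treated as one 80-byte key.
    return sorted(records, key=lambda r: r[0:80])

if __name__ == "__main__":
    # Hypothetical 80-byte records, padded with blanks.
    sample = [s.ljust(80) for s in ["BBBBB2", "AAAAA9", "BBBBB1", "AAAAA3"]]
    assert sort_two_fields(sample) == sort_one_field(sample)
    print([r.rstrip() for r in sort_one_field(sample)])
```

Because the first key has a fixed width, comparing the pair of fields and comparing the concatenated bytes always give the same answer, so the single key is simply less work to set up and compare.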

Use these control cards for the COUNT1 step

Gerd Hofmans · Beginner Joined: 15 Jan 2016 Posts: 20 Topics: 7

Hi Kolusu & Bill,
Many thanks for your suggestions (which I carefully implemented).
To answer Bill's question: the input data is not sorted.
Just for the record: sorting the entire input file uses a lot more CPU than splitting it and sorting the pieces afterwards. I guess this has to do with the region I submit jobs in. Also, this file contains 2.5 billion records and needs to be sorted and counted. Done in a straightforward fashion on the entire input file, the sorting and counting uses 31 minutes of CPU. Splitting, sorting and counting uses 18 minutes of CPU.
Many thanks and kind regards, Gerd.