Posted: Wed Jun 09, 2004 2:24 am Post subject: Split the Input file
Hi,
I want to solve the following problem using the SORT/ICETOOL utility.
-- I have an input file, and I can't predict the number of records in it. It varies from 0 to 20k.
I need to split the input file into 5 different files with the following requirements:
-- Count the total number of records, divide it by 5, and put that many records in each of the 5 files.
-- I don't want to change the order of the records. (I tried SPLIT, and it changes the order across the 5 files, i.e. it picks the 1st record and puts it in the 1st file, puts the 2nd record in the 2nd file, and so on.) My requirement is as follows:
suppose I have 100 records as input. I need to put the first 20 in the first file, the next 20 in the second file, and so on.
-- If the total number of records is not evenly divisible by 5, put the extra records in the last file.
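To make the difference concrete, here is a small Python sketch (not DFSORT) contrasting the round-robin distribution that SPLIT produces, as described above, with the contiguous split being asked for. The function names `round_robin_split` and `contiguous_split` are made up for illustration.

```python
def round_robin_split(records, n_files=5):
    """Record 1 -> file 1, record 2 -> file 2, ... (the SPLIT behaviour described in the post)."""
    files = [[] for _ in range(n_files)]
    for i, rec in enumerate(records):
        files[i % n_files].append(rec)
    return files


def contiguous_split(records, n_files=5):
    """First total//n_files records -> file 1, next block -> file 2, ...; leftovers go to the last file."""
    size = len(records) // n_files
    files = [records[i * size:(i + 1) * size] for i in range(n_files - 1)]
    files.append(records[(n_files - 1) * size:])  # last file takes the rest
    return files


records = list(range(1, 104))  # 103 records, numbered 1..103
for name, parts in (("round robin", round_robin_split(records)),
                    ("contiguous", contiguous_split(records))):
    print(name, [len(p) for p in parts], "file 1 starts with", parts[0][:3])
```

With 103 records the contiguous version keeps records 1 to 20 together in the first file and leaves the 3 extra records in the fifth file, which is the layout requested.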
Joined: 26 Nov 2002 Posts: 12376 Topics: 75 Location: San Jose
Posted: Wed Jun 09, 2004 5:57 am Post subject:
Rahul,
The following DFSORT/ICETOOL JCL will give you the desired results. You need the latest version of DFSORT (I forgot the PTF) for the horizontal math functions to work. If you have Syncsort at your shop, change the program name to SYNCTOOL. A brief explanation of the job: the first COPY operator takes the input file and creates a record with the total number of records in the input file.
Then we take this count file and create dynamic control cards to split the records.
The third COPY step takes in the dynamic control cards and splits the file into 5 files.
I did not have a chance to test the job, so bear with me for syntax errors.
Joined: 26 Nov 2002 Posts: 12376 Topics: 75 Location: San Jose
Posted: Wed Jun 09, 2004 7:40 am Post subject:
Rahul,
The job is pretty simple. The first COPY operator takes in the input file, just counts the number of records in it, and writes the count to a temp file, T1.
The output file name is T1. The NODETAIL parm means do not write any of the input records to the output file. The COUNT parm on the TRAILER1 parm writes out the total number of records in the input file. The count will be an 8-byte field with leading zeroes suppressed.
Let us say your input file has 27 records; then the T1 file will be as follows:
Code:
---+----1----+----2---
27
Now we take this count file (T1) and create the dynamic control cards.
Code:
//CTL2CNTL DD *
  INREC FIELDS=(1,9,FS,DIV,+5,EDIT=(TTTTTTTT))
  OUTFIL FNAMES=CTL3CNTL,
    OUTREC=(C' OUTFIL FNAMES=OUT1,ENDREC=',1,8,/,
            C' OUTFIL FNAMES=OUT2,STARTREC=',
            +1,ADD,1,8,ZD,EDIT=(TTTTTTTT),C',ENDREC=',
            +2,MUL,1,8,ZD,EDIT=(TTTTTTTT),/,
            C' OUTFIL FNAMES=OUT3,STARTREC=',
            +1,ADD,(+2,MUL,1,8,ZD),EDIT=(TTTTTTTT),C',ENDREC=',
            +3,MUL,1,8,ZD,EDIT=(TTTTTTTT),/,
            C' OUTFIL FNAMES=OUT4,STARTREC=',
            +1,ADD,(+3,MUL,1,8,ZD),EDIT=(TTTTTTTT),C',ENDREC=',
            +4,MUL,1,8,ZD,EDIT=(TTTTTTTT),/,
            C' OUTFIL FNAMES=OUT5,STARTREC=',
            +1,ADD,(+4,MUL,1,8,ZD),EDIT=(TTTTTTTT),80:X)
Using INREC FIELDS, we first divide the total count by 5 and take the quotient.
So 27/5 = 5 (ignoring the remainder).
I used the edit mask (EDIT=(TTTTTTTT)) to keep the leading zeroes, so the value after INREC processing looks like this:
Code:
---+----1----+----2---
00000005
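As a rough analogy only, the same step can be written in Python: integer division gives the quotient, and zero-padding to 8 digits plays the role the post assigns to the EDIT=(TTTTTTTT) mask (eight T's, so eight digits). The helper name `quotient_field` is hypothetical.

```python
def quotient_field(total_records, n_files=5, width=8):
    """Integer-divide the record count and zero-pad the result, like the INREC step described above."""
    quotient = total_records // n_files   # the remainder is discarded
    return str(quotient).zfill(width)     # keep the leading zeroes


print(quotient_field(27))
```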
Usually, to split a file we use the STARTREC and ENDREC parms. So using OUTREC we create the STARTREC and ENDREC values for all the output files.
Since the total number of records is 27, the first 4 files get 5 records each and the last file gets the remaining 7 records.
We are doing the same thing in CTL2. I am generating the control cards as shown above. Since we already have the quotient, it just takes a couple of arithmetic operations on the quotient.
For the first file we simply supply the quotient for the ENDREC parm, since STARTREC defaults to 1.
The parm '/' is used to start a new output line.
For the second file we add 1 to the quotient for the STARTREC (5+1) parm and multiply the quotient by 2 for the ENDREC (2*5) parm.
For the third file we multiply the quotient by 2 and add 1 to the product for the STARTREC (1+(2*5)) parm, and multiply the quotient by 3 for the ENDREC (3*5) parm.
For the fourth file we multiply the quotient by 3 and add 1 to the product for the STARTREC (1+(3*5)) parm, and multiply the quotient by 4 for the ENDREC (4*5) parm.
For the last file we multiply the quotient by 4 and add 1 to the product for the STARTREC (1+(4*5)) parm. We don't need to specify the ENDREC parm, as we want the rest of the records in the last file.
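The STARTREC/ENDREC arithmetic described above can be checked with a short Python sketch. It only mimics the text that the CTL2CNTL cards are meant to generate; it is not a replacement for the DFSORT step, and the function name `build_outfil_cards` is an assumption made for this illustration.

```python
def build_outfil_cards(total_records, n_files=5):
    """Return the OUTFIL card text for each output file, using the quotient arithmetic from the post."""
    q = total_records // n_files                 # quotient, remainder ignored
    cards = []
    for k in range(1, n_files + 1):
        card = f" OUTFIL FNAMES=OUT{k}"
        if k > 1:                                # first file relies on the default STARTREC of 1
            card += f",STARTREC={(k - 1) * q + 1:08d}"
        if k < n_files:                          # last file takes everything that is left
            card += f",ENDREC={k * q:08d}"
        cards.append(card)
    return cards


for card in build_outfil_cards(27):
    print(card)
```

For 27 records this yields ENDREC=00000005 for OUT1, then 6 to 10, 11 to 15, 16 to 20, and STARTREC=00000021 for OUT5. Note that with fewer than 5 input records the quotient is 0, so the first card would carry ENDREC=00000000; the thread does not discuss how that case should be handled.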
Joined: 02 Dec 2002 Posts: 1618 Topics: 31 Location: San Jose
Posted: Wed Jun 09, 2004 9:15 am Post subject:
Kolusu,
The DFSORT R14 PTF is UQ90053 (Feb, 2003).
Your job works, but there's an unintentional "trick" in it, that judging from your explanation, you're not aware of.
Because you didn't use REMOVECC, your COUNT value will look like this for 200 records:
Code:
1bbbbb200
b is for a blank. The 1 is the carriage control character - it's followed by the 8 byte count. (COUNT gives an 8 byte count with leading zeros suppressed.) Since you're using 1,9,FS, the 1 will be ignored as long as it's followed by a blank. If there were 20000000 records, the count record would have 120000000 and be misinterpreted. You can fix this either by using REMOVECC and using 1,8,FS, or by using 2,8,FS to skip the carriage control character.
_________________
Frank Yaeger - DFSORT Development Team (IBM)
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
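The carriage-control pitfall described in the post above can be illustrated with a toy Python model. The parsing rule here (use only the rightmost unbroken run of digits) is an assumption chosen to match the behaviour described in the post; it is not DFSORT's actual FS parsing logic, and the names `toy_fs_value` and `count_record` are hypothetical.

```python
def toy_fs_value(field):
    """Toy stand-in for reading the count: use only the rightmost unbroken run of digits."""
    digits = ""
    for ch in reversed(field):
        if not ch.isdigit():
            break
        digits = ch + digits
    return int(digits) if digits else 0


def count_record(n, removecc=False):
    """Build the count record: optional '1' carriage control, then an 8-byte blank-padded count."""
    body = str(n).rjust(8)
    return body if removecc else "1" + body


for n in (200, 20000000):
    rec = count_record(n)
    # rec[0:9] models positions 1-9; rec[1:9] models positions 2-9 (skipping the carriage control)
    print(repr(rec), "1,9 ->", toy_fs_value(rec[0:9]), " 2,8 ->", toy_fs_value(rec[1:9]))
```

In this model the leading 1 is harmless for 200 records because a blank separates it from the count, but with 20000000 records the 9-byte field reads as 120000000, while skipping the first byte (or removing the carriage control and reading 8 bytes) gives the right count.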
Joined: 07 Jan 2003 Posts: 1056 Topics: 91 Location: The Blue Planet
Posted: Wed Jun 09, 2004 9:20 am Post subject:
Kolusu,
Just a clarification.
Quote:
The parm count will be a 9 byte field with leading zeroes suppressed.
I was under the impression that COUNT outputs an 8-digit count value. Please confirm. We have the 1999 version of Syncsort in our shop, and when I used the COUNT parm I got an 8-digit value.
Joined: 26 Nov 2002 Posts: 12376 Topics: 75 Location: San Jose
Posted: Wed Jun 09, 2004 9:36 am Post subject:
Frank,
Thanks for pointing out the error. As it was early in the morning, I just wrote it without testing. I am going to edit the post to add the REMOVECC parm and adjust the fields.
Phantom: You are right about the count field being 8 bytes. Thanks for pointing that out. I am editing the posts to reflect the change.
Posted: Fri Jun 11, 2004 9:11 am Post subject: use of syncsort instead of icetool/synctool
Hey frank/kosula.. I need to split the file using only syncsort... as per the clients standards, we should not use icetool/synctool even these are products of the same.. is there any way doing this????
Joined: 02 Dec 2002 Posts: 1618 Topics: 31 Location: San Jose
Posted: Fri Jun 11, 2004 9:46 am Post subject:
Ram22 wrote
Quote:
Hey frank/kosula.. I need to split the file using only syncsort... as per the clients standards, we should not use icetool/synctool even these are products of the same.. is there any way doing this????
Wow, this must be a new record for the most annoying short post.
It's Kolusu, not kosula!
I'm a DFSORT developer. DFSORT and Syncsort are competitive products. While I'm happy to answer questions on DFSORT/ICETOOL/ICEGENER, please don't expect me to answer questions on Syncsort.
ICETOOL and SYNCTOOL are NOT the same product. ICETOOL is a fully supported, fully documented feature of DFSORT. SYNCTOOL is undocumented, unsupported code in Syncsort.
Please make an effort to be less annoying in your posts.
As Kolusu said, this is a free board where people try to help each other as volunteers. Nobody is obligated to answer your questions at all, let alone instantly.
_________________
Frank Yaeger - DFSORT Development Team (IBM)