Posted: Fri Feb 14, 2003 2:05 am Post subject: Extracting Duplicate Entries from a Dataset
Folks,
I have a dataset with 6 million records (again!), and I have to copy it to a VSAM file. The key for the VSAM file is the first 9 characters; the problem is that multiple records in the input file have the same first 9 characters.
I used REPRO with the REPLACE option, and the output file is done now.
But I need a list of the records that have the same first 9 characters. Any method, through sort etc., would be very helpful.
NOTE:
The whole record is not the same; only the key is the same.
Joined: 20 Dec 2002 Posts: 80 Topics: 21 Location: Chicago
Posted: Fri Feb 14, 2003 2:36 am Post subject:
Enigma,
Hope this works out for you.
Code:
//STEP010 EXEC PGM=SORT
//*
//SYSOUT DD SYSOUT=*
//SORTIN DD DSN=YOUR.INPUT.FILE,
// DISP=SHR
//SORTOUT DD DSN=YOUR.OUTPUT.FILE,
// DISP=(NEW,CATLG,DELETE),
// UNIT=SYSDA,
// SPACE=(CYL,(X,Y),RLSE)
//SYSIN DD *
 INREC FIELDS=(1,9,X'001C') $ Key plus a 2-byte PD counter of +1
 SORT FIELDS=(1,9,CH,A) $ Sort on the first 9 bytes
 SUM FIELDS=(10,2,PD) $ Add up the counters of equal keys
 OUTFIL INCLUDE=(10,2,PD,GT,1), $ Keep keys with counter > 1
 OUTREC=(1,9) $ Drop the counter from the output
/*
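For anyone who wants to check the logic off the mainframe, here is a minimal Python sketch of what the job above works out: count the records per 9-byte key and keep only the keys seen more than once. It is not a replacement for the sort step. The function name and the sample records are made up for illustration.

```python
from collections import Counter

KEY_LEN = 9  # the VSAM key is the first 9 characters of each record


def duplicated_keys(records):
    """Return, in ascending order, the keys that occur on more than one record."""
    # Same idea as INREC + SUM: one counter per key, added up across records.
    counts = Counter(rec[:KEY_LEN] for rec in records)
    # Same idea as OUTFIL INCLUDE: keep only keys whose counter is > 1.
    return sorted(key for key, n in counts.items() if n > 1)


# Made-up sample records, for illustration only.
sample = [
    "ABC000001 first",
    "ABC000001 second",
    "XYZ000009 only one",
    "ABC000001 third",
    "IJK000002 a",
    "IJK000002 b",
]

print(duplicated_keys(sample))  # ['ABC000001', 'IJK000002']
```

Like the sort job, this reports each duplicated key once and says nothing about which records carried it.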
Joined: 26 Nov 2002 Posts: 12375 Topics: 75 Location: San Jose
Posted: Fri Feb 14, 2003 6:54 am Post subject:
theenigma,
The job posted by himesh will give you just the keys, one per duplicated key. I think you also want the duplicate records in a separate file.
The following DFSORT/ICETOOL JCL will give you the desired results. If you have Syncsort at your shop, then use the second method listed below, as the DISCARD option does not work with Syncsort.
Now, with the INCLUDE condition on the OUTFIL, you are only selecting keys that have a sum value greater than 1, which in this case will be
Code:
ABC 003
IJK 002
The above 2 are of course duplicates, but enigma wants all the dups in one file, leaving the first record for each duplicate key in the output file so that he can load it to the VSAM file. The methods I posted above produce exactly that split.
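To make that split concrete, here is a rough Python sketch of the logic only, not the DFSORT/ICETOOL job itself: sort on the 9-byte key, send the first record of each key to one list and every later record with the same key to another. The function name and sample data are hypothetical, and the sketch assumes "first" means first in input order among records with equal keys.

```python
KEY_LEN = 9  # the VSAM key is the first 9 characters of each record


def split_first_and_duplicates(records):
    """Return (firsts, dups): the first record per key, and all the others."""
    # sorted() is stable, so records with equal keys keep their input order.
    ordered = sorted(records, key=lambda rec: rec[:KEY_LEN])
    firsts, dups = [], []
    previous_key = None
    for rec in ordered:
        key = rec[:KEY_LEN]
        if key == previous_key:
            dups.append(rec)      # same key as the record before: a duplicate
        else:
            firsts.append(rec)    # first record seen for this key
            previous_key = key
    return firsts, dups


# Made-up sample: 3 records for the ABC key, 2 for IJK, 1 for XYZ.
sample = [
    "IJK000002 rec1",
    "ABC000001 rec1",
    "ABC000001 rec2",
    "XYZ000009 rec1",
    "IJK000002 rec2",
    "ABC000001 rec3",
]

firsts, dups = split_first_and_duplicates(sample)
print(firsts)  # ['ABC000001 rec1', 'IJK000002 rec1', 'XYZ000009 rec1']
print(dups)    # ['ABC000001 rec2', 'ABC000001 rec3', 'IJK000002 rec2']
```

The first list has unique keys and is what would be loaded to the VSAM file; the second list is the duplicates report.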
himesh, thanks for the solution. kolusu, thanks for an even better solution; your explanation was cool. I wonder how you can spend so much time helping people out.
Great work, keep it up!!!