MVSFORUMS.com Forum Index MVSFORUMS.com
A Community of and for MVS Professionals
 
 FAQFAQ   SearchSearch   Quick Manuals   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

Extracting Duplicate Entries from a Dataset

 
Post new topic   Reply to topic   printer-friendly view    MVSFORUMS.com Forum Index -> Utilities
View previous topic :: View next topic  
Author Message
theenigma
Beginner


Joined: 09 Jan 2003
Posts: 15
Topics: 6

PostPosted: Fri Feb 14, 2003 2:05 am    Post subject: Extracting Duplicate Entries from a Dataset Reply with quote

Folks,

I have a Dataset with 6 million records(again!), and i have to copy it to a VSAM file. The Key for the VSAM file is the 1st 9 characters, the problem is there are multiple records in the input file with the same 1st 9 characters.
I used REPRO with the REPLACE option and the Output file is done now.

But, i need to get a list of Records which have the same 1st 9 characters, any method through sort etc would be very helpful.

NOTE:

The Whole record is not the same only the key is the same.

thanks
Enigma
Back to top
View user's profile Send private message
Himesh
CICS Forum Moderator
CICS Forum Moderator


Joined: 20 Dec 2002
Posts: 80
Topics: 21
Location: Chicago

PostPosted: Fri Feb 14, 2003 2:36 am    Post subject: Reply with quote

Enigma,

hope this works out for you..... Smile

Code:

//STEP010 EXEC PGM=SORT
//*
//SYSOUT  DD SYSOUT=*
//SORTIN  DD DSN=YOUR INPUT FILE,
//           DISP=SHR
//SORTOUT DD DSN=YOUR OUTPUT FILE,
//           DISP=(NEW,CATLG,DELETE),
//           UNIT=SYSDA,
//           SPACE=(CYL,(X,Y),RLSE)
//SYSIN DD *
  INREC FIELDS=(1,9,X'001C')     $ Include a 1 byte Counter at the end
  SORT FIELDS=(1,9,CH,A)         $ SORT ON the First 9 bytes
  SUM FIELDS=(10,2,PD)           $ SUM of the SORT key fields
  OUTFIL INCLUDE=(10,2,PD,GT,1), $ Only include those recs that have counter > 1
  OUTREC=(1,9)                   $ Remove the constant at the end
/*


regards,
Himesh
Back to top
View user's profile Send private message Yahoo Messenger
kolusu
Site Admin
Site Admin


Joined: 26 Nov 2002
Posts: 12375
Topics: 75
Location: San Jose

PostPosted: Fri Feb 14, 2003 6:54 am    Post subject: Reply with quote

theenigma,
The job posted by himesh will give you the unique keys of the file. I think you want the duplicates also in a seperate file.

The following DFSORT/ICETOOL jcl will give you the desired results.if you have syncsort at your shop, then use the second method listed below as the DISCARD option does not work with syncsort.

Code:

//STEP0100 EXEC PGM=ICETOOL                   
//*                                           
//TOOLMSG   DD SYSOUT=*                       
//DFSMSG    DD SYSOUT=*                       
//IN        DD DSN=INPUT.FILE,               
//             DISP=SHR                       
//OUT      DD  DSN=OUPUT.UNIQUE KEYS,           
//             DISP=(,NEW,CATLG,DELETE),       
//             UNIT=SYSDA                     
//             SPACE=(CYL,(X,Y),RLSE)
//DUPS     DD  DSN=OUPUT.DUPSKEYS,           
//             DISP=(,NEW,CATLG,DELETE),       
//             UNIT=SYSDA                     
//             SPACE=(CYL,(X,Y),RLSE)       
       
//TOOLIN    DD *                               
  SELECT FROM(IN) TO(OUT) ON(1,9,CH) DISCARD(DUPS) FIRST
/*                                           


Method:2

Code:

//STEP0100 EXEC PGM=SORT
//*
//SYSOUT   DD  SYSOUT=*
//SORTIN   DD  DSN=YOUR INPUT FILE,
//            DISP=SHR
//SORTOUT  DD  DSN=YOUR OUTPUT UNIQUE KEY FILE,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,
//            SPACE=(CYL,(X,Y),RLSE)
//SORTXSUM DD DSN=YOUR OUTPUT DUPLICATE FILE,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,
//            SPACE=(CYL,(X,Y),RLSE)
//SYSIN    DD *
  INREC FIELDS=(1,80,C'0001')       $ TOTAL LRECL + CONSTANT OF 1
  SORT FIELDS=(1,9,CH,A)            $ SORT ON KEY FIELD
  SUM FIELDS=(81,4,ZD),XSUM         $ SUM ON CONSTANT
  OUTFIL INCLUDE=(81,4,ZD,GE,1),    $ INCLUDE WHEN THE SUM IS >= 1
  OUTREC=(1,80)                     $ REMOVE THE CONSTANT AT THE END
/*


Hope this helps...

cheers

kolusu

PS: I am moving this topic to the utilities forum.
Back to top
View user's profile Send private message Send e-mail Visit poster's website
Himesh
CICS Forum Moderator
CICS Forum Moderator


Joined: 20 Dec 2002
Posts: 80
Topics: 21
Location: Chicago

PostPosted: Fri Feb 14, 2003 7:04 am    Post subject: Reply with quote

Kolusu,

as per Enigma

Quote:

But, i need to get a list of Records which have the same 1st 9 characters,


I was under the assumption, that the OUTPUT file should list all the records that appear more than once.

Hence, i had put down
Quote:

OUTFIL INCLUDE=(10,2,PD,GT,1), $ Only include those recs that have counter > 1


I could not relate this to what you had said.

Quote:

The job posted by himesh will give you the unique keys of the file. I think you want the duplicates also in a seperate file.

Could you please clarify?

Regards,
Himesh
Back to top
View user's profile Send private message Yahoo Messenger
kolusu
Site Admin
Site Admin


Joined: 26 Nov 2002
Posts: 12375
Topics: 75
Location: San Jose

PostPosted: Fri Feb 14, 2003 7:17 am    Post subject: Reply with quote

Himesh,

When you sum the records only the first key will have the total value of no of duplicates.

for ex: take this input

Code:

ABC
ABC
ABC
DEF
EFG
IJK
IJK


when you sum sort it the output will look like this

Code:

ABC   003
ABC   001
ABC   001
DEF   001
EFG   001
IJK   002
IJK   001


Now with your include condition on the outfil you are only selecting keys which has a sum value greater than 1 which in this case will be
Code:

ABC   003
IJK   002


The above 2 ofcourse are duplicates, but enigma wants all the dups in one file leaving the first duplicate key in the output file so that he can load it to the vsam file.The following output will be produced my using the methods I posted above

ie.
Code:

ABC   003
DEF   001
EFG   001
IJK   002


Now the duplicate file will contain

Code:

ABC   001
ABC   001
IJK   001


Hope this is clear

cheers

kolusu
Back to top
View user's profile Send private message Send e-mail Visit poster's website
coolman
Intermediate


Joined: 03 Jan 2003
Posts: 283
Topics: 27
Location: US

PostPosted: Fri Feb 14, 2003 7:52 am    Post subject: Reply with quote

Kolusu,
Quote:

OUTFIL INCLUDE=(81,4,ZD,GE,1),


Iam afraid the "INCLUDE" statement is not necessary here in this example...
because XSUM itself will retrieve the remaining duplicates...

Cheers,
Coolman.
________
Ford Ranger history


Last edited by coolman on Sat Feb 05, 2011 1:18 am; edited 1 time in total
Back to top
View user's profile Send private message
Frank Yaeger
Sort Forum Moderator
Sort Forum Moderator


Joined: 02 Dec 2002
Posts: 1618
Topics: 31
Location: San Jose

PostPosted: Fri Feb 14, 2003 11:22 am    Post subject: Reply with quote

An easy way to "get a list of Records which have the same 1st 9 characters" is to use the OCCUR operator of DFSORT's ICETOOL as follows:

Code:

//S1 EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//IN DD DSN=...  input file
//LIST1 DD SYSOUT=*
//TOOLIN DD *
  OCCUR FROM(IN) LIST(LIST1) ON(1,9,CH) ALLDUPS
/*


Other criteria for OCCUR besides ALLDUPS are NODUPS. HIGHER(n), LOWER(n) and EQUAL(n). You can also use ON(VALCNT) to get a count for each value.

For more information on the OCCUR operator of DFSORT's ICETOOL, see:

http://www.storage.ibm.com/software/sort/mvs/icetool/online/srtmtocc.html
_________________
Frank Yaeger - DFSORT Development Team (IBM)
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
Back to top
View user's profile Send private message Send e-mail Visit poster's website
theenigma
Beginner


Joined: 09 Jan 2003
Posts: 15
Topics: 6

PostPosted: Sat Feb 15, 2003 7:51 pm    Post subject: Reply with quote

himesh , thanx for the solution , kolusu thanx for an even better solution, your explanation was cool, i wonder how you could spend so much time, helping ppl out.
great work Very Happy , keep it up!!!

This place is proving to be very helpful to me!!!

Frank, coolman, thank you as well!!!

thanks all, thanks world
Enigma
Back to top
View user's profile Send private message
Himesh
CICS Forum Moderator
CICS Forum Moderator


Joined: 20 Dec 2002
Posts: 80
Topics: 21
Location: Chicago

PostPosted: Sat Feb 15, 2003 11:38 pm    Post subject: Reply with quote

Good explanation Kolusu,

I am afraid i thought only about meeting the requirement ( Sad ), and not about "why it was actually needed" [as you had done]. Good work.

regards,
Himesh
Back to top
View user's profile Send private message Yahoo Messenger
Display posts from previous:   
Post new topic   Reply to topic   printer-friendly view    MVSFORUMS.com Forum Index -> Utilities All times are GMT - 5 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


MVSFORUMS
Powered by phpBB © 2001, 2005 phpBB Group