MVSFORUMS.com
A Community of and for MVS Professionals
 

Eliminate All Duplicates in a file
kolusu
Site Admin
Joined: 26 Nov 2002
Posts: 12369
Topics: 75
Location: San Jose

Posted: Thu Nov 28, 2002 12:00 pm    Post subject: Eliminate All Duplicates in a file

How do you eliminate duplicates if you have data shown as below?

Code:


Input file:

111111111
111111111
222222222
333333333
333333333
444444444
444444444
444444444
444444444
555555555

Output file:

222222222
555555555


Solution:

The following DFSORT/ICETOOL JCL will do the trick. If you have Syncsort at your shop, change the PGM name to SYNCTOOL.

Code:

//STEP0100 EXEC PGM=ICETOOL
//*
//TOOLMSG  DD SYSOUT=*
//DFSMSG   DD SYSOUT=*       
//IN       DD DSN=Input DSN with Dups,DISP=SHR
//OUT      DD DSN=xxxxxx.UNIQUE.DATA,
//            DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(50,25),RLSE),
//            UNIT=SYSDA
//TOOLIN   DD *
  SELECT FROM(IN) TO(OUT) ON(1,9,CH) NODUPS
/*


For more information on DFSORT's ICETOOL, see:

http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/ICE1CA10/6.0?DT=20050120082820

Himesh
CICS Forum Moderator
Joined: 20 Dec 2002
Posts: 80
Topics: 21
Location: Chicago

Posted: Thu Dec 26, 2002 2:47 am

For people more familiar with SORT:

Code:

//STEP010  EXEC PGM=SORT                             
//SORTIN   DD  DSN=file1,DISP=SHR       
//SORTOUT  DD  DSN=&&TEMP,DISP=(NEW,PASS,DELETE),         
//         SPACE=(TRK,(2,2)),DCB=(LRECL=80,RECFM=FB)
//SYSPRINT DD  SYSOUT=A                             
//SYSOUT   DD  SYSOUT=*                             
//SYSIN    DD  *                                     
  INREC FIELDS=(1,9,X'001C')                         
  SORT FIELDS=(1,9,A),FORMAT=CH                     
  SUM  FIELDS=(10,2,PD)                             
/*                                                   
//STEP020  EXEC PGM=SORT                             
//SORTIN   DD  DSN=&&TEMP,DISP=SHR                   
//SORTOUT  DD  DSN=file2,DISP=(NEW,CATLG),
//         SPACE=(TRK,(2,2)),DCB=(LRECL=80,RECFM=FB)
//SYSPRINT DD  SYSOUT=A                             
//SYSOUT   DD  SYSOUT=*                             
//SYSIN    DD  *                                     
  SORT FIELDS=(1,9,A),FORMAT=CH                     
  INCLUDE COND=(10,2,PD,EQ,1)                       
  OUTREC FIELDS=(1,9)                               
/*

kolusu
Site Admin

Posted: Thu Dec 26, 2002 7:13 am

Himesh,

You can get the same result in a single step instead of two. You also don't need the SYSPRINT DD statement or the DCB parameter in your JCL.

Code:

//STEP010  EXEC PGM=SORT                             
//*
//SYSOUT   DD  SYSOUT=*                             
//SORTIN   DD  DSN=YOUR INPUT FILE,
//             DISP=SHR       
//SORTOUT  DD  DSN=YOUR OUTPUT FILE,
//             DISP=(NEW,CATLG,DELETE),
//             UNIT=SYSDA,         
//             SPACE=(CYL,(X,Y),RLSE)
//SYSIN    DD  *                                     
  INREC FIELDS=(1,9,X'001C')         $ KEY + 2-BYTE PD COUNT OF 1
  SORT FIELDS=(1,9,CH,A)             $ SORT ON THE 9-BYTE KEY
  SUM  FIELDS=(10,2,PD)              $ ADD UP THE COUNTS FOR EQUAL KEYS
  OUTFIL INCLUDE=(10,2,PD,EQ,1),     $ KEEP KEYS THAT OCCURRED ONCE
  OUTREC=(1,9)                       $ DROP THE COUNT FIELD
/* 


Hope this helps...

cheers

kolusu

Himesh
CICS Forum Moderator

Posted: Thu Dec 26, 2002 8:04 am

Yes kolusu,

The two steps were an oversight.

By default I used to code SYSPRINT for all utility programs, instead of keeping track of when it is needed and when it is not.

But the DCB was coded deliberately. I find it useful when manipulating files (for example, when changing the LRECL), so I thought it better to remember it and code it everywhere.

Regards,
Himesh

kolusu
Site Admin

Posted: Thu Dec 26, 2002 8:14 am

Himesh,

Sort products have a parameter called SDB (system-determined blocksize). The default is SDB=ON, which means that the system determines the best block size for new, or previously allocated but unopened, DASD output data sets, except in these cases:

  • A BLKSIZE found in the JCL DCB specification
  • A BLKSIZE derived from an available tape label
  • A VSAM data set



So I would recommend that you do not code the DCB parameter. Also, the LRECL of the output data set is calculated from the OUTREC fields. Unless you want to override it, you don't have to code the LRECL.

Hope this helps...

cheers

kolusu

Himesh
CICS Forum Moderator

Posted: Thu Dec 26, 2002 8:33 am

Great,

Thanks for the info.

Another small question:

What if a program later reads the same file (the one manipulated in the JCL)?

I thought it would be safest to do everything by the book (so to speak).

Himesh

kolusu
Site Admin

Posted: Thu Dec 26, 2002 9:35 am

Himesh,

You will have no problem reading the file created by the JCL, as long as the program codes this clause in the file definition (FD):
Code:

Block contains 0 records


BLOCK CONTAINS 0 RECORDS tells the system to take the block size from the JCL or the data set label rather than from the program. This avoids recompiling the program whenever the data set's block size changes.

Hope this helps...

cheers

kolusu

Frank Yaeger
Sort Forum Moderator
Joined: 02 Dec 2002
Posts: 1618
Topics: 31
Location: San Jose

Posted: Thu Dec 26, 2002 11:09 am

Kolusu is right that DFSORT will determine the output RECFM, LRECL and BLKSIZE automatically. So it's best not to specify them unless you want to set them differently than DFSORT would.

Note that if you specify the RECFM and/or LRECL, but not the BLKSIZE, DFSORT will still set the system determined BLKSIZE. However, if you specify the BLKSIZE, DFSORT will use that BLKSIZE instead of the system determined BLKSIZE.
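
As a minimal sketch of the first case, assuming the same placeholder data set name and SPACE values used earlier in this thread, a SORTOUT DD that codes RECFM and LRECL but leaves BLKSIZE out might look like this. The RECFM=FB and LRECL=80 values are only an illustration.

Code:

//* RECFM AND LRECL CODED, BLKSIZE OMITTED ON PURPOSE
//SORTOUT  DD  DSN=YOUR OUTPUT FILE,
//             DISP=(NEW,CATLG,DELETE),
//             UNIT=SYSDA,
//             SPACE=(CYL,(X,Y),RLSE),
//             DCB=(RECFM=FB,LRECL=80)

Adding a BLKSIZE=nnnnn value inside the DCB parentheses would be the second case, where DFSORT uses the coded value instead of the system-determined one.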

For complete information on DFSORT's SDB options, use the following link and then do a find on "SDB=":

http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/ICECI107/2.2.6?DT=20020718103509
_________________
Frank Yaeger - DFSORT Development Team (IBM)
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at:
www.ibm.com/storage/dfsort

Himesh
CICS Forum Moderator

Posted: Fri Dec 27, 2002 12:04 am

That was useful info.

Thanks, Frank and Kolusu.

Himesh

manojagrawal
Beginner
Joined: 25 Feb 2003
Posts: 124
Topics: 29

Posted: Mon Mar 03, 2003 6:20 am

What if we want this?

Code:
Input file:

111111111
111111111
222222222
333333333
333333333
444444444
444444444
444444444
444444444
555555555

Output file:

111111111
222222222
333333333
444444444
555555555


Regards,
Manoj.

kolusu
Site Admin

Posted: Mon Mar 03, 2003 7:08 am

Manojagrawal,
The following DFSORT/ICETOOL JCL will give you the desired results. If you have Syncsort at your shop, change the PGM name to SYNCTOOL.

Code:

//STEP100 EXEC PGM=ICETOOL               
//TOOLMSG DD SYSOUT=*                     
//DFSMSG  DD SYSOUT=*                     
//IN      DD *
111111111
111111111
222222222
333333333
333333333
444444444
444444444
444444444
444444444
555555555
//OUT     DD DSN=YOUR OUTPUT DSN,
//           DISP=(NEW,CATLG,DELETE),
//           UNIT=SYSDA,
//           SPACE=(CYL,(X,Y),RLSE)
//TOOLIN  DD *                             
  SELECT FROM(IN) TO(OUT) ON(1,9,CH) FIRST
/*   


Hope this helps...

cheers

kolusu

manojagrawal
Beginner

Posted: Mon Mar 03, 2003 7:49 am

Hi,

Let's consider something different but similar.

Code:
File:

Field1 - 5
Field2 - 3 
Field3 - 4

Field sizes are random. Thus we have something like
                     AAAA1 BB1 CCC1    (Spaces not there, given for clarity)
                     AAAA1 BB1 CCC1
                     AAAA1 BB2 CCC2
                     AAAA2 BB1 CCC2
                     AAAA2 BB1 CCC3


This file can have a lot of duplicates. The output file required is as below:
Code:
                     
                     AAAA1 BB1 CCC1    (Spaces not there, given for clarity)
                     AAAA1 BB2 CCC2



What is happening is:
1) All duplicate records are sent to the output once (use the FIRST option).
2) All single records are also sent to the output.
3) All records where FIELD1 and FIELD2 match but FIELD3 differs are eliminated altogether.

Can this be done through JCL, and if so, how? My thought is that the first step would do the elimination using FIRST, as previously mentioned. What about the next step? How would that have to be done? Can this be done in a single step?

Another question: do we have to sort the input file before running these jobs, or can the input records be in any order?

Regards,
Manoj.

Frank Yaeger
Sort Forum Moderator

Posted: Mon Mar 03, 2003 11:12 am

Manoj,

I'm not sure I fully understand what you want to do, but as for 1) and 2), FIRST will do both of them. FIRST keeps the first record with each value, so it keeps all "single records" (by which I think you mean records with unique keys) and the first record of each set of records with duplicate keys.

Note that the FIRSTDUP option of DFSORT's ICETOOL will do 1) without doing 2), that is, it keeps the first record of each set of records with duplicate keys, but does not keep unique records.

To answer your second question, SELECT does a sort using the ON fields as the key, so the input records can be in any order.
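
To illustrate both points, here is a sketch that runs FIRSTDUP against the ten records from the start of this thread, deliberately shuffled. The step name and the SYSOUT destination for OUT are assumptions made for the example.

Code:

//STEP200 EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG  DD SYSOUT=*
//IN      DD *
444444444
111111111
555555555
333333333
444444444
222222222
111111111
444444444
333333333
444444444
/*
//OUT     DD SYSOUT=*
//TOOLIN  DD *
* KEEP THE FIRST RECORD OF EACH KEY THAT OCCURS MORE THAN ONCE
  SELECT FROM(IN) TO(OUT) ON(1,9,CH) FIRSTDUP
/*

With this input, OUT should hold 111111111, 333333333 and 444444444 in key order. Changing FIRSTDUP to FIRST would add 222222222 and 555555555.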

For more information on DFSORT's ICETOOL, see:

http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/ICECA109/6.0?DT=20020722140254

For information on the newest additions to DFSORT's ICETOOL, see:

http://www.storage.ibm.com/software/sort/mvs/uq90053/index.html

manojagrawal
Beginner

Posted: Tue Mar 04, 2003 1:53 am

Hi Frank,

I think I confused you a little about what I was trying to say.

Let's separate the two parts. For 1 and 2, the FIRST option works perfectly. Say we use the FIRST option on the input file and the output is as follows (we would use it as input to the next step).

Code:
   AAAA1 BB1 CCC1
   AAAA2 BB2 CCC2
   AAAA2 BB2 CC22
   AAAA3 BB3 CCC3
   AAAA3 BB3 CC33
   AAAA4 BB4 CCC4
   AAAA5 BB5 CCC5


Thus, now we have all unique records. The next output would be as follows.

Code:
   AAAA1 BB1 CCC1
   AAAA4 BB4 CCC4
   AAAA5 BB5 CCC5


As I typed this out, I realised that a SELECT with NODUPS, using the first 8 bytes as the key, gives the desired result, and I did get it. It can also be done in one step. So the problem is solved.
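
For input that still contains exact duplicate records, a single NODUPS pass on the first 8 bytes would also drop those keys, so the FIRST pass is still needed. A sketch of both operators in one ICETOOL step follows, assuming 12-byte records (5 + 3 + 4) starting in column 1. The step name, the T1 temporary data set and the placeholder data set names are hypothetical.

Code:

//STEP300 EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG  DD SYSOUT=*
//IN      DD DSN=YOUR INPUT DSN,DISP=SHR
//T1      DD DSN=&&T1,
//           DISP=(,PASS),
//           UNIT=SYSDA,
//           SPACE=(CYL,(X,Y),RLSE)
//OUT     DD DSN=YOUR OUTPUT DSN,
//           DISP=(NEW,CATLG,DELETE),
//           UNIT=SYSDA,
//           SPACE=(CYL,(X,Y),RLSE)
//TOOLIN  DD *
* KEEP ONE COPY OF EACH FULL 12-BYTE RECORD
  SELECT FROM(IN) TO(T1) ON(1,12,CH) FIRST
* DROP EVERY FIELD1+FIELD2 KEY THAT STILL OCCURS MORE THAN ONCE
  SELECT FROM(T1) TO(OUT) ON(1,8,CH) NODUPS
/*

With the five-record example from the earlier post (AAAA1BB1CCC1 twice, AAAA1BB2CCC2, AAAA2BB1CCC2, AAAA2BB1CCC3), the first operator leaves four records and the second keeps AAAA1BB1CCC1 and AAAA1BB2CCC2.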

Frank, Kolusu, once again, thanks!

Regards,
Manoj.

Phantom
Data Mgmt Moderator
Joined: 07 Jan 2003
Posts: 1056
Topics: 91
Location: The Blue Planet

Posted: Thu Mar 25, 2004 1:13 am

Can the following be done via SORT? (It is a comparison of two files.)
Code:

File: 1

AAAAA     111222333
BBBBB     444
CCCCC     111

File: 2

AAAAA     111333
BBBBB     555
CCCCC     222

Output:

AAAAA     222
BBBBB     444
BBBBB     555
CCCCC     111
CCCCC     222


There are two fields here: the first is 5 bytes (AAAAA, BBBBB, ...), and the second is a 3-byte field that can repeat up to 10 times (111222333...), i.e. PIC X(3) OCCURS 10 TIMES DEPENDING ON WS-COUNT.

The files should be compared on the entire record length. In the first comparison, between "AAAAA 111222333" and "AAAAA 111333", the field 2 value 222 is not common to the two records, so the output should contain "AAAAA 222".

Thanks,
All times are GMT - 5 Hours
Page 1 of 3

 