MVSFORUMS.com Forum Index MVSFORUMS.com
A Community of and for MVS Professionals
 
 FAQFAQ   SearchSearch   Quick Manuals   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

Eliminate All Duplicates in a file
Goto page Previous  1, 2, 3  Next
 
Post new topic   Reply to topic   printer-friendly view    MVSFORUMS.com Forum Index -> Utilities
View previous topic :: View next topic  
Author Message
kolusu
Site Admin
Site Admin


Joined: 26 Nov 2002
Posts: 12404
Topics: 75
Location: San Jose

PostPosted: Thu Mar 25, 2004 6:18 am    Post subject: Reply with quote

Phantom,

I can't think of way to get only the differences(non match data) in the output record. If you have eztrieve at your shop then check the following topic which discusses similar requirement.

http://www.mvsforums.com/helpboards/viewtopic.php?p=6682

Hope this helps...

Cheers

Kolusu
_________________
Kolusu
www.linkedin.com/in/kolusu
Back to top
View user's profile Send private message Send e-mail Visit poster's website
Phantom
Data Mgmt Moderator
Data Mgmt Moderator


Joined: 07 Jan 2003
Posts: 1056
Topics: 91
Location: The Blue Planet

PostPosted: Thu Mar 25, 2004 8:21 am    Post subject: Reply with quote

Thanks kolusu,

I just wanted to know whether this could be accomplished via sort. We already have a cobol program to do this, but felt sort would be simpler than that as far as the no. of lines of code is concerned.

Thanks for your help.
Back to top
View user's profile Send private message
kolusu
Site Admin
Site Admin


Joined: 26 Nov 2002
Posts: 12404
Topics: 75
Location: San Jose

PostPosted: Thu Mar 25, 2004 1:08 pm    Post subject: Reply with quote

Phantom,

OK I guess my mind did not start working when I posted that I can't think of a solution. Sad . I attribute it lack of sleep watching cricket matches(Day/night matches start at 4 Am in the morning and day matches start at 11:30 PM in the night here). But the matches are worth watching. Enjoyed every match.

Any way the following DFSORT/ICETOOL JCL will give you desired results. I assumed both your files are 80 bytes in length and is of FB recfm. If you have syncsort at your shop then change the pgm name to synctool.

A brief explanation of the job. As you said your file has 2 fields, we take file1 and split each record into 10 records based on your second field(as your second fields occurs 10 times)

The second copy operator does the same for the second file.

Now concatenate these 2 files together and using select operator eliminate all the dups.so you will be left with only keys when the records does not match.

Code:

//STEP0100  EXEC  PGM=ICETOOL                                           
//TOOLMSG   DD SYSOUT=*                                                 
//DFSMSG    DD SYSOUT=*                                                 
//IN1       DD *                                                       
----+----1----+----2----+----3----+----4----+----5----+----6----+----7--
AAAAA     111222333                                                     
BBBBB     444                                                           
CCCCC     111                                                           
//IN2       DD *                                                       
AAAAA     111333                                                       
BBBBB     555                                                           
CCCCC     222                                                           
//T1        DD DSN=&T1,DISP=(,PASS),SPACE=(CYL,(X,Y),RLSE)             
//T2        DD DSN=&T2,DISP=(,PASS),SPACE=(CYL,(X,Y),RLSE)             
//CON       DD DSN=&T1,DISP=OLD,VOL=REF=*.T1                           
//          DD DSN=&T2,DISP=OLD,VOL=REF=*.T2                           
//OUT       DD DSN=YOUR OUTPUT FILE,
//             DISP=(NEW,CATLG,DELETE),
//             UNIT=SYSDA,
//             SPACE=(CYL,(X,Y),RLSE)
//TOOLIN    DD *                                             
  COPY FROM(IN1) TO(T1) USING(CTL1)                           
  COPY FROM(IN2) TO(T2) USING(CTL1)                           
  SELECT FROM(CON) TO(OUT) ON(1,5,CH) ON(11,3,CH) NODUPS       
//CTL1CNTL  DD *                                             
   OUTFIL OUTREC=(1,10,11,3,/,                                 
                  1,10,14,3,/,                                 
                  1,10,17,3,/,                                 
                  1,10,20,3,/,                                 
                  1,10,23,3,/,                                 
                  1,10,26,3,/,                                 
                  1,10,29,3,/,                                 
                  1,10,32,3,/,                                 
                  1,10,35,3,/,                                 
                  1,10,38,3)
/*                             


The output from this job is :
Code:

AAAAA     222 
BBBBB     444 
BBBBB     555 
CCCCC     111 
CCCCC     222 


Hope this helps...

Cheers

Kolusu
_________________
Kolusu
www.linkedin.com/in/kolusu
Back to top
View user's profile Send private message Send e-mail Visit poster's website
Phantom
Data Mgmt Moderator
Data Mgmt Moderator


Joined: 07 Jan 2003
Posts: 1056
Topics: 91
Location: The Blue Planet

PostPosted: Fri Mar 26, 2004 7:17 am    Post subject: Reply with quote

Kolusu,

This is just awesome...Thanks a lot

I have one more question... Embarassed Question

Code:

File: 1

AAAAA     111222333
BBBBB     444
CCCCC     111

File: 2

AAAAA     111333
BBBBB     555
CCCCC     222

Output:

AAAAA     222
BBBBB     444
CCCCC     111

When there is a mismatch between file A and file B (same comparison), is there any way by which I could stop the records from B from being written in the output...Please see the example above.

When there is a mismatch, the first way would write the following records also in the output. (these are available only FILE 2 and not in FILE 1).
Code:

BBBBB     555
CCCCC     111
CCCCC     222

Can this be stopped. The output should contain only records from File 1 and that does not match with the contents of File 2 (Please use the same comparison logic used above).

This is just to gain more knowledge.
Back to top
View user's profile Send private message
kolusu
Site Admin
Site Admin


Joined: 26 Nov 2002
Posts: 12404
Topics: 75
Location: San Jose

PostPosted: Fri Mar 26, 2004 8:29 am    Post subject: Reply with quote

Phantom,

Sure you can write only records from file1 in case of mismatch. The following DFSORT/ICETOOL JCl will give you the desired results.

A brief explanation of the job. We add unique identfier to the file while splitting. For file 1 we add a constant of '0001' and for file2 we add a constant of '0002' at the end of each record.

Now we concatenate the 2 temp files and sort on the keys while summing on the constants at the end. By doing so any duplicates will have count greater or equal to 3, so using include condition on the outfil we select only records where the constant remains '0001'.

Code:

//STEP0100  EXEC  PGM=ICETOOL                                           
//TOOLMSG   DD SYSOUT=*                                                 
//DFSMSG    DD SYSOUT=*                                                 
//IN1       DD *                                                       
AAAAA     111222333                                                     
BBBBB     444                                                           
CCCCC     111                                                           
//IN2       DD *                                                       
AAAAA     111333                                                       
BBBBB     555                                                           
CCCCC     222                                                           
//T1        DD DSN=&T1,DISP=(,PASS),SPACE=(CYL,(X,Y),RLSE)             
//T2        DD DSN=&T2,DISP=(,PASS),SPACE=(CYL,(X,Y),RLSE)             
//CON       DD DSN=&T1,DISP=OLD,VOL=REF=*.T1                           
//          DD DSN=&T2,DISP=OLD,VOL=REF=*.T2                           
//OUT       DD DSN=YOUR OUTPUT FILE,
//             DISP=(NEW,CATLG,DELETE),
//             UNIT=SYSDA,
//             SPACE=(CYL,(X,Y),RLSE)
//TOOLIN    DD *                                         
  COPY FROM(IN1) USING(CTL1)                               
  COPY FROM(IN2) USING(CTL2)                               
  SORT FROM(CON) USING(CTL3)                               
//CTL1CNTL  DD *                                         
  OUTFIL FNAMES=T1,OUTREC=(1,10,11,3,C'0001',/,           
                            1,10,14,3,C'0001',/,           
                            1,10,17,3,C'0001',/,           
                            1,10,20,3,C'0001',/,           
                            1,10,23,3,C'0001',/,           
                            1,10,26,3,C'0001',/,           
                            1,10,29,3,C'0001',/,           
                            1,10,32,3,C'0001',/,           
                            1,10,35,3,C'0001',/,           
                            1,10,38,3,C'0001')             
//CTL2CNTL  DD *                                         
  OUTFIL FNAMES=T2,OUTREC=(1,10,11,3,C'0002',/,           
                            1,10,14,3,C'0002',/,           
                            1,10,17,3,C'0002',/,           
                            1,10,20,3,C'0002',/,           
                            1,10,23,3,C'0002',/,           
                            1,10,26,3,C'0002',/,           
                            1,10,29,3,C'0002',/,           
                            1,10,32,3,C'0002',/,           
                            1,10,35,3,C'0002',/,           
                            1,10,38,3,C'0002')             
//CTL3CNTL  DD *                                         
  OPTION EQUALS                                           
  SORT FIELDS=(1,5,CH,A,11,3,CH,A)                         
  SUM FIELDS=(14,4,ZD)                                     
  OUTFIL FNAMES=OUT,INCLUDE=(14,4,ZD,EQ,1),OUTREC=(1,13)   
/*



The output from this job is :

Code:

AAAAA     222
BBBBB     444
CCCCC     111


Hope this helps....

Cheers

Kolusu
_________________
Kolusu
www.linkedin.com/in/kolusu
Back to top
View user's profile Send private message Send e-mail Visit poster's website
Phantom
Data Mgmt Moderator
Data Mgmt Moderator


Joined: 07 Jan 2003
Posts: 1056
Topics: 91
Location: The Blue Planet

PostPosted: Fri Mar 26, 2004 8:50 am    Post subject: Reply with quote

Wow,

its just great...Thanks a lot kosulu,
Back to top
View user's profile Send private message
Phantom
Data Mgmt Moderator
Data Mgmt Moderator


Joined: 07 Jan 2003
Posts: 1056
Topics: 91
Location: The Blue Planet

PostPosted: Wed Mar 31, 2004 2:45 am    Post subject: Reply with quote

I am getting an error if I use FIRSTDUP option in SYNCSORT. I would like to know whether SYNCSORT (SYNCTOOL) supports this option. If not, is there any other way to achieve the functionality of FIRSTDUP

Code:

SYT000I  SYNCTOOL RELEASE 1.3A - COPYRIGHT 1999  SYNCSORT INC.                 
SYT001I  INITIAL PROCESSING MODE IS "STOP"                                     
SYT002I  "TOOLIN" INTERFACE BEING USED                                         
                                                                               
           SELECT FROM(IN) TO(OUT) ON(1,9,CH) FIRSTDUP                         
SYT050E  INVALID OPERAND ON "SELECT" STATEMENT                                 
SYT030I  OPERATION COMPLETED WITH RETURN CODE 12                               
                                                                               
SYT015I  PROCESSING MODE CHANGED FROM "STOP" TO "SCAN" DUE TO OPERATION FAILURE
                                                                               
SYT004I  SYNCTOOL PROCESSING COMPLETED WITH RETURN CODE 12                     


Thanks,
Back to top
View user's profile Send private message
Phantom
Data Mgmt Moderator
Data Mgmt Moderator


Joined: 07 Jan 2003
Posts: 1056
Topics: 91
Location: The Blue Planet

PostPosted: Wed Mar 31, 2004 3:16 am    Post subject: Reply with quote

Hi,

I wrote a SYNCTOOL code as shown below to achieve the functionality of FIRSTDUP option.

Please let me know if this is okay. Can the code be enhanced a bit? Can I avoid using the temporary file ?

Code:

//R010     EXEC PGM=SYNCTOOL                                   
//TOOLMSG   DD SYSOUT=*                                       
//DFSMSG    DD SYSOUT=*                                       
//IN       DD  DSN=MY.INPUT.FILE1,
//             DISP=SHR                                       
//         DD  DSN=MY.INPUT.FILE2,
//             DISP=SHR                                       
//T1       DD  DSN=&T1,DISP=(,PASS)                           
//OUT      DD  DSN=MY.OUTPUT.FILE,
//             DISP=(NEW,CATLG,DELETE),                       
//             UNIT=SYSDA,                                     
//             AVGREC=K,                                       
//             SPACE=(TRK,(1,1),RLSE),                         
//             DCB=(RECFM=FB,LRECL=80)                         
//TOOLIN   DD  *                                               
  COPY FROM(IN) TO(T1) USING(CTL1)                             
  SORT FROM(T1) USING(CTL2)                                   
//CTL1CNTL DD  *                                               
  OUTREC FIELDS=(1,80,C'0001')                                 
//CTL2CNTL DD  *                                           
  SORT FIELDS=(1,9,CH,A)                                   
  SUM FIELDS=(81,4,ZD)                                     
  OUTFIL FNAMES=OUT,INCLUDE=(81,4,ZD,GT,1),OUTREC=(1,80)   
/*                                                         
//SYSOUT   DD  SYSOUT=*                                   
//*                                                       


Thanks,
Back to top
View user's profile Send private message
kolusu
Site Admin
Site Admin


Joined: 26 Nov 2002
Posts: 12404
Topics: 75
Location: San Jose

PostPosted: Wed Mar 31, 2004 9:06 am    Post subject: Reply with quote

Phantom,

The version of synctool you have does not support the option FIRSTDUP. And the job you showed is just eliminating the duplicates, but will not fetch you the firstdup.

The following JCL will give you the desired results.
Code:

//STEP0100 EXEC PGM=SYNCTOOL                               
//TOOLMSG   DD SYSOUT=*                                   
//DFSMSG    DD SYSOUT=*                                   
//IN        DD DSN=YOUR INPUT FILE,
//             DISP=SHR                                         
//T1        DD DUMMY                                     
//CTL1XSUM  DD DSN=&T1,DISP=(,PASS),SPACE=(CYL,(X,Y),RLSE)
//OUT       DD SYSOUT=*                                   
//TOOLIN    DD *                                         
   SORT FROM(IN) TO(T1) USING(CTL1)                       
   SORT FROM(CTL1XSUM) TO(OUT) USING(CTL2)                 
//CTL1CNTL  DD *                                         
   SORT FIELDS=(1,3,CH,A)                                 
   SUM FIELDS=NONE,XSUM                                   
//CTL2CNTL  DD *                                         
   SORT FIELDS=(1,3,CH,A)                                 
   SUM FIELDS=NONE
/*


Hope this helps...

Cheers

Kolusu
_________________
Kolusu
www.linkedin.com/in/kolusu
Back to top
View user's profile Send private message Send e-mail Visit poster's website
kolusu
Site Admin
Site Admin


Joined: 26 Nov 2002
Posts: 12404
Topics: 75
Location: San Jose

PostPosted: Wed Mar 31, 2004 9:10 am    Post subject: Reply with quote

vkphani,

Whenever you say a particular feature is not working then you need to post the error messages(toolmsg & dfsmsg) from the sysout. If you happen to have toolmsg in the following manner then your shop has syncsort and you have a older version of synctool which does not support many features that icetool supports
Code:

SYT000I  SYNCTOOL RELEASE 1.3A - COPYRIGHT 1999  SYNCSORT INC.                 
SYT001I  INITIAL PROCESSING MODE IS "STOP"                                     
SYT002I  "TOOLIN" INTERFACE BEING USED                                         
                                                                               
           SELECT FROM(IN) TO(OUT) ON(1,9,CH) FIRSTDUP                         
SYT050E  INVALID OPERAND ON "SELECT" STATEMENT                                 
SYT030I  OPERATION COMPLETED WITH RETURN CODE 12                               
                                                                               
SYT015I  PROCESSING MODE CHANGED FROM "STOP" TO "SCAN" DUE TO OPERATION FAILURE
                                                                               
SYT004I  SYNCTOOL PROCESSING COMPLETED WITH RETURN CODE 12                     


Check this topic for an alternate solution(last post by me) for getting the firstdup in here

http://www.mvsforums.com/helpboards/viewtopic.php?p=9587

Kolusu
_________________
Kolusu
www.linkedin.com/in/kolusu
Back to top
View user's profile Send private message Send e-mail Visit poster's website
Phantom
Data Mgmt Moderator
Data Mgmt Moderator


Joined: 07 Jan 2003
Posts: 1056
Topics: 91
Location: The Blue Planet

PostPosted: Wed Mar 31, 2004 10:14 am    Post subject: Reply with quote

Kolusu,

Thanks for your help. I was thinking that whenever we use SUM FIELDS=(xx,yy,ff), the first occurance of the record with a total of the field that is SUMMED up. Is this correct ?

If so, then the statement INCLUDE=(14,4,ZD,EQ,1) will give me the records in which the summed up field contains a value greater than 1 meaning there was a duplicate in the input file and the records have been summed up. This way I have eliminated all the unique records and I produce only one occurence (which I think is the first occurance) of a duplicate record.

Please correct if I am wrong...Thanks a lot for your help
Back to top
View user's profile Send private message
kolusu
Site Admin
Site Admin


Joined: 26 Nov 2002
Posts: 12404
Topics: 75
Location: San Jose

PostPosted: Wed Mar 31, 2004 10:32 am    Post subject: Reply with quote

Phantom,

You aim is to get the first duplicate but not the first record among the set of duplicates. Let me explain it with an example.

Code:

ABC RECORD 1
ABC RECORD 2
ABC RECORD 3


with our code you will always get
Code:

ABC RECORD 1


Using firstdup parm on select statement will fetch the following record

Code:

ABC RECORD 2


If you have happy to the first record among the set of duplicates, then the following JCl will give you the desired results without a temp file.

Code:

//STEP0100 EXEC PGM=SORT                               
//SYSOUT   DD SYSOUT=*                                   
//SORTIN   DD DSN=YOUR INPUT FILE,
//            DISP=SHR                                         
//SORTOUT  DD SYSOUT=*                                   
//SYSIN    DD *                                         
   INREC FIELDS=(1,80,C'0001')
   SORT FIELDS=(1,3,CH,A)                                 
   SUM FIELDS=(81,4,ZD)
   OUTFIL INCLUDE=(81,4,ZD,GT,1),OUTREC=(1,80)
/*


Hope this helps...

Cheers

Kolusu
_________________
Kolusu
www.linkedin.com/in/kolusu
Back to top
View user's profile Send private message Send e-mail Visit poster's website
Phantom
Data Mgmt Moderator
Data Mgmt Moderator


Joined: 07 Jan 2003
Posts: 1056
Topics: 91
Location: The Blue Planet

PostPosted: Wed Mar 31, 2004 10:43 am    Post subject: Reply with quote

Thanks a lot kolusu,

Actually I first thought of the same solution. But I got a little confused when I read the following message regarding INREC control statement, which states that INREC will be processed after INCLUDE/OMIT control processing.

Code:

--------------- V=SYNCSORT P=SYNCSORT FOR Z/OS R=V1R1 I=INREC ----------------
******************    Text Below Copyright (c) 2004, SYNCSORT    *******************
INREC Control Statement                                                       
                                                                               
The INREC control statement reformats the input records. Use the INREC         
control statement to add, delete, or reformat fields before the records       
are sorted or merged. Use the OUTREC control statement or the OUTREC           
parameter of the OUTFIL control statement to delete or reformat fields after   
the records are sorted or merged. Note that INREC is performed after E15       
exit processing and INCLUDE/OMIT control statement processing.                 
                                                                               


Could you please clarify.
Back to top
View user's profile Send private message
kolusu
Site Admin
Site Admin


Joined: 26 Nov 2002
Posts: 12404
Topics: 75
Location: San Jose

PostPosted: Wed Mar 31, 2004 11:59 am    Post subject: Reply with quote

Phantom,

There are 2 ways of of using include/omit conditions

If code INCLUDE COND then it is a stand alone include cond which will be processed before the inrec statement.

You can also use INCLUDE on OUTFIL statement, which is evaluated last while writting out records.

Hope this helps...

Cheers

KOlusu
_________________
Kolusu
www.linkedin.com/in/kolusu
Back to top
View user's profile Send private message Send e-mail Visit poster's website
Phantom
Data Mgmt Moderator
Data Mgmt Moderator


Joined: 07 Jan 2003
Posts: 1056
Topics: 91
Location: The Blue Planet

PostPosted: Thu Apr 01, 2004 12:35 am    Post subject: Reply with quote

Thanks for enlightening me kolusu....I never knew this..

Thanks a lot
Back to top
View user's profile Send private message
Display posts from previous:   
Post new topic   Reply to topic   printer-friendly view    MVSFORUMS.com Forum Index -> Utilities All times are GMT - 5 Hours
Goto page Previous  1, 2, 3  Next
Page 2 of 3

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


MVSFORUMS
Powered by phpBB © 2001, 2005 phpBB Group