MVSFORUMS.com Forum Index MVSFORUMS.com
A Community of and for MVS Professionals
 

Compare 2 files and only show different records

 
misi01
Advanced


Joined: 02 Dec 2002
Posts: 629
Topics: 176
Location: Stockholm, Sweden

Posted: Fri Sep 05, 2014 6:02 am    Post subject: Compare 2 files and only show different records

My problem is the following. I have 2 files with the same number of records (25,000,000 each). The files aren't sorted, but they do have a 1-to-1 relationship between their records. I would like to be able to run ICETOOL or DFSORT and produce one output file showing each differing record from each file. For example, assuming the only records that differ are 10 and 20, I would like the output file to contain 4 records, as in
Quote:

File 1 record 10
File 2 record 10
File 1 record 20
File 2 record 20

I found an example Kolusu had given someone else (shown below), but this only seemed to show the records from file 2 that differed from those in file 1.
Code:

//STEP0100 EXEC PGM=ICETOOL                                         
//TOOLMSG  DD SYSOUT=*                                               
//DFSMSG   DD SYSOUT=*                                               
//IN       DD DSN=MISI01.MYRA.Q5920000.DOLLARS,DISP=SHR       <-- this file simply contains $$$
//         DD DSN=MISI01.MYRA.Q592001.OLD,DISP=SHR                   
//         DD DSN=MISI01.MYRA.Q5920000.DOLLARS,DISP=SHR             
//         DD DSN=MISI01.MYRA.Q592001.NEW,DISP=SHR                   
//*                                                                 
//OUT      DD DSN=MISI01.MYRA.Q5920000.DIFF,DISP=(,CATLG),           
//         RECFM=FB,LRECL=365,DATACLAS=DCLARGE                       
//*                                                                 
//TOOLIN   DD *                                                     
  SELECT FROM(IN) TO(OUT) ON(1,365,CH) -                             
  FIRST USING(CTL1)                                                 
//*                                                                 
//CTL1CNTL DD *                                                     
  INREC IFTHEN=(WHEN=GROUP,BEGIN=(1,3,CH,EQ,C'$$$'),PUSH=(366:ID=1))
  OUTFIL FNAMES=OUT,BUILD=(1,365),INCLUDE=(366,1,ZD,EQ,2)           


Any suggestions/improvements gratefully appreciated.
_________________
Michael
William Collins
Supermod


Joined: 03 Jun 2012
Posts: 437
Topics: 0

Posted: Fri Sep 05, 2014 9:15 am

You should be able to do what you want with a JOINKEYS (SORTED,NOSEQCK), setting sequence numbers in both JNFnCNTL files and then using the sequence numbers as the keys.

You can use the matched marker (?) in the REFORMAT statement, and a JOIN of UNPAIRED,ONLY.

For the SELECT with the entire record as the key, you'd face some issues to get what you want, one being that 50,000,000 records would be sorted, and another being that "out of sequence" records would magically appear to be sequenced.
kolusu
Site Admin


Joined: 26 Nov 2002
Posts: 12376
Topics: 75
Location: San Jose

Posted: Fri Sep 05, 2014 10:59 am    Post subject: Re: Compare 2 files and only show different records

misi01,

Based on the questions you have asked recently, I suggest that you bookmark the following links.

1. Smart DFSORT Tricks

2. DFSORT Publications of Application Programming Guide, Messages, Codes and Diagnosis Guide and Getting Started

misi01 wrote:

I found an example Kolusu had given someone else (shown below), but this only seemed to show the records from file 2 that differed from those in file 1.
Any suggestions/improvements gratefully appreciated.


Well, that solution only picks the first record per key. If there are duplicates, it picks the first of the duplicated records.
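
To see why that job only reports records from the second file, here is a rough Python model of it (Python, not DFSORT; the function name is made up for illustration). It assumes that records with equal keys keep their input order through the sort, and it sorts in Python's string order rather than EBCDIC, so treat it as a sketch of the logic only.

```python
from itertools import groupby


def select_first_from_new(old, new, marker="$$$"):
    """Illustrative model of the concatenated-input SELECT ... FIRST job."""
    # The IN DD concatenates: marker file, OLD, marker file, NEW.
    stream = [marker] + list(old) + [marker] + list(new)

    # WHEN=GROUP with PUSH ID: each marker record starts a new group,
    # so OLD records carry group 1 and NEW records carry group 2.
    tagged = []
    group_id = 0
    for rec in stream:
        if rec.startswith(marker):
            group_id += 1
        tagged.append((rec, group_id))

    # SELECT ON(whole record): sort on the record text only.
    # Python's sort is stable, so for equal records OLD stays ahead of NEW.
    tagged.sort(key=lambda pair: pair[0])

    # FIRST: keep just the first record of every key.
    firsts = [next(grp) for _, grp in groupby(tagged, key=lambda pair: pair[0])]

    # OUTFIL INCLUDE: keep only the survivors that came from group 2 (NEW).
    return [rec for rec, gid in firsts if gid == 2]


old = ["MICHAEL", "WILLIAM", "BOB", "TONY"]
new = ["MICHAEL", "WILLIAM - MISMATCH", "BOB", "TONY - MISMATCH"]
result = select_first_from_new(old, new)
print(result)
```

A record present in both files is kept once as the OLD copy and then filtered out, and a record present only in OLD is filtered out as well, so only the NEW-only records remain, and they come out in sorted order rather than in file order.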

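As William pointed out, it is a simple task for JOINKEYS. Each file gets an 8-byte sequence number at position 366 in its JNFnCNTL, and that number is the join key, so record n of one file is paired with record n of the other. I am showing the complete example here because a single character comparison in INCLUDE/OMIT COND is limited to 256 bytes, so I had to break the comparison of the 365-byte record into chunks of 256 and 109 bytes.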

Code:

//STEP0100 EXEC PGM=SORT                           
//SYSOUT   DD SYSOUT=*                             
//INA      DD *                                     
MICHAEL                                             
WILLIAM                                             
BOB                                                 
TONY                                               
//INB      DD *                                     
MICHAEL                                             
WILLIAM - MISMATCH                                 
BOB                                                 
TONY - MISMATCH                                     
//SORTOUT  DD SYSOUT=*                             
//SYSIN    DD *                                     
  OPTION COPY                                       
  JOINKEYS F1=INA,FIELDS=(366,8,A),SORTED,NOSEQCK   
  JOINKEYS F2=INB,FIELDS=(366,8,A),SORTED,NOSEQCK   
  JOIN UNPAIRED                                     
  REFORMAT FIELDS=(F1:1,365,F2:1,365)               
  OMIT COND=(001,256,CH,EQ,366,256,CH,AND,         
             257,109,CH,EQ,622,109,CH)             
  OUTFIL BUILD=(1,365,/,366,365)                   
//*                                                 
//JNF1CNTL DD *                                     
  INREC OVERLAY=(366:SEQNUM,8,ZD)                   
//*                                                 
//JNF2CNTL DD *                                     
  INREC OVERLAY=(366:SEQNUM,8,ZD)                   
//*
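
For readers who think more easily in a general-purpose language, the job above can be modelled in a few lines of Python (a sketch of the logic, not a replacement for the DFSORT job; the function name is hypothetical). The sequence number plays the role of the record's position, the OMIT drops pairs that are equal, and the OUTFIL writes the file 1 record followed by the file 2 record.

```python
def differing_pairs(file1, file2):
    """Yield the file1 record, then the file2 record, for each position that differs."""
    # zip() pairs record n of file1 with record n of file2, which is what
    # joining on the appended sequence number does in the job above.
    for rec1, rec2 in zip(file1, file2):
        # OMIT COND: identical pairs are dropped.
        if rec1 != rec2:
            # OUTFIL BUILD with "/": one output record per side.
            yield rec1
            yield rec2


ina = ["MICHAEL", "WILLIAM", "BOB", "TONY"]
inb = ["MICHAEL", "WILLIAM - MISMATCH", "BOB", "TONY - MISMATCH"]
result = list(differing_pairs(ina, inb))
print(result)
```

Nothing is sorted here, so the output stays in file order, and the generator reads the two inputs in step with each other instead of holding them in memory.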

_________________
Kolusu
www.linkedin.com/in/kolusu
kolusu
Site Admin


Joined: 26 Nov 2002
Posts: 12376
Topics: 75
Location: San Jose

Posted: Fri Sep 05, 2014 11:01 am

William Collins wrote:
For the SELECT with the entire record as the key, you'd face some issues to get what you want, one being that 50,000,000 records would be sorted, and another being that "out of sequence" records would magically appear to be sequenced.


You can override the SORT on a SELECT statement with a COPY in XXXXCNTL, but in the OP's case he needs the sort to be able to compare the records from the 2 files.
_________________
Kolusu
www.linkedin.com/in/kolusu
William Collins
Supermod


Joined: 03 Jun 2012
Posts: 437
Topics: 0

Posted: Fri Sep 05, 2014 12:05 pm

Yes, and if you SORT, a record that is out of order can appear to be in order. With a one-to-one relationship between the files, you can't SORT them, and if they are not SORTed on the SELECT, the matching records will be far apart from each other.
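
A tiny Python illustration of that point (made-up data and function names, purely a sketch): two files that hold the same records in a different order differ position by position, but look identical once both have been sorted.

```python
def positional_mismatches(file1, file2):
    """Return the 1-based record numbers at which the two files differ."""
    return [n for n, (a, b) in enumerate(zip(file1, file2), start=1) if a != b]


def mismatches_after_sort(file1, file2):
    """Same comparison, but after sorting each file on the whole record."""
    return positional_mismatches(sorted(file1), sorted(file2))


file1 = ["BOB", "ALICE", "CAROL"]
file2 = ["ALICE", "BOB", "CAROL"]  # records 1 and 2 are swapped

print(positional_mismatches(file1, file2))   # compares record n with record n
print(mismatches_after_sort(file1, file2))   # the swap is hidden by the sort
```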
misi01
Advanced


Joined: 02 Dec 2002
Posts: 629
Topics: 176
Location: Stockholm, Sweden

Posted: Mon Sep 08, 2014 2:27 am

Thanks Kolusu. Your solution worked perfectly. Originally, I thought that all I now have to remember is that any odd numbered records are from the original file, and even numbered ones are from the new file.

Then I thought that it would be even easier to simply append the characters OLD/NEW to the input files before running your code.
Said and done (and adjusting the numbers/positions in your example with +3) the results looked brilliant.

Thanks again.

(I've probably said it before, but I'm beginning to realize that you can do almost anything with DFSORT/ICETOOL as long as your imagination is good enough.
My problem is that I don't write this sort of JCL often enough to become proficient at it)
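
The adjusted control statements are not shown in the thread, so the following Python helper is only a sketch of the position arithmetic involved (the function name and parameters are hypothetical). It assumes the 3-byte OLD/NEW tag sits at the end of each record (positions 366-368), that only the first 365 bytes are compared, and that each character comparison is kept to at most 256 bytes as in Kolusu's example.

```python
def omit_conditions(lrecl, cmp_start=1, cmp_len=None, max_cmp=256):
    """Build the comparison terms for OMIT COND on a REFORMAT record F1:1,lrecl,F2:1,lrecl."""
    if cmp_len is None:
        cmp_len = lrecl - cmp_start + 1
    terms = []
    pos, remaining = cmp_start, cmp_len
    while remaining > 0:
        length = min(max_cmp, remaining)
        # The F2 copy of any byte sits lrecl positions after the F1 copy.
        terms.append(f"{pos},{length},CH,EQ,{pos + lrecl},{length},CH")
        pos += length
        remaining -= length
    return terms


# Original layout: 365-byte records, whole record compared.
print(omit_conditions(365))
# Assumed adjusted layout: 368-byte records, tag in 366-368 left out of the compare.
print(omit_conditions(368, cmp_start=1, cmp_len=365))
```

With a 368-byte record the sequence number would move to position 369 in both JNFnCNTL members and in the JOINKEYS FIELDS, and the terms would be joined with AND as in the original OMIT statement.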
_________________
Michael
William Collins
Supermod


Joined: 03 Jun 2012
Posts: 437
Topics: 0

Posted: Mon Sep 08, 2014 3:48 am

There is a match-marker, which you specify by coding ? in the REFORMAT statement.

The match-marker has three values: B (records from both files present, a match on key); 1 (record from file 1 only, no match); 2 (record from file 2 only, no match).
Code:

REFORMAT FIELDS=(F1:1,1,?,F2:1,1)

INREC IFTHEN=(WHEN=(2,1,CH,EQ,C'B'),
                 BUILD=(C'F1',X,1,1,X,C'F2',X,3,1)),
      IFTHEN=(WHEN=(2,1,CH,EQ,C'1'),
                 BUILD=(C'F1',X,1,1,X,2X,X,X)),
      IFTHEN=(WHEN=NONE,
                 BUILD=(2X,X,X,X,C'F2',X,3,1))


Subject to typos, that should show the first byte of each record that is present in the current record created by the REFORMAT, in your Main Task (the processing after the JOINKEYS).

I've used X to insert a blank for positioning, and blanks where there is no data from a particular file. I've kept them individual to show where they match in the BUILD, but they could be written as 5X.
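
As a cross-check of the three cases, here is a small Python model of that REFORMAT and the IFTHEN logic (Python rather than DFSORT, with hypothetical function names). It assumes a missing side is filled with a blank in the REFORMAT record and that records are not empty.

```python
def reformat(rec1, rec2):
    """Model REFORMAT FIELDS=(F1:1,1,?,F2:1,1); None means no record from that file."""
    if rec1 is not None and rec2 is not None:
        marker = "B"   # key found in both files
    elif rec1 is not None:
        marker = "1"   # key found in file 1 only
    else:
        marker = "2"   # key found in file 2 only
    byte1 = rec1[0] if rec1 is not None else " "
    byte2 = rec2[0] if rec2 is not None else " "
    return byte1 + marker + byte2


def build(joined):
    """Model the three IFTHEN clauses; every result is 9 bytes long."""
    byte1, marker, byte2 = joined[0], joined[1], joined[2]
    if marker == "B":
        return f"F1 {byte1} F2 {byte2}"
    if marker == "1":
        return f"F1 {byte1}" + " " * 5
    return " " * 5 + f"F2 {byte2}"   # WHEN=NONE: only the marker '2' is left


both = build(reformat("ABC", "XYZ"))
only1 = build(reformat("ABC", None))
only2 = build(reformat(None, "XYZ"))
print(repr(both), repr(only1), repr(only2))
```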
kolusu
Site Admin
Site Admin


Joined: 26 Nov 2002
Posts: 12376
Topics: 75
Location: San Jose

Posted: Mon Sep 08, 2014 10:57 am

misi01 wrote:
Then I thought that it would be even easier to simply append the characters OLD/NEW to the input files before running your code.


misi01,

As William pointed out, in a regular match scenario based on keys you can use the match marker indicator, which gives you much more flexibility in identifying the source of the output record.

I am not sure why you need to run another copy operation just to append OLD/NEW, as you can do it quite easily in the same JOINKEYS pass. In your case you have a 1-to-1 relationship between the 2 files, so you will always have a match on both files. You just need to compare the contents, which is done using the OMIT condition.

Assuming that your NEW file is assigned to DD INA and your OLD file to INB, all you need to do is change the OUTFIL statement

Code:
OUTFIL BUILD=(1,365,/,366,365)     


to

Code:
OUTFIL BUILD=(001,365,C'NEW',/,
              366,365,C'OLD')     



misi01 wrote:

(I've probably said it before, but I'm beginning to realize that you can do almost anything with DFSORT/ICETOOL as long as your imagination is good enough.


Thanks.
_________________
Kolusu
www.linkedin.com/in/kolusu
All times are GMT - 5 Hours