Joined: 02 Dec 2002 Posts: 629 Topics: 176 Location: Stockholm, Sweden
Posted: Fri Sep 05, 2014 6:02 am Post subject: Compare 2 files and only show different records
My problem is the following. I have 2 files with the same number of records (25,000,000 each). The files aren't sorted, but do have a 1-to-1 relationship vis-a-vis the records. I would like to be able to run ICETOOL or DFSORT and produce one output file showing each differing record from each file. For example, assuming the only records that differ are 10 and 20, I would like the output file to contain 4 records as in
Quote:
File 1 record 10
File 2 record 10
File 1 record 20
File 2 record 20
I found an example Kolusus had given someone else (shown below), but this only seemed to show the records from file 2 that differed from those in file 1.
You should be able to do what you want with JOINKEYS, specifying SORTED,NOSEQCK so that the files are copied rather than sorted, setting sequence numbers in both JNFnCNTL files, and then using the sequence numbers as the keys.
You can use the match marker (?) in the REFORMAT statement, and a JOIN of UNPAIRED,ONLY.
For the SELECT with the entire record as the key, you'd face some issues to get what you want, one being that 50,000,000 records would be sorted, and another being that "out of sequence" records would magically appear to be sequenced.
I found an example Kolusus had given someone else (shown below), but this only seemed to show the records from file 2 that differed from those in file 1.
Any suggestions/improvements gratefully appreciated.
Well the solution is only picking first record per key. If there are duplicates it picks the first key from the duplicated records.
As William pointed out, it is a simple task for JOINKEYS. I am showing the complete example here because you need to limit each comparison in the INCLUDE/OMIT COND to chunks of 256 bytes. So I had to break the comparison of 365 bytes into chunks of 256 and 109 bytes.
Code:
//STEP0100 EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//INA DD *
MICHAEL
WILLIAM
BOB
TONY
//INB DD *
MICHAEL
WILLIAM - MISMATCH
BOB
TONY - MISMATCH
//SORTOUT DD SYSOUT=*
//SYSIN DD *
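  OPTION COPY
* Key on the sequence number added at 366 by JNF1CNTL/JNF2CNTL.
  JOINKEYS F1=INA,FIELDS=(366,8,A),SORTED,NOSEQCK
  JOINKEYS F2=INB,FIELDS=(366,8,A),SORTED,NOSEQCK
  JOIN UNPAIRED
  REFORMAT FIELDS=(F1:1,365,F2:1,365)
* Drop pairs whose 365 bytes match (compared as 256 + 109 bytes).
  OMIT COND=(001,256,CH,EQ,366,256,CH,AND,
             257,109,CH,EQ,622,109,CH)
  OUTFIL BUILD=(1,365,/,366,365)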
//*
//JNF1CNTL DD *
  INREC OVERLAY=(366:SEQNUM,8,ZD)
//*
//JNF2CNTL DD *
  INREC OVERLAY=(366:SEQNUM,8,ZD)
//*
Joined: 26 Nov 2002 Posts: 12376 Topics: 75 Location: San Jose
Posted: Fri Sep 05, 2014 11:01 am Post subject:
William Collins wrote:
For the SELECT with the entire record as the key, you'd face some issues to get what you want, one being that 50,000,000 records would be sorted, and another being that "out of sequence" records would magically appear to be sequenced.
You can override the SORT on the SELECT statement with a COPY in XXXXCNTL, but in the OP's case he needs to sort to be able to compare the records from the 2 files.
_________________
Kolusu
www.linkedin.com/in/kolusu
Yes, and if you SORT, a record that is out of order can appear to be in order. With a one-to-one relationship between the files, you can't SORT them, and if they are not SORTed for the SELECT, the matching records will be far distant from each other.
Joined: 02 Dec 2002 Posts: 629 Topics: 176 Location: Stockholm, Sweden
Posted: Mon Sep 08, 2014 2:27 am Post subject:
Thanks Kolusu. Your solution worked perfectly. Originally, I thought that all I now have to remember is that any odd numbered records are from the original file, and even numbered ones are from the new file.
Then I thought that it would be even easier to simply append the characters OLD/NEW to the input files before running your code.
Said and done (and adjusting the numbers/positions in your example with +3) the results looked brilliant.
Thanks again.
(I've probably said it before, but I'm beginning to realize that you can do almost anything with DFSORT/ICETOOL as long as your imagination is good enough.
My problem is that I don't write this sort of JCL often enough to become proficient at it) _________________ Michael
There is a match-marker, which you specify by coding ? in the REFORMAT statement.
The match-marker has three values: B (records from both files present, a match on key); 1 (record from file 1 only, no match); 2 (record from file 2 only, no match).
Subject to typos, that should show the first byte of each record which is present in the current record, created by the REFORMAT, in your Main Task (the processing after the JOINKEYS).
I've used X to insert a blank for positioning, and blanks where there is no data from a particular file. I've kept them individual to show where they match in the BUILD, but they could be written as 5X.
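For readers following along, here is a minimal sketch of the kind of statements being described. It is not the poster's original code: the 8-byte key in columns 1-8, the DD names INA/INB and the output layout are all assumptions made for illustration.

```jcl
//SYSIN    DD *
* Hypothetical layout: 8-byte key in columns 1-8 of both files.
  OPTION COPY
  JOINKEYS F1=INA,FIELDS=(1,8,A)
  JOINKEYS F2=INB,FIELDS=(1,8,A)
* Keep paired records plus unpaired records from either file.
  JOIN UNPAIRED,F1,F2
* Byte 1 = first byte of the F1 record, byte 2 = first byte of
* the F2 record (blank if absent), byte 3 = match marker.
  REFORMAT FIELDS=(F1:1,1,F2:1,1,?)
* Print the marker, then each first byte, separated by blanks.
  OUTFIL BUILD=(3,1,X,1,1,X,2,1)
/*
```

With this arrangement the marker in the first output column reads B, 1 or 2, so the Main Task can tell at a glance which file each joined record came from.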
Joined: 26 Nov 2002 Posts: 12376 Topics: 75 Location: San Jose
Posted: Mon Sep 08, 2014 10:57 am Post subject:
misi01 wrote:
Then I thought that it would be even easier to simply append the characters OLD/NEW to the input files before running your code.
misi01,
As William pointed out, in a regular match scenario based on keys you can use the match marker indicator, which gives you much more flexibility in identifying the source of the output record.
I am not sure as to why you need to run another copy operation just to append OLD/NEW as you can do it quite easily in the same pass of joinkeys itself. In your case you have 1-to-1 relationship between the 2 files. So you will always have a match on both files. So you just need to check the contents which is done using the OMIT condition.
Assuming that your NEW file is assigned to DD INA and your OLD file is assigned to INB, all you need to do is change the OUTFIL statement
Code:
  OUTFIL BUILD=(1,365,/,366,365)
to
Code:
  OUTFIL BUILD=(001,365,C'NEW',/,
                366,365,C'OLD')
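Since the original question asked for output along the lines of "File 1 record 10", one possible variant also carries the sequence number into the output. This is only a sketch under the same assumptions as the job above (365-byte records, NEW file on INA, OLD file on INB); the positions of the sequence number in the joined record and in the output line are my own choices, not something given in the thread.

```jcl
//SYSIN    DD *
  OPTION COPY
  JOINKEYS F1=INA,FIELDS=(366,8,A),SORTED,NOSEQCK
  JOINKEYS F2=INB,FIELDS=(366,8,A),SORTED,NOSEQCK
  JOIN UNPAIRED
* F1 data 1-365, F1 sequence number 366-373, F2 data 374-738.
  REFORMAT FIELDS=(F1:1,373,F2:1,365)
* Drop pairs whose 365 data bytes match (256 + 109 bytes).
  OMIT COND=(001,256,CH,EQ,374,256,CH,AND,
             257,109,CH,EQ,630,109,CH)
* Two lines per mismatch: data, source tag, record number.
  OUTFIL BUILD=(001,365,C'NEW',366,8,/,
                374,365,C'OLD',366,8)
/*
//JNF1CNTL DD *
  INREC OVERLAY=(366:SEQNUM,8,ZD)
/*
//JNF2CNTL DD *
  INREC OVERLAY=(366:SEQNUM,8,ZD)
/*
```

Because both files are numbered identically and joined on that number, the F1 sequence number can be used for both output lines. Note that keeping 8 extra bytes from F1 shifts the F2 data to position 374, so the OMIT positions move accordingly.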
misi01 wrote:
(I've probably said it before, but I'm beginning to realize that you can do almost anything with DFSORT/ICETOOL as long as your imagination is good enough.