Joined: 24 Dec 2002 Posts: 32 Topics: 6 Location: U.K
Posted: Mon May 09, 2005 12:21 pm Post subject: Merging two Files - Removing Duplicates
Hi Guys,
I have a requirement like this.
I have two input files that hold the same kind of records (as you can see below), and I want to merge these two files into one. Some records will be common to both files; I want to give priority to the records in File 2 and remove the matching ones from File 1. The records from one Type1 up to the next Type1 form one logical record, and the record keys are 12345, 67890, 12561, etc.
One more thing: can I do this using JCL (DFSORT)?
Joined: 26 Nov 2002 Posts: 12376 Topics: 75 Location: San Jose
Posted: Mon May 09, 2005 12:29 pm Post subject:
Naren,
A couple of questions. I assume that these are header and detail records. Is there a field on Detail records that shows they are linked to the particular header?
ex:
Code:
Type1 asdcaf12345
Type2
Type3
Type4
Type5
How can you tell that type2,3,4,5 belong to the header record asdcaf12345 ?
If you can differentiate then it is very easy.
Please post the LRECL, RECFM and position of the key.
Joined: 02 Dec 2002 Posts: 1618 Topics: 31 Location: San Jose
Posted: Mon May 09, 2005 2:16 pm Post subject:
Naren,
Do the records actually have the strings 'Type1', 'Type2', etc in positions 1-5?
If not, what do the records really look like? _________________ Frank Yaeger - DFSORT Development Team (IBM)
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
Joined: 24 Dec 2002 Posts: 32 Topics: 6 Location: U.K
Posted: Tue May 10, 2005 7:22 am Post subject:
Hi Kolusu,
The LRECL is 90 and the RECFM is VB. The key starts at position 25 and is a combination of character and packed decimal data (8 bytes in total). The record key structure looks like
Type 1 is the header record and the rest are detail records. Types 2, 3, 4 and 5 belong to a Type 1 record until the next Type 1 record. I am afraid there is no link between the Type 1 record and the Type 2, 3, 4, 5 records.
But there is one thing (the actual records are shown below) ....
The first two bytes look like this in hex
As you can see, if the record type is the same, the hex values are also the same, and as the record type increases, so does the hex value, e.g. X'0502', X'0602', etc.
Frank,
The records actually look like this ...
(I had substituted them with Type 1, Type 2, etc. for easy understanding.)
Only the HRT contains the key; the rest are detail records (without the key).
The same pattern repeats for each new record key.
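To make the grouping rule concrete (a header record opens a group, and every following detail record belongs to it until the next header), here is a minimal Python sketch. It is not DFSORT and not anyone's posted solution. The function name `group_records` and the two callbacks `is_header` and `key_of` are hypothetical, and the sample data uses the simplified "Type1 asdcaf12345" layout from earlier in the thread rather than the real VB records.

```python
from typing import Callable, Iterable, List, Tuple

# A group is (key, [header record, detail record, ...]).
Group = Tuple[str, List[str]]


def group_records(
    records: Iterable[str],
    is_header: Callable[[str], bool],
    key_of: Callable[[str], str],
) -> List[Group]:
    """Attach each detail record to the most recent header record."""
    groups: List[Group] = []
    for rec in records:
        if is_header(rec):
            # A header starts a new group and supplies its key.
            groups.append((key_of(rec), [rec]))
        elif groups:
            # A detail record carries no key; it belongs to the last header seen.
            groups[-1][1].append(rec)
        else:
            raise ValueError("detail record found before any header record")
    return groups


# Simplified records, as posted in the thread (not the real layout).
sample = ["Type1 asdcaf12345", "Type2", "Type3", "Type1 asdfgg67890", "Type2"]
grouped = group_records(
    sample,
    is_header=lambda r: r.startswith("Type1"),
    key_of=lambda r: r.split()[1],
)
```

For the real data, `is_header` would presumably test the two-byte record type and `key_of` would take the 8-byte key from the header; both are left as callbacks because the exact layout is not shown here.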
Joined: 02 Dec 2002 Posts: 1618 Topics: 31 Location: San Jose
Posted: Tue May 10, 2005 2:24 pm Post subject:
Naren,
I think I could come up with a solution for this using DFSORT's new IFTHEN function. But the job I had you run shows me that you don't have the PTF with IFTHEN installed - DFSORT R14 PTF UQ95213 (Dec, 2004). So any solution I came up with along those lines wouldn't do you any good until you installed that PTF.
Joined: 04 May 2003 Posts: 92 Topics: 4 Location: Paris, France
Posted: Wed May 11, 2005 4:36 am Post subject:
Naren says
Code:
The record from Type1 to the next Type1 forms one Record
I'm not sure I understand this correctly.
I interpret it like this: if a group is present in file2, then the same group in file1 must be completely removed and replaced by the one from file2; the number of records in the two groups can differ.
If that is what you need, I did this 2 weeks ago to merge VM directories.
As Frank says, it's too tricky to work something out without the new DFSORT PTF installed.
Joined: 02 Dec 2002 Posts: 1618 Topics: 31 Location: San Jose
Posted: Fri May 20, 2005 12:02 pm Post subject:
Naren,
In your example of the input and output records, the groups with a match have the same number of records in input file1 and input file2 and those records are identical. That is, the asdcaf12345 group has five records (type1-5) in both files and the asdfgg67890 has six records (type1-6) in both files. Do the matching groups always have the same number of identical records, or can they have different records or a different number of records? For example, could the 'key1' group have a group record, a name record and two address records in input file1 and a group record, a name record, three address records and an occupation record in input file2? If so, what would you want the output for that group to look like?
When you say "Giving priority to records in File 2", do you mean that for a matching group, you want to remove all of the input1 records, and keep all of the input2 records, or do you want to do something else? This goes along with my question above about whether the groups can have different records or a different number of records in the two input files.
Joined: 24 Dec 2002 Posts: 32 Topics: 6 Location: U.K
Posted: Fri May 20, 2005 12:46 pm Post subject:
Frank,
The matching groups will not necessarily have the same number of records or the same types of records. They can be different.
So when I say "Giving priority to records in File 2", I mean that for a matching group, I want to remove all of the input1 records and keep all of the input2 records.
For example, the 'Key1' group in File 1 can have 3 records (Root, name and address) and in File 2 it can have 5 records (Root, name, 2 address and occupation). So in my output file, 5 records need to be present for the Key1 group.
Naren _________________ "Hold fast to dreams, for if dreams die, life is a broken winged bird that cannot fly."
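The rule as stated (for a matching key, drop the whole File 1 group and keep the whole File 2 group) can be sketched in a few lines of Python. This only illustrates the logic under the assumption that each file has already been split into groups keyed by the header key. It is not a DFSORT job, and the name `merge_groups` and the record strings are made up for the example.

```python
from typing import Dict, List

# Maps a group key to that group's records (header first, then details).
Groups = Dict[str, List[str]]


def merge_groups(file1_groups: Groups, file2_groups: Groups) -> Groups:
    """Merge at group level; a File 2 group replaces the whole File 1 group."""
    merged: Groups = dict(file1_groups)  # start with every File 1 group
    merged.update(file2_groups)          # File 2 wins for any matching key
    return merged


# The 'Key1' case described above: 3 records in File 1, 5 in File 2.
file1 = {
    "key1": ["root", "name", "address"],
    "key2": ["root", "name"],
}
file2 = {
    "key1": ["root", "name", "address A", "address B", "occupation"],
    "key3": ["root"],
}
merged = merge_groups(file1, file2)
print(len(merged["key1"]))  # prints 5
```

Groups that exist in only one file (`key2` and `key3` here) pass through unchanged, and no record-by-record comparison inside a group is needed.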
Joined: 02 Dec 2002 Posts: 1618 Topics: 31 Location: San Jose
Posted: Fri May 20, 2005 2:55 pm Post subject:
Another question: For your output, you show the groups in the following key order:
asdcaf12345
jdfvnd12561
asdfgg67890
jdfvnd98765
Would it be ok to have the groups in their actual sorted key order, that is:
asdcaf12345
asdfgg67890
jdfvnd12561
jdfvnd98765
If not, what exactly are the rules for the order you want the keys sorted in?
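As a quick check of what "actual sorted key order" means for the four keys listed above, here is a small Python sketch that orders whole groups by their key. It assumes the simplified character keys shown in this thread; the real key (character plus packed decimal) would need a comparison suited to that format, which is not attempted here. The name `order_groups` and the placeholder records are hypothetical.

```python
from typing import Dict, List, Tuple


def order_groups(groups: Dict[str, List[str]]) -> List[Tuple[str, List[str]]]:
    """Return (key, records) pairs in ascending key order; records stay together."""
    return sorted(groups.items(), key=lambda item: item[0])


# Keys in the order shown in the posted output, with placeholder records.
posted = {
    "asdcaf12345": ["Type1", "Type2"],
    "jdfvnd12561": ["Type1"],
    "asdfgg67890": ["Type1", "Type2"],
    "jdfvnd98765": ["Type1"],
}
print([key for key, _ in order_groups(posted)])
# prints ['asdcaf12345', 'asdfgg67890', 'jdfvnd12561', 'jdfvnd98765']
```

Only the groups move. The records inside each group keep their original sequence, since the detail records carry no key of their own.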
Joined: 24 Dec 2002 Posts: 32 Topics: 6 Location: U.K
Posted: Mon May 23, 2005 3:42 am Post subject:
Hi Frank,
If you see the 4th post from the top, I have mentioned the key in the record, which is:
The key starts at position 25 and is a combination of character and packed decimal data (8 bytes in total). The record key structure looks like
Joined: 02 Dec 2002 Posts: 1618 Topics: 31 Location: San Jose
Posted: Mon May 23, 2005 10:42 am Post subject:
Naren,
Yes, I understood what the key was and where it was. I was asking about the order of the output records according to the key. As I said, your output did not show the records ordered according to the key, but I'll take your statement that "I want it to be sorted on this key" to mean that the output records should be ordered according to the key.
I've figured out conceptually how to do this. Now I just need to find some time to put the actual job together. Hopefully, I'll be able to post the solution sometime today.