Joined: 08 Jan 2003 Posts: 28 Topics: 4 Location: india
Posted: Tue Dec 02, 2003 2:33 pm Post subject: Remove duplicate record to another file.
Hi All
I have a dataset that contains some duplicate records. I need to
1) sort the file
2) move part of the second duplicate record to the first duplicate record
3) remove the second duplicate record from the file and put it in another file.
I have to sort on SORT FIELDS=(1,20,CH,A,389,20,CH,A).
In the second duplicate record, the field at position (389,20) will contain some value, but in the first duplicate record it will contain spaces. I want that value copied to the first duplicate record, and then the second duplicate removed from this file and written to another file.
I have been trying some R&D with ICETOOL but have not been successful so far.
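For illustration, here is roughly what I mean (the keys and values below are made up; the key is in positions 1-20 and the field in 389-408):
Code:
Input (after sort):
KEY0000000000000000A ... 389-408 = blanks
KEY0000000000000000A ... 389-408 = 'SOMEVALUE'
Wanted in the main file (first dup, with the value copied in):
KEY0000000000000000A ... 389-408 = 'SOMEVALUE'
Wanted in the other file (second dup, removed from the main file):
KEY0000000000000000A ... 389-408 = 'SOMEVALUE'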
Joined: 02 Dec 2002 Posts: 1618 Topics: 31 Location: San Jose
Posted: Tue Dec 02, 2003 3:40 pm Post subject:
Assuming you only have two dups per key, or you want to splice the first and last records for multiple dups per key, here's a DFSORT/ICETOOL job that will do what you asked for. You'll need DFSORT R14 PTF UQ90053 (Feb 2003) to use SPLICE:
Code:
//S1 EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//IN DD DSN=... input file
//OUT1 DD DSN=... dup1 with field from dup2
//OUT2 DD DSN=... dup2
//TOOLIN DD *
* SPLICE DUP2 FIELD TO MATCHING DUP1 RECORD
SPLICE FROM(IN) TO(OUT1) ON(1,20,CH) WITH(389,20)
* SELECT DUP2 RECORDS
SELECT FROM(IN) TO(OUT2) ON(1,20,CH) LASTDUP
/*
It wasn't clear if you wanted the non-dup records in OUT1 or not. If you do, just add KEEPNODUPS to the SPLICE operator:
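Code:
SPLICE FROM(IN) TO(OUT1) ON(1,20,CH) WITH(389,20) KEEPNODUPS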
Joined: 08 Jan 2003 Posts: 28 Topics: 4 Location: india
Posted: Tue Dec 02, 2003 3:47 pm Post subject:
Kolusu/Frank
Thanks for your answer and sorry for the confusion.
No, the file cannot contain more than one duplicate per key. One of the duplicates will always contain blanks in the specified record area (389-408) and the other will always have some data there.
LRECL=1000, RECFM=FB
Frank - Yes, I want all the non-dup records in OUT1.
Joined: 08 Jan 2003 Posts: 28 Topics: 4 Location: india
Posted: Wed Dec 03, 2003 9:53 am Post subject:
There is a slight change of plan.
I want to sort the file on (1,20,CH,A,389,20,CH,A).
Then I want all the unique records, as well as the second duplicate of each duplicate pair, in one file, and the first duplicate of each pair in another file.
As mentioned earlier, the file cannot contain more than one duplicate per key.
Joined: 02 Dec 2002 Posts: 1618 Topics: 31 Location: San Jose
Posted: Wed Dec 03, 2003 11:19 am Post subject:
Kolusu,
For SELECT with USING, you can only use the INCLUDE, OMIT, OUTFIL and OPTION statements, not the others. DFSORT's ICETOOL generates statements to pass to DFSORT, and specifying those other statements to override the generated statements can mess things up (unless you know EXACTLY what you're doing).
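For example, a minimal sketch of supplying an INCLUDE statement to SELECT through USING (the ddnames here are made up):
Code:
* ONLY LOOK AT RECORDS WITH A VALUE IN 389-408
SELECT FROM(IN) TO(OUT) ON(1,20,CH) NODUPS USING(CTL1)

//CTL1CNTL DD *
  INCLUDE COND=(389,20,CH,NE,C' ')
/*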
_________________ Frank Yaeger - DFSORT Development Team (IBM)
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
Joined: 02 Dec 2002 Posts: 1618 Topics: 31 Location: San Jose
Posted: Wed Dec 03, 2003 11:36 am Post subject:
Bidpar,
For this new variation, it's not clear to me what you want in each output file. Do you still want to join the dup1 and dup2 fields? Please show me an example of what the input records look like and what you want the output files to look like.
_________________ Frank Yaeger - DFSORT Development Team (IBM)
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
Joined: 08 Jan 2003 Posts: 28 Topics: 4 Location: india
Posted: Wed Dec 03, 2003 11:53 am Post subject:
Frank
No, I don't need to join these 2 fields anymore. This requirement is much simpler.
Let me give some more clarification to make things simpler.
The key for this file is the first 20 bytes. Since my input file is unsorted, I first sort by this key.
Now this file can contain duplicates, and a key can be duplicated at most once. That means for any key value I will get either a unique record or a pair of duplicates.
The only field that will always differ between the two duplicates is (389,20). One record of the pair will always have blanks in this field and the other will always have some value.
That's why I am also sorting on this field, so that the record with blanks in this field comes first.
Now I want all the unique records and the second record from each pair of duplicates in one file.
(If I sorted the field (389,20) descending instead, I would need the first record from each pair along with all the unique records.)
Then I need the rest of the duplicate records in another file.
Let me know if you need more clarification, and I will give an example.
Joined: 08 Jan 2003 Posts: 28 Topics: 4 Location: india
Posted: Wed Dec 03, 2003 1:01 pm Post subject:
Kolusu
I guess I am not clear yet.
If I sort with the combination of keys (1,20 and 389,20) and select based on them, then all of my records will be unique, right?
Duplicates will occur only if I select on (1,20) alone. Then from each pair of duplicates I have to select the one whose field at (389,20) contains some value (not spaces), add those to all the unique records, and put them in one single file.
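For what it's worth, here is a minimal DFSORT/ICETOOL sketch along those lines (ddnames and the temporary dataset are made up, and it assumes the FIRSTDUP and DISCARD operands of SELECT are available at your PTF level):
Code:
//S2 EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//IN DD DSN=... input file
//TMP DD DSN=&&TMP,DISP=(,PASS),UNIT=SYSDA,SPACE=(CYL,(10,10))
//OUT1 DD DSN=... unique records + valued dup from each pair
//OUT2 DD DSN=... blank dup from each pair
//TOOLIN DD *
* SORT SO THE BLANK DUP COMES FIRST WITHIN EACH KEY
SORT FROM(IN) TO(TMP) USING(CTL1)
* FIRST RECORD OF EACH DUP PAIR (THE BLANK ONE) GOES TO OUT2;
* EVERYTHING ELSE (UNIQUES AND VALUED DUPS) GOES TO OUT1
SELECT FROM(TMP) TO(OUT2) ON(1,20,CH) FIRSTDUP DISCARD(OUT1)
/*
//CTL1CNTL DD *
  SORT FIELDS=(1,20,CH,A,389,20,CH,A)
/*
Since spaces collate low in EBCDIC, the blank record sorts first within each key, so FIRSTDUP should pick it out and DISCARD should collect the uniques along with the valued dups.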