View previous topic :: View next topic |
Author |
Message |
raj_m Beginner
Joined: 24 May 2005 Posts: 5 Topics: 2
|
Posted: Mon Sep 12, 2005 11:06 am Post subject: Handling Duplicates using Syncsort |
|
|
I have two file that needs to be compared against to get the desired output using Syncsort
Code: |
Case 1
Input file:
key1 key2 key3
111 001 123
111 001 123
111 001 123
111 001 999
222 002 234
222 002 345
333 003 123
My desired output
222 002 234
222 002 345
333 003 123
Case 2
Input file:
key1 key2 key3
111 001 123
111 001 123
111 001 123
222 002 234
222 002 345
333 003 123
My desired output
111 001 123
222 002 234
222 002 345
333 003 123
|
in brief what i should be doing is
* SORT based on KEY1, KEY2, KEY3 and eliminate duplicates
* THEN Sort based on KEY1 and KEY2 again and
IF FOUND DUPLICATES eliminate all records with matching KEY1 and KEY2 entries
Else Retain the unique entry.
This is the master file and I need to do similar processing for transaction file and using these two files i need to split using SPLICE
A. Match in File A & File B
B. In File A but not in File B
C. In File B but not in File A
I would be able to do it in multiple sort steps, but wanted to find out if this can be done in an efficient way?
Any help would be greatly appreciated
Thanx in Advance
Raj |
|
Back to top |
|
|
kolusu Site Admin
Joined: 26 Nov 2002 Posts: 12375 Topics: 75 Location: San Jose
|
Posted: Mon Sep 12, 2005 11:21 am Post subject: |
|
|
Quote: |
* SORT based on KEY1, KEY2, KEY3 and eliminate duplicates
* THEN Sort based on KEY1 and KEY2 again and
IF FOUND DUPLICATES eliminate all records with matching KEY1 and KEY2 entries
Else Retain the unique entry.
|
Raj_m,
hmm why do you need to sort first on key1, key2 and key3 and then second sort just key1 and key2?
You can achieve that just by sorting on key1 and key2 and eliminate dups
Code: |
//STEP0100 EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTIN DD *
111 001 123
111 001 123
111 001 123
111 001 999
222 002 234
222 002 345
333 003 123
//SORTOUT DD SYSOUT=*
//SYSIN DD *
OPTION EQUALS $ ENSURES TO PICK THE FIRST REC
SORT FIELDS=(1,3,CH,A, $ SORT ON KEY1
6,3,CH,A) $ SORT ON KEY2
SUM FIELDS=NONE $ ELIMINATE DUPS
/*
|
The output from the above step is:
Code: |
111 001 123
222 002 234
333 003 123
|
Quote: |
This is the master file and I need to do similar processing for transaction file and using these two files i need to split using SPLICE
A. Match in File A & File B
B. In File A but not in File B
C. In File B but not in File A
|
Did you check this topic? See the solution posted by me
http://www.mvsforums.com/helpboards/viewtopic.php?t=4723&highlight=joinkeys
Hope this helps...
Cheers _________________ Kolusu
www.linkedin.com/in/kolusu |
|
Back to top |
|
|
raj_m Beginner
Joined: 24 May 2005 Posts: 5 Topics: 2
|
Posted: Mon Sep 12, 2005 11:52 am Post subject: |
|
|
Kolusu,
That does not give me the required output for CASE1. That's probably the reason why i need a sort on KEY1 KEY2 and KEY3 first.
Let me explain
Code: |
Case 1
Input file:
key1 key2 key3
111 001 123
111 001 123
111 001 123
111 001 999
222 002 234
222 002 345
333 003 123
My desired output
222 002 234
222 002 345
333 003 123
|
Sorting based on KEY1 KEY2 KEY3 would give me
Code: |
111 001 123
111 001 999
222 002 234
222 002 345
333 003 123
|
NOW record 1 and 2 are duplicates if we consider KEY1 and KEY2. IF i find duplicated I need to eliminate all the duplicated record entries(NOT UNIQUE OUT). in this case record1 and record 2 should not be in the output
Code: |
My desired output
222 002 234
222 002 345
333 003 123
|
For case2 since the record1 is unique based on KEY1 and KEY2 I need to retain this in the output.
Code: |
111 001 123
222 002 234
222 002 345
333 003 123
|
Thanks
Raj |
|
Back to top |
|
|
kolusu Site Admin
Joined: 26 Nov 2002 Posts: 12375 Topics: 75 Location: San Jose
|
Posted: Mon Sep 12, 2005 12:02 pm Post subject: |
|
|
Quote: |
Sorting based on KEY1 KEY2 KEY3 would give me
Code:
111 001 123
111 001 999
222 002 234
222 002 345
333 003 123
NOW record 1 and 2 are duplicates if we consider KEY1 and KEY2. IF i find duplicated I need to eliminate all the duplicated record entries(NOT UNIQUE OUT). in this case record1 and record 2 should not be in the output
Code:
My desired output
222 002 234
222 002 345
333 003 123
|
Raj_m,
Something still does not seem right , in your final output why did you pick the record? It is also a a duplicate considering key1 and key2. You eliminated
Code: |
111 001 123
111 001 999
|
based on the codition that they are duplicates on key1 and key2 . why doesn't the same rule apply to 222 record?
Kolusu _________________ Kolusu
www.linkedin.com/in/kolusu |
|
Back to top |
|
|
raj_m Beginner
Joined: 24 May 2005 Posts: 5 Topics: 2
|
Posted: Mon Sep 12, 2005 12:07 pm Post subject: |
|
|
Im sorry for the confusionKolusu.. you are right. even that need to be eliminated.
I was just keying in some dummy values..missed out on that. |
|
Back to top |
|
|
kolusu Site Admin
Joined: 26 Nov 2002 Posts: 12375 Topics: 75 Location: San Jose
|
|
Back to top |
|
|
|
|