MVSFORUMS.com Forum Index MVSFORUMS.com
A Community of and for MVS Professionals
 
 FAQFAQ   SearchSearch   Quick Manuals   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

Remove duplicate record to another file.

 
Post new topic   Reply to topic   printer-friendly view    MVSFORUMS.com Forum Index -> Utilities
View previous topic :: View next topic  
Author Message
bidpar
Beginner


Joined: 08 Jan 2003
Posts: 28
Topics: 4
Location: india

PostPosted: Tue Dec 02, 2003 2:33 pm    Post subject: Remove duplicate record to another file. Reply with quote

Hi All
I have a dataset having some duplicate records in them. Now I need to
1) sort the file
2) move part of the second duplicate record to first duplicate record
3) remove the second duplicate record from the file and put in another file.

I have to sort on SORT FIELDS=(1,20,A,CH,389,20,A,CH) .
In the second duplicate record the field having position (389,20) will contains some value but the first duplicate record will contain space. I want that value to be copied to the first duplicate record and then the second duplicate to be moved from this file and written in another file.

Trying doing some R&D with ICETOOL but not successfull so far.

Thanks
Back to top
View user's profile Send private message
kolusu
Site Admin
Site Admin


Joined: 26 Nov 2002
Posts: 12376
Topics: 75
Location: San Jose

PostPosted: Tue Dec 02, 2003 3:25 pm    Post subject: Reply with quote

Bidpar,

Can you clarify the following.

1. what is the LRECL, RECFM of the file?
2. Can the file contain more than 1 duplicate? .i.e take a look at the following data.

Code:

AAAAA               REC1
AAAAA    0010       REC2
BBBBB               REC1
BBBBB    1111       REC2


Now If I understand correctly you want the output to be
Code:

AAAAA    0010   REC1
BBBBB    1111   REC1


The duplicate file should have
Code:

AAAAA    0010   REC2
BBBBB    1111   REC4


is that right?

What happens if the data is like this?

Code:

AAAAA               REC1
AAAAA    0010       REC2
AAAAA    0020       REC3



Please show us a sample of input and desired output.

Thanks

Kolusu
_________________
Kolusu
www.linkedin.com/in/kolusu
Back to top
View user's profile Send private message Send e-mail Visit poster's website
Frank Yaeger
Sort Forum Moderator
Sort Forum Moderator


Joined: 02 Dec 2002
Posts: 1618
Topics: 31
Location: San Jose

PostPosted: Tue Dec 02, 2003 3:40 pm    Post subject: Reply with quote

Assuming you only have two dups per key, or you want to splice the first and last records for mulitple dups per key, here's a DFSORT/ICETOOL job that will do what you asked for. You'll need DFSORT R14 PTF UQ90053 (Feb, 2003) to use SPLICE:

Code:

//S1    EXEC  PGM=ICETOOL                                 
//TOOLMSG   DD  SYSOUT=*                                   
//DFSMSG    DD  SYSOUT=*                                   
//IN DD DSN=...  input file                           
//OUT1 DD DSN=...  dup1 with field from dup2                                     
//OUT2 DD DSN=...  dup2                                                 
//TOOLIN DD *                                             
* SPLICE DUP2 FIELD TO MATCHING DUP1 RECORD               
  SPLICE FROM(IN) TO(OUT1) ON(1,20,CH) WITH(389,20)       
* SELECT DUP2 RECORDS                                     
  SELECT FROM(IN) TO(OUT2) ON(1,20,CH) LASTDUP             
/*


It wasn't clear if you wanted the non-dup records in OUT1 or not. If you do, just add KEEPNODUPS to the SPLICE operator:

Code:


  SPLICE FROM(IN) TO(OUT1) ON(1,20,CH) WITH(389,20) KEEPNODUPS


For complete information on the DFSORT/ICETOOL SPLICE operator, see:

http://www.storage.ibm.com/software/sort/mvs/uq90053/online/srtmutol.html#spl
_________________
Frank Yaeger - DFSORT Development Team (IBM)
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
Back to top
View user's profile Send private message Send e-mail Visit poster's website
bidpar
Beginner


Joined: 08 Jan 2003
Posts: 28
Topics: 4
Location: india

PostPosted: Tue Dec 02, 2003 3:47 pm    Post subject: Reply with quote

Kolusu/Frank
Thanks for your answer and sorry for the confusion.

No the file cannot contain more than 1 duplicate and one of the duplicate will always contain blank and the other will have some data in the specified record area (389 - 408) .

LRECL = 1000 , RECFM = FB

Frank - Yes I want all the non-dup records in OUT1.

Let me know if you want more info.

Thanks
Back to top
View user's profile Send private message
Frank Yaeger
Sort Forum Moderator
Sort Forum Moderator


Joined: 02 Dec 2002
Posts: 1618
Topics: 31
Location: San Jose

PostPosted: Tue Dec 02, 2003 4:24 pm    Post subject: Reply with quote

Bidpar,

You can use the DFSORT/ICETOOL job with KEEPNODUPS, that is:

Code:

//S1    EXEC  PGM=ICETOOL
//TOOLMSG   DD  SYSOUT=*
//DFSMSG    DD  SYSOUT=*
//IN DD DSN=... input file
//OUT1 DD DSN=... dup1 2/dup2 field, nondups
//OUT2 DD DSN=... dup2
//TOOLIN DD *
* SPLICE DUP2 FIELD TO MATCHING DUP1 RECORD
  SPLICE FROM(IN) TO(OUT1) ON(1,20,CH) WITH(389,20) -
    KEEPNODUPS
* SELECT DUP2 RECORDS
  SELECT FROM(IN) TO(OUT2) ON(1,20,CH) LASTDUP
/*

_________________
Frank Yaeger - DFSORT Development Team (IBM)
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at:
www.ibm.com/storage/dfsort


Last edited by Frank Yaeger on Tue Dec 02, 2003 4:30 pm; edited 3 times in total
Back to top
View user's profile Send private message Send e-mail Visit poster's website
bidpar
Beginner


Joined: 08 Jan 2003
Posts: 28
Topics: 4
Location: india

PostPosted: Tue Dec 02, 2003 4:26 pm    Post subject: Reply with quote

Frank

Works great.

Thanks everybody.

Regards
Bidpar
Back to top
View user's profile Send private message
kolusu
Site Admin
Site Admin


Joined: 26 Nov 2002
Posts: 12376
Topics: 75
Location: San Jose

PostPosted: Wed Dec 03, 2003 9:50 am    Post subject: Reply with quote

Frank,

I have a question regarding SELECT with using parm.Can I have something like this?

Code:

SELECT FROM(IN) ON(1,4,CH) TO(OUT) DISCARD(DUPS) FIRSTDUP USING(CTL1)


Code:

//CTL1CNTL DD *
INREC FIELDS=(1,9,10,4,CHANGE=(4,C'    ',C'0000'),NOMATCH=(10,4), 
              80:X)                                               
SORT FIELDS=(1,5,CH,A)                                             
SUM FIELDS=(10,4,ZD)                                               
OUTFIL FNAMES=OUT,OUTREC=(1,9,10,4,CHANGE=(4,C'0000',C'    '),NOMATCH=(10,4),
              80:X)                                               
OUTFIL FNAMES=DUPS,OUTREC=(1,80)   



Kolusu
_________________
Kolusu
www.linkedin.com/in/kolusu
Back to top
View user's profile Send private message Send e-mail Visit poster's website
bidpar
Beginner


Joined: 08 Jan 2003
Posts: 28
Topics: 4
Location: india

PostPosted: Wed Dec 03, 2003 9:53 am    Post subject: Reply with quote

There is a slight change of plan.

I want to sort the file on Sort will be on (1,20,CH,A,389,20,CH,A)
Then I want all the unique records as well as the second duplicate of the duplicate records to be in one file and the first duplicate of the dupilcate records to be in another file.
As told earlier , the file cannot contain more than 1 duplicate.

How can I do this.

Regards
Back to top
View user's profile Send private message
Frank Yaeger
Sort Forum Moderator
Sort Forum Moderator


Joined: 02 Dec 2002
Posts: 1618
Topics: 31
Location: San Jose

PostPosted: Wed Dec 03, 2003 11:19 am    Post subject: Reply with quote

Kolusu,

For SELECT with USING, you can only use the INCLUDE, OMIT, OUTFIL and OPTION statements, not the others. DFSORT's ICETOOL generates statements to pass to DFSORT and specifying those other statements to override the generated statements can mess things up (unless you know EXACTLY what you're doing).
_________________
Frank Yaeger - DFSORT Development Team (IBM)
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
Back to top
View user's profile Send private message Send e-mail Visit poster's website
Frank Yaeger
Sort Forum Moderator
Sort Forum Moderator


Joined: 02 Dec 2002
Posts: 1618
Topics: 31
Location: San Jose

PostPosted: Wed Dec 03, 2003 11:36 am    Post subject: Reply with quote

Bidpar,

For this new variation, it's not clear to me what you want in each output file. Do you still want to join the dup1 and dup2 fields? Please show me an example of what the input records look like and what you want the output files to look like.
_________________
Frank Yaeger - DFSORT Development Team (IBM)
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
Back to top
View user's profile Send private message Send e-mail Visit poster's website
bidpar
Beginner


Joined: 08 Jan 2003
Posts: 28
Topics: 4
Location: india

PostPosted: Wed Dec 03, 2003 11:53 am    Post subject: Reply with quote

Frank
No I dont need to join these 2 fileds anymore. This requirement is much simpler.

Let me give some more clarification to make things simpler.

The key for this file is first 20 bytes. Since my input file is unsorted , I am first sorting by this key.

Now this file can contain duplicates. and it will occure maximum of 1 time in a set. That means I will get either an unique record or a pair of duplicates for any keyvalue.

The only field which will be alway different between these 2 duplicates is (389,20). One of the records from the duplicate pair will always have a blank on this field and the other one will always have some value.
That's why I am sorting it on this filed so that I can get the record with blank on this field first.

Now I want all the unique records and the second record from the pair of duplicates in one file.

If I sort the field (389,20) on descending , then I will need the first record from the pair along with all the unique records.

Then I need rest of the duplicate records in another file.

Let me know if you need more clarification . I will give some example.

Thanks
Back to top
View user's profile Send private message
kolusu
Site Admin
Site Admin


Joined: 26 Nov 2002
Posts: 12376
Topics: 75
Location: San Jose

PostPosted: Wed Dec 03, 2003 12:24 pm    Post subject: Reply with quote

Bidpar,
The following DFSORT/ICETOOL JCl will give you the desired results.
Code:

//STEP0100 EXEC PGM=ICETOOL
//TOOLMSG  DD SYSOUT=*     
//DFSMSG   DD SYSOUT=*     
//IN       DD DSN=YOUR INPUT DSN,
//            DISP=SHR
//T1       DD DISP=&T1,DISP=(,PASS),SPACE=(CYL,(X,Y),RLSE)
//OUT      DD DISP=YOUR OUTPUT FILE,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,
//            SPACE=(CYL,(X,Y),RLSE)
//DUPS     DD DISP=YOUR DUPLICATE FILE,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,
//            SPACE=(CYL,(X,Y),RLSE)
//TOOLIN   DD *
  SORT FROM(IN) TO(OUT) USING(CTL1)
  SELECT FROM(OUT) TO(DUPS) ON(1,20,CH) FIRSTDUP
//CTL1CNTL DD *
  SORT FIELDS=(1,20,CH,A,389,20,CH,A)
/*                                                             


Hope this helps...

cheers

kolusu
_________________
Kolusu
www.linkedin.com/in/kolusu
Back to top
View user's profile Send private message Send e-mail Visit poster's website
bidpar
Beginner


Joined: 08 Jan 2003
Posts: 28
Topics: 4
Location: india

PostPosted: Wed Dec 03, 2003 1:01 pm    Post subject: Reply with quote

Kolusu
I guess I am not clear yet.

If I sort with the combination of keys (1,20 and 389,20) and select based on them , then all of my records will be unique. Right ?

Duplicate will occur only if I sort on (1,20) . Then from the pair of duplicates I have to select the duplicate for which the field at (389,20) will contain some value (no space). Add them to all the unique records and put them in one single file.

Put rest of the duplicates on another file.

Ex:-
Input file -
1----------20 389 408
1111111111
2222222222 rrrrrrrrrrrrrrr
1111111111 ggggggggggg
4444444444
3333333333 qqqqqqqqqqq
3333333333
7777777777
2222222222
9999999999 eeeeeeeeeeee

Out1 -

1111111111 ggggggggggg
2222222222 rrrrrrrrrrrrrrr
3333333333 qqqqqqqqqqq
4444444444
7777777777
9999999999 eeeeeeeeeeee

Out2 -
1111111111
2222222222
3333333333


All the records those will be in OUT2 file will have spaces in the field 389,20.

Regards
Back to top
View user's profile Send private message
Frank Yaeger
Sort Forum Moderator
Sort Forum Moderator


Joined: 02 Dec 2002
Posts: 1618
Topics: 31
Location: San Jose

PostPosted: Wed Dec 03, 2003 3:56 pm    Post subject: Reply with quote

Bidpar,

This DFSORT/ICETOOL job will do what you asked for:

Code:

//S1    EXEC  PGM=ICETOOL                                         
//TOOLMSG   DD  SYSOUT=*                                         
//DFSMSG    DD  SYSOUT=*                                         
//IN DD DSN=...  input file                                 
//T1 DD DSN=&&T1,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(,PASS)       
//OUT1 DD DSN=...  output file1                                         
//OUT2 DD DSN=...  output file2                                               
//TOOLIN DD *                                                     
  SORT FROM(IN) TO(T1) USING(CTL1)                               
  SELECT FROM(T1) TO(OUT1) ON(1,20,CH) FIRST DISCARD(OUT2)
/*       
//CTL1CNTL DD *                                                   
  SORT FIELDS=(1,20,CH,A,389,20,CH,D)                             
/*

_________________
Frank Yaeger - DFSORT Development Team (IBM)
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
Back to top
View user's profile Send private message Send e-mail Visit poster's website
bidpar
Beginner


Joined: 08 Jan 2003
Posts: 28
Topics: 4
Location: india

PostPosted: Wed Dec 03, 2003 5:07 pm    Post subject: Reply with quote

This is what I was looking for.

Thanks Frank and Kolusu

Regards
Bidpar
Back to top
View user's profile Send private message
Display posts from previous:   
Post new topic   Reply to topic   printer-friendly view    MVSFORUMS.com Forum Index -> Utilities All times are GMT - 5 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


MVSFORUMS
Powered by phpBB © 2001, 2005 phpBB Group