MVSFORUMS.com

videlord · Beginner Joined: 09 Dec 2004 Posts: 147 Topics: 19

I want to remove duplicate records from a data set, but I also want to keep a mapping for the removed records. One input => two outputs: the first is the input without duplicates, plus a seq number; the second pairs each record's seqnum in the original file with its seqnum in the new file.
For example:

INPUT:
AAA
BBB
AAA
CCC
BBB

OUTPUT1:
AAA 01
BBB 02
CCC 03

OUTPUT2:
01 01
02 02
03 03
04 01
05 02

Is there an easy way to do it?
Thanks.

Charlie

Phantom · Posted: Fri Jan 07, 2005 6:56 am Post subject:

Kolusu,

The links you posted actually print the number of occurrences of each duplicate key, but Videlord is not looking for that. He wants to insert a seqnum in the file, then eliminate the duplicates on the key portion and write the result to a file, say SORTOUT. Later, in the other file, SORTXSUM, which contains the eliminated records, he wants to map each duplicate key to the seqnum of its first occurrence (i.e. the seqnum in SORTOUT).

Please correct me if I'm wrong.

Thanks,
Phantom

kolusu · Posted: Fri Jan 07, 2005 9:12 am Post subject:

Phantom,

You are right. I did not read the question properly. Sorry.

Videlord,

Your output2 is misleading. Can you tell us more about what your output should be, especially the 3rd and 4th records in output2?

Kolusu
_________________
Kolusu
www.linkedin.com/in/kolusu

videlord · Beginner Joined: 09 Dec 2004 Posts: 147 Topics: 19

Kolusu,

Thanks.

Output2 is a file with the same number of records as the input.
The first field is the seq# in the input; the second field is the seq# in output1.
We can use tables to describe my problem again:
input - TableA (COL1, SEQNO)
output1 - TableB (COL1, SEQNO)
output2 - TableC (SEQNO1,SEQNO2)
SEQNO in TableA and TableB is generated by the database.

TableB Data:
Insert into tableB (COL1)
(select distinct COL1 from TableA)

TableC Data:
Insert into tableC
(select TableA.SEQNO, TableB.SEQNO from TableA, TableB
where TableA.COL1 = TableB.COL1)

I hope that describes my problem clearly.
I have one idea:
Step 1. Copy input1 and add a seqno.
2. Sort input1, keep no duplicates, and add a seqno.
3. Splice 2 & 1 to generate the mapping.
It seems somewhat complex and not efficient.
Do you have any suggestions?

Thanks.

Charlie

Phantom · Posted: Sat Jan 08, 2005 5:58 am Post subject:

videlord,

I think you already gave the solution. I don't think this could be achieved using the traditional features of sort without using SPLICE. As you said, you can get the result in 1 Step - 2 Pass using ICETOOL / SYNCTOOL.

Frank Yaeger · Posted: Sat Jan 08, 2005 11:32 am Post subject:

videlord · Beginner Joined: 09 Dec 2004 Posts: 147 Topics: 19

Frank,

Oh, sorry, I think I made a mistake when I typed the INPUT file; the 4th record of the input should be AAA instead of CCC.

So your explanation of Output2 is correct.

Thanks.

Charlie

Frank Yaeger · Posted: Mon Jan 10, 2005 11:11 am Post subject:

videlord · Beginner Joined: 09 Dec 2004 Posts: 147 Topics: 19

Frank,

Hope the following example can clearly describe my question.

INPUT: (total 8 records, added the seq#)
AAA 001
BBB 002
CCC 003
AAA 004
DDD 005
EEE 006
BBB 007
FFF 008

OUT1: (6 records no dup)
AAA 001
BBB 002
CCC 003
DDD 004
EEE 005
FFF 006

OUT2: (8 records, one per input record)
001 001
002 002
003 003
004 001
005 004
006 005
007 002
008 006
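
For readers who want to check the expected OUT1 and OUT2 outside of sort, the mapping above can be reproduced with a short Python sketch. This is only an illustration of the requirement, not a DFSORT solution, and the function name `dedupe_with_mapping` is made up for this example.

```python
def dedupe_with_mapping(records):
    """Return (out1, out2) for a list of keys.

    out1: (key, new_seq) for the first occurrence of each key, in input order.
    out2: (orig_seq, new_seq) for every input record.
    """
    first_seen = {}  # key -> seq number assigned to that key in out1
    out1 = []
    out2 = []
    for orig_seq, key in enumerate(records, start=1):
        if key not in first_seen:
            # First time this key appears: give it the next output seq number.
            first_seen[key] = len(first_seen) + 1
            out1.append((key, first_seen[key]))
        out2.append((orig_seq, first_seen[key]))
    return out1, out2


records = ["AAA", "BBB", "CCC", "AAA", "DDD", "EEE", "BBB", "FFF"]
out1, out2 = dedupe_with_mapping(records)
for key, seq in out1:
    print(f"{key} {seq:03d}")
for orig_seq, new_seq in out2:
    print(f"{orig_seq:03d} {new_seq:03d}")
```

With the 8 input records above, this prints the same OUT1 and OUT2 lines as listed in this post.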

Frank Yaeger · Posted: Tue Jan 11, 2005 11:07 am Post subject:

Ok, now it makes sense. But it will require multiple passes. I don't see any easier way to do it.
_________________
Frank Yaeger - DFSORT Development Team (IBM)
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at:
www.ibm.com/storage/dfsort

kolusu · Posted: Tue Jan 11, 2005 3:59 pm Post subject:

Videlord,

The following DFSORT/ICETOOL JCL will give you the desired results. If your shop has Syncsort, then change the pgm name to SYNCTOOL.

A brief explanation of the job: it is similar to the trick of sorting files that contain header and detail records, explained here:

http://www.mvsforums.com/helpboards/viewtopic.php?t=3432

The only difference here is that we do not have a header. So we create a header for every key using the SECTIONS and HEADER3 parameters.

The first operator (SORT) adds a seqnum and sorts on the key and creates a header for every key.

The second operator (COPY) takes this file and splits it into 2 files, viz. header records and detail records.

The third operator (SORT) concatenates these 2 files, sorts on the original seqnum, and writes out the output files.
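
The control statements themselves are not reproduced here, so the following Python sketch only mimics the three passes as described above; it is an assumption about how the passes fit together, not the actual ICETOOL job. The function name `three_pass_mapping` and the way the "header" carries the first seqnum of each key are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


def three_pass_mapping(records):
    # Pass 1 (SORT): add the original seqnum, then sort on the key.
    # The seqnum is used as a secondary sort field, so the first record
    # of each key group is that key's first occurrence.
    tagged = sorted(
        ((key, seq) for seq, key in enumerate(records, start=1)),
        key=itemgetter(0, 1),
    )

    # Pass 2 (COPY): split into one "header" per key and the detail records.
    # Each detail record remembers the seqnum held by its header.
    headers = []  # (key, seqnum of first occurrence)
    details = []  # (original seqnum, seqnum of first occurrence)
    for key, members in groupby(tagged, key=itemgetter(0)):
        members = list(members)
        first_seq = members[0][1]
        headers.append((key, first_seq))
        details.extend((seq, first_seq) for _, seq in members)

    # Pass 3 (SORT): put everything back in original seqnum order.
    # Headers in that order receive the new, gap-free seq numbers.
    headers.sort(key=itemgetter(1))
    new_seq = {first: n for n, (_, first) in enumerate(headers, start=1)}
    out1 = [(key, new_seq[first]) for key, first in headers]
    out2 = [(seq, new_seq[first]) for seq, first in sorted(details)]
    return out1, out2


out1, out2 = three_pass_mapping(
    ["AAA", "BBB", "CCC", "AAA", "DDD", "EEE", "BBB", "FFF"]
)
```

For Videlord's 8-record example, `out1` and `out2` hold the same pairs as the OUT1 and OUT2 listings earlier in the thread. The sketch keeps the sort, split, re-sort shape of the job so that each pass can be compared with the matching operator.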

Phantom · Posted: Wed Jan 12, 2005 1:07 am Post subject:

Kolusu,
Excellent Solution. Gr8 job.