Joined: 26 Nov 2002 Posts: 12375 Topics: 75 Location: San Jose
Posted: Wed May 18, 2005 5:27 am Post subject:
karupps,
What is the key that determines whether a record is a duplicate? Is it the entire record (all 70 bytes of infile1)?
Using VLEN as the ON parm of SELECT will only eliminate dups that are of the same length. In your case infile1 and infile2 have different LRECLs, so you will never have a duplicate.
Kolusu
Ps : Please do NOT send emails seeking help. All questions are to be posted on helpboards only _________________ Kolusu
www.linkedin.com/in/kolusu
Joined: 02 Dec 2002 Posts: 1618 Topics: 31 Location: San Jose
Posted: Wed May 18, 2005 10:25 am Post subject:
Karupps,
What is the LRECL of your input file? _________________ Frank Yaeger - DFSORT Development Team (IBM)
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
Joined: 02 Dec 2002 Posts: 1618 Topics: 31 Location: San Jose
Posted: Wed May 18, 2005 1:20 pm Post subject:
Kolusu,
Using VLEN would not be a reliable way to remove duplicates as there isn't necessarily any correspondence between the length of the records and whether they are dups. Any number of records could have the same length.
Do you mean VLFILL rather than VLTRIM? VLFILL could be used to pad the short records out so they could be compared and then VLTRIM could be used to remove the fill characters. That's why I wanted to know the LRECL. I don't know what you mean by creating 2 temp files using VLTRIM - that would only remove a specific character at the end of the records - I don't see how that applies unless you used VLFILL first.
Joined: 26 Nov 2002 Posts: 12375 Topics: 75 Location: San Jose
Posted: Wed May 18, 2005 1:26 pm Post subject:
Frank,
I meant VLTRIM only. I am guessing that the OP has datasets with trailing spaces in all records. So unless you remove the trailing spaces, you don't get the exact LRECL.
Now if you check the first 2 bytes of the above file, all the records will have a value of 84. If you use VLTRIM to remove the trailing spaces, then you will get the actual length of each record.
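To make the point about the first 2 bytes concrete, here is a minimal Python sketch of a VB record with its 4-byte RDW, before and after trailing blanks are trimmed. It is only an illustration, not DFSORT: the helper names are hypothetical, the 80-byte data length is an assumption chosen to match the value of 84 mentioned above, and ASCII blanks stand in for EBCDIC X'40'.

```python
import struct

def make_vb_record(data: bytes) -> bytes:
    """Prefix data with a 4-byte RDW: 2-byte big-endian length
    (which counts the RDW itself) followed by 2 zero bytes."""
    return struct.pack(">HH", len(data) + 4, 0) + data

def rdw_length(record: bytes) -> int:
    """Read the record length from the first 2 bytes of the RDW."""
    return struct.unpack(">H", record[:2])[0]

def trim_trailing(record: bytes, trim_byte: bytes = b" ") -> bytes:
    """Simplified stand-in for VLTRIM: drop trailing trim bytes
    from the data and rebuild the RDW with the new length."""
    return make_vb_record(record[4:].rstrip(trim_byte))

# Assumed layout: 80 data bytes padded with blanks, so LL = 84.
padded = make_vb_record(b"AAA".ljust(80))
trimmed = trim_trailing(padded)

print(rdw_length(padded))   # 84
print(rdw_length(trimmed))  # 7
```

With the blanks in place every record reports the same length, so the length says nothing about the content; after trimming, the length reflects the real data.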
Joined: 02 Dec 2002 Posts: 1618 Topics: 31 Location: San Jose
Posted: Wed May 18, 2005 2:16 pm Post subject:
Kolusu,
Interesting assumption. But even if it's true that all of the records have trailing blanks and you remove them, how would just looking at the record length distinguish between:
AAA
AAA
BBB
CCC
All four will have a length of 7, but only the AAA records are duplicates. I don't see how just comparing the record lengths would ever give you an accurate check for duplicates.
My idea was to use VLFILL to pad out all the records to the LRECL with a character (e.g. X'FF') that doesn't appear in the data. Then you could compare the entire padded record to identify the duplicates. Then you could use VLTRIM to remove the pad character (X'FF').
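The pad, compare, trim idea from the post above can be sketched in Python. This is a rough illustration rather than DFSORT itself: the function names and the 8-byte maximum data length are hypothetical, the pad byte X'FF' is assumed never to occur in the data, and the sketch keeps input order where a real SELECT would sort on the ON field.

```python
from collections import Counter

MAX_DATA_LEN = 8   # hypothetical maximum data length (LRECL minus the RDW)
PAD = b"\xff"      # pad byte, assumed absent from the real data

def pad_record(data: bytes) -> bytes:
    """Pad short records to a fixed length (the VLFILL step)."""
    return data.ljust(MAX_DATA_LEN, PAD)

def unpad_record(data: bytes) -> bytes:
    """Remove the trailing pad bytes again (the VLTRIM step)."""
    return data.rstrip(PAD)

def select_nodups(records):
    """Keep only records whose padded content occurs exactly once,
    which is how this sketch models SELECT ... NODUPS."""
    padded = [pad_record(r) for r in records]
    counts = Counter(padded)
    return [unpad_record(p) for p in padded if counts[p] == 1]

records = [b"AAA", b"AAA", b"BBB", b"CCC"]

# Length alone cannot separate these records: all are 3 + 4 (RDW) = 7.
lengths = {len(r) + 4 for r in records}
print(lengths)                 # {7}

# Comparing the whole padded record does.
print(select_nodups(records))  # [b'BBB', b'CCC']
```

Padding first means records of different lengths can be compared over one fixed-length key, and a prefix such as AA is never mistaken for AAA.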
Joined: 02 Dec 2002 Posts: 1618 Topics: 31 Location: San Jose
Posted: Wed May 18, 2005 4:34 pm Post subject:
Kolusu,
Oh, I see. Well, now that I read back through the posts, it's certainly not clear whether he wants to remove the records of the same length, or the records with duplicate content. He seems to ask for both in different posts. However, he states several times that the records are of different lengths, so if that's the case, I would think that his original job would do it. Those statements would contradict the assumption that all of the records are the same length. But then his posts are full of contradictions, so who knows.
Karupps,
If you're still interested in a solution, please tell us whether you want to
(1) eliminate records with the same length and different content. For example:
Code:
Input
LL data
|12|11111111|
|12|11111111|
|12|22222222|
|12|33333333|
would result in one output record:
Code:
LL data
|12|11111111|
since all of the records have a length of 12.
(2) eliminate records with the same length and the same content. For example:
Code:
Input
LL data
|12|11111111|
|12|11111111|
|12|22222222|
|12|33333333|
would result in three output records:
Code:
LL data
|12|11111111|
|12|22222222|
|12|33333333|
since the first two records have the same length and content, whereas the third and fourth records are unique.
Or if neither of those situations matches what you want, show us an example of what you do want.
Also, what's the LRECL of your input file?
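The difference between interpretations (1) and (2) in the post above comes down to which key is used to decide that two records match. A small Python sketch, with a hypothetical `dedupe` helper and the same four sample records, shows both. It illustrates the logic only and does not model any particular DFSORT operand.

```python
def dedupe(records, key):
    """Keep the first record seen for each distinct key value."""
    seen = set()
    kept = []
    for rec in records:
        k = key(rec)
        if k not in seen:
            seen.add(k)
            kept.append(rec)
    return kept

records = [b"11111111", b"11111111", b"22222222", b"33333333"]

# (1) Key is the record length only (8 data bytes + 4-byte RDW = 12).
by_length = dedupe(records, key=lambda r: len(r) + 4)

# (2) Key is the record length plus the content.
by_content = dedupe(records, key=lambda r: (len(r) + 4, r))

print(by_length)   # [b'11111111']
print(by_content)  # [b'11111111', b'22222222', b'33333333']
```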
Joined: 02 Dec 2002 Posts: 1618 Topics: 31 Location: San Jose
Posted: Thu May 19, 2005 10:38 am Post subject:
Karupps,
Assuming that you want to compare the entire record, here's a DFSORT/ICETOOL job that will do what you asked for:
Code:
//S1 EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//CON DD DSN=... input file1 (VB/256)
// DD DSN=... input file2 (VB/256)
//T1 DD DSN=&&T1,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(,PASS)
//OUT DD DSN=... output file (VB/256)
//TOOLIN DD *
* Pad VB records with trailing X'FF's.
COPY FROM(CON) USING(CTL1)
* Select NODUPS for padded records.
SELECT FROM(T1) TO(OUT) ON(1,256,BI) NODUPS USING(CTL2)
/*
//CTL1CNTL DD *
* Pad VB records with trailing X'FF's.
  OUTFIL FNAMES=T1,OUTREC=(1,256),VLFILL=X'FF'
/*
//CTL2CNTL DD *
* Remove trailing X'FF's.
  OUTFIL FNAMES=OUT,VLTRIM=X'FF'
/*
Joined: 02 Dec 2002 Posts: 415 Topics: 6 Location: Hove, England
Posted: Mon Sep 26, 2005 8:52 am Post subject:
I'm trying to use this code to check some larger files (LRECL 14834). Here's my JCL:
Code:
//STEP001 EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//CON DD DISP=SHR,DSN=INPUT.FILE(-1)
// DD DISP=SHR,DSN=INPUT.FILE(0)
//OUT DD DISP=(,CATLG,DELETE),DSN=OUTPUT.FILE,
// DATACLAS=DATAM8,SPACE=(CYL,(200,200),RLSE),
// DSORG=PS,RECFM=VB,LRECL=14834
//T1 DD DSN=&&T1,DISP=(,PASS),
// DATACLAS=DATAM8,SPACE=(CYL,(200,200),RLSE)
//SYSOUT DD SYSOUT=*
//TOOLIN DD *
* PAD VB RECORDS WITH TRAILING X'FF'S.
COPY FROM(CON) USING(CTL1)
* SELECT NODUPS FOR PADDED RECORDS.
SELECT FROM(T1) TO(OUT) ON(1,14834,BI) NODUPS USING(CTL2)
/*
//CTL1CNTL DD *
* PAD VB RECORDS WITH TRAILING X'FF'S.
  OUTFIL FNAMES=T1,OUTREC=(1,14834),VLFILL=X'FF'
/*
//CTL2CNTL DD *
* REMOVE TRAILING X'FF'S.
  OUTFIL FNAMES=OUT,VLTRIM=X'FF'
/*
I'm getting an error, though, RC=12.
Here's the sysout:
Code:
ICE600I 0 DFSORT ICETOOL UTILITY RUN STARTED
ICE632I 0 SOURCE FOR ICETOOL STATEMENTS: TOOLIN
ICE630I 0 MODE IN EFFECT: STOP
* PAD VB RECORDS WITH TRAILING X'FF'S.
COPY FROM(CON) USING(CTL1)
ICE606I 0 DFSORT CALL 0001 FOR COPY FROM CON TO OUTFIL USING CTL1CNTL COMPLETED
ICE602I 0 OPERATION RETURN CODE: 00
* SELECT NODUPS FOR PADDED RECORDS.
SELECT FROM(T1) TO(OUT) ON(1,14834,BI) NODUPS USING(CTL2)
_________________ The day you stop learning the dinosaur becomes extinct
Last edited by Mervyn on Mon Sep 26, 2005 10:26 am; edited 2 times in total
Joined: 07 Jan 2003 Posts: 1056 Topics: 91 Location: The Blue Planet
Posted: Mon Sep 26, 2005 9:01 am Post subject:
Mervyn,
Your LRECL is the problem. There are upper limits on the length of CH and BI fields. As far as I know, Syncsort does not support more than 4093 characters for CH and BI. I think it is the same with DFSORT, but I am not sure whether the latest version supports more than this.
Anyway, I don't think any sort products support 14,000 bytes at a stretch.