MVSFORUMS.com Forum Index MVSFORUMS.com
A Community of and for MVS Professionals
 

How to remove duplicate records by using ICETOOL for VB
karupps
Beginner


Joined: 18 May 2005
Posts: 11
Topics: 3

Posted: Wed May 18, 2005 4:42 am    Post subject: How to remove duplicate records by using ICETOOL for VB

I want to remove the duplicate records across two files (if a record is present in both
files, don't write it to the output file).

All the files are variable-length record files. One record may have a length of 70 and another
may have a length of 80, etc.


INFILE1
*******
Code:


00000000,20010912,00095044,4794A,HE,FRA,16,1618,C,20010912,1,,,, 
00000000,20010913,00095044,4794A,HE,FRA,1,2146,C,20010913,1,,,, 
00000000,20010916,00095044,4794A,HE,FRA,16,1804,C,20010916,19,,,, 
00000000,20010917,00095044,4794A,HE,FRA,16,2153,C,20010917,10,,,, 
00000000,20010918,00095044,4794A,HE,FRA,16,2153,C,20010917,25,,,, 

INFILE2
*******
Code:

00000000,20010912,00095044,4794A,HE,FRA,16,1618,C,20010912,1,,,, 
00000000,20010913,00095044,4794A,HE,FRA,1,2146,C,20010913,1,,,, 
00000000,20010916,00095044,4794A,HE,FRA,16,1804,C,20010916,19,,,, 
00000000,20010917,00095044,4794A,HE,FRA,16,2153,C,20010917,10,,,, 

I want the output file to look like this:

OUTFILE
*******
Code:

00000000,20010918,00095044,4794A,HE,FRA,16,2153,C,20010917,25,,,,

This is what I tried:

Code:

//STEP0010 EXEC PGM=ICETOOL,REGION=17M                           
//SYSOUT   DD SYSOUT=*                                           
//SYSUDUMP DD SYSOUT=*                                           
//TOOLMSG  DD SYSOUT=*                                           
//DFSMSG   DD SYSOUT=*                                           
//INFILE   DD INFILE1       
//         DD INFILE2         
//OUTFILE  DD OUTFILE         
//TOOLIN   DD *                                                               
   SELECT FROM(INFILE) TO(OUTFILE) ON(VLEN) NODUPS           
/*                                                               

But no records were selected to OUTFILE.

I think the problem is with ON(VLEN).

ICETOOL is the only option available to me.

Please help me.

Karupps
kolusu
Site Admin

Joined: 26 Nov 2002
Posts: 12375
Topics: 75
Location: San Jose

Posted: Wed May 18, 2005 5:27 am

karupps,

What is the key that determines whether a record is a duplicate? Is it the entire record (all 70 bytes of infile1)?

Using VLEN as the ON field on SELECT will only eliminate duplicates that are of the same length. In your case infile1 and infile2 have different LRECLs, so you will never have a duplicate.
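To make the effect of keying on length concrete, here is a minimal Python model of the selection logic. It is not DFSORT, only a sketch under stated assumptions: the function name `select_nodups_on_vlen` is hypothetical, each record is a plain `bytes` object whose length stands in for the VB record length, and output ordering is ignored.

```python
from collections import Counter

def select_nodups_on_vlen(records):
    """Model of a NODUPS selection whose only key is the record length.

    Assumption: a record is kept only when no other record shares its key,
    and here the key is just len(record), not the record's content.
    """
    counts = Counter(len(r) for r in records)
    return [r for r in records if counts[len(r)] == 1]

same_len = [b"AAA", b"BBB", b"CCC"]      # different content, same length
mixed = [b"AAA", b"BBB", b"DDDDD"]       # one record has a unique length

print(select_nodups_on_vlen(same_len))   # nothing is selected
print(select_nodups_on_vlen(mixed))      # only the 5-byte record is selected
```

In this model, any two records that happen to share a length knock each other out, whatever their content.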

Kolusu

PS: Please do NOT send emails seeking help. All questions should be posted on the help boards only.
_________________
Kolusu
www.linkedin.com/in/kolusu
karupps
Beginner

Posted: Wed May 18, 2005 5:40 am

Kolusu,

Thank you very much

In my case both files contain variable-length records. I want to remove the records that have the same record length.

In both files, the records have different lengths.

It is not possible to use the ON(p,m,f) format, because each record has a different length.

That is why I tried VLEN.

If we use VLEN, it should remove the duplicate records of the same record length (whether 67, 68, 69, 70, ...).

Please let me know if this is still not clear.

Thanks in advance

Karupps
Frank Yaeger
Sort Forum Moderator

Joined: 02 Dec 2002
Posts: 1618
Topics: 31
Location: San Jose

Posted: Wed May 18, 2005 10:25 am

Karupps,

What is the LRECL of your input file?
_________________
Frank Yaeger - DFSORT Development Team (IBM)
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
kolusu
Site Admin

Posted: Wed May 18, 2005 11:01 am

Frank,

I am guessing that all the records are at the full LRECL (VLTRIM not used), so the OP was not able to eliminate the duplicate records on VLEN.

The only approach I can think of is to create 2 temp files using VLTRIM on OUTFIL, concatenate these 2 temp files, and then eliminate the dups on VLEN.

Kolusu
Frank Yaeger
Sort Forum Moderator

Posted: Wed May 18, 2005 1:20 pm

Kolusu,

Using VLEN would not be a reliable way to remove duplicates as there isn't necessarily any correspondence between the length of the records and whether they are dups. Any number of records could have the same length.

Do you mean VLFILL rather than VLTRIM? VLFILL could be used to pad the short records out so they could be compared and then VLTRIM could be used to remove the fill characters. That's why I wanted to know the LRECL. I don't know what you mean by creating 2 temp files using VLTRIM - that would only remove a specific character at the end of the records - I don't see how that applies unless you used VLFILL first.
kolusu
Site Admin

Posted: Wed May 18, 2005 1:26 pm

Frank,

I meant VLTRIM only. I am guessing that the OP's datasets have trailing spaces on all records, so unless you remove the trailing spaces you don't get the actual length of each record.

For example:

Code:

//STEP0100 EXEC PGM=SORT                                         
//SYSOUT   DD SYSOUT=*                                           
//SORTIN   DD *                                                 
A                                                               
AA                                                               
AAA                                                             
AAAA                                                             
AAAAA                                                           
AAAAAA                                                           
AAAAAAA                                                         
AAAAAAAA                                                         
AAAAAAAAA                                                       
AAAAAAAAAA                                                       
//SORTOUT  DD DSN=VB.FILE,DISP=(,CATLG),SPACE=(TRK,(1,1),RLSE)
//SYSIN    DD *                                                 
  SORT FIELDS=COPY                                               
  OUTFIL FTOV                                                   
/*


Now if you check the first 2 bytes of each record in the above file, every record will have a value of 84 (80 data bytes plus the 4-byte RDW). If you use VLTRIM to remove the trailing spaces, you will get the actual length of each record.
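The arithmetic can be checked with a small Python model of the two conversions. This is only a sketch under assumptions, not DFSORT: the helper names `ftov`, `vltrim` and `rdw_length` are hypothetical, the RDW is modelled as a 2-byte big-endian length followed by two zero bytes, ASCII blanks stand in for EBCDIC X'40', and edge cases such as a record made up entirely of blanks are not handled.

```python
RDW_LEN = 4  # a VB record starts with a 4-byte record descriptor word


def build_vb(data: bytes) -> bytes:
    """Prefix data with a modelled RDW: 2-byte length, then two zero bytes."""
    return (RDW_LEN + len(data)).to_bytes(2, "big") + b"\x00\x00" + data


def ftov(fixed_record: bytes) -> bytes:
    """Model of a fixed-to-variable copy: all data bytes are kept, blanks too."""
    return build_vb(fixed_record)


def vltrim(vb_record: bytes, trim_byte: bytes = b" ") -> bytes:
    """Model of trimming trailing bytes and rebuilding the RDW."""
    return build_vb(vb_record[RDW_LEN:].rstrip(trim_byte))


def rdw_length(vb_record: bytes) -> int:
    """Read the record length stored in the first 2 bytes."""
    return int.from_bytes(vb_record[:2], "big")


for text in (b"A", b"AA", b"AAA"):
    card = text.ljust(80)          # an 80-byte in-stream card image
    vb = ftov(card)
    print(rdw_length(vb), rdw_length(vltrim(vb)))
```

In the model every untrimmed record reports 84, while the trimmed records report 5, 6 and 7.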

Kolusu
Frank Yaeger
Sort Forum Moderator

Posted: Wed May 18, 2005 2:16 pm

Kolusu,

Interesting assumption. But even if it's true that all of the records have trailing blanks and you remove them, how would just looking at the record length distinguish between:

AAA
AAA
BBB
CCC

All four will have a length of 7, but only the AAA records are duplicates. I don't see how just comparing the record lengths would ever give you an accurate check for duplicates?

My idea was to use VLFILL to pad out all the records to the LRECL with a character (e.g. X'FF') that doesn't appear in the data. Then you could compare the entire padded record to identify the duplicates. Then you could use VLTRIM to remove the pad character (X'FF').
kolusu
Site Admin

Posted: Wed May 18, 2005 3:15 pm

Frank,

I assumed that the OP does not care about the contents. He only needs records of unique length, irrespective of the contents of the records.

Kolusu
Frank Yaeger
Sort Forum Moderator

Posted: Wed May 18, 2005 4:34 pm

Kolusu,

Oh, I see. Well, now that I read back through the posts, it's certainly not clear whether he wants to remove the records of the same length, or the records with duplicate content. He seems to ask for both in different posts. However, he states several times that the records are of different lengths, so if that's the case, I would think that his original job would do it. Those statements would contradict the assumption that all of the records are the same length. But then his posts are full of contradictions, so who knows.

Karupps,

If you're still interested in a solution, please tell us whether you want to

(1) eliminate records with the same length and different content. For example:

Code:

Input

 LL   data
|12|11111111|
|12|11111111|
|12|22222222|
|12|33333333|


would result in one output record:

Code:

 LL   data
|12|11111111|


since all of the records have a length of 12.

(2) eliminate records with the same length and the same content. For example:

Code:

Input

 LL  data
|12|11111111|
|12|11111111|
|12|22222222|
|12|33333333|


would result in three output records :

Code:

 LL  data
|12|11111111|
|12|22222222|
|12|33333333|


since the first two records have the same length and content, whereas the third and fourth records are unique.
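The two interpretations differ only in what is used as the comparison key. A minimal Python sketch of that difference follows; it is not DFSORT, and the function name `keep_first_per_key` and the choice of keys are assumptions made for illustration. The records are modelled as the 8 data bytes only, without the LL prefix.

```python
def keep_first_per_key(records, key):
    """Keep the first record seen for each key value and drop later ones."""
    seen = set()
    kept = []
    for rec in records:
        k = key(rec)
        if k not in seen:
            seen.add(k)
            kept.append(rec)
    return kept


records = [b"11111111", b"11111111", b"22222222", b"33333333"]

# Interpretation (1): the key is the record length only.
by_length = keep_first_per_key(records, key=len)

# Interpretation (2): the key is the whole record (length and content).
by_content = keep_first_per_key(records, key=lambda rec: rec)

print(by_length)    # one record survives
print(by_content)   # three records survive
```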

Or if neither of those situations matches what you want, show us an example of what you do want.

Also, what's the LRECL of your input file?
karupps
Beginner

Posted: Thu May 19, 2005 4:20 am

Hi Kolusu & Frank,

Thank you very much for all your suggestions.

Let me state my problem clearly:

All my files are always VB.

Example (LRECL = 256):

INFILE1:
00000000,20010912,00095044,4794A,HE,FRA,16,1618,C,20010912,1,10,5
00000000,20010913,00095044,4794A,HE,FRA,1,2146,C,20010913,1,,,
00000000,20010916,00095044,4794A,HE,FRA,16,1804,C,20010916,19,,,,

INFILE2:
00000000,20010912,00095044,4794A,HE,FRA,16,1618,C,20010912,1,10,5
00000000,20010913,00095044,4794A,HE,FRA,12,2146,C,20010913,12,5,,
00000000,20010916,00095044,4794A,HE,FRA,16,1804,C,20010916,19,,,,

In both files the first and third records are the same, but the second records differ.

My output file should look like this (with the first and third records removed):

OUTFILE

00000000,20010913,00095044,4794A,HE,FRA,1,2146,C,20010913,1,,,
00000000,20010913,00095044,4794A,HE,FRA,12,2146,C,20010913,12,5,,


I tried the ON(VLEN) option of SELECT, but it does not select any records to OUTFILE.

Please let me know if my problem is still not clear.

Karupps
Frank Yaeger
Sort Forum Moderator

Posted: Thu May 19, 2005 10:38 am

Karupps,

Assuming that you want to compare the entire record, here's a DFSORT/ICETOOL job that will do what you asked for:

Code:

//S1    EXEC  PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//CON DD DSN=...  input file1 (VB/256)
//    DD DSN=...  input file2 (VB/256)
//T1 DD DSN=&&T1,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(,PASS)
//OUT DD DSN=...  output file (VB/256)
//TOOLIN DD *
* Pad VB records with trailing X'FF's.
  COPY FROM(CON) USING(CTL1)
* Select NODUPS for padded records.
  SELECT FROM(T1) TO(OUT) ON(1,256,BI) NODUPS USING(CTL2)
/*
//CTL1CNTL DD *
* Pad VB records with trailing X'FF's.
  OUTFIL FNAMES=T1,OUTREC=(1,256),VLFILL=X'FF'
/*
//CTL2CNTL DD *
* Remove trailing X'FF's.
  OUTFIL FNAMES=OUT,VLTRIM=X'FF'
/*
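For readers who want to trace the three stages of the job above (pad, select, trim), here is a small Python model run against the sample records from the earlier post. It is a sketch under assumptions, not a replacement for the job: all function names are hypothetical, the RDW is modelled as a 2-byte length plus two zero bytes, ASCII is used instead of EBCDIC, and X'FF' is assumed not to occur in the data.

```python
from collections import Counter

LRECL = 256      # maximum record length, RDW included (from the thread)
RDW_LEN = 4
PAD = b"\xff"    # fill byte assumed not to occur in the data


def to_vb(data: bytes) -> bytes:
    """Build a modelled VB record: 2-byte length, two zero bytes, then data."""
    return (RDW_LEN + len(data)).to_bytes(2, "big") + b"\x00\x00" + data


def vlfill(vb: bytes) -> bytes:
    """Models CTL1: pad every record out to LRECL bytes with X'FF'."""
    return to_vb(vb[RDW_LEN:].ljust(LRECL - RDW_LEN, PAD))


def select_nodups(padded):
    """Models the NODUPS selection: keep records whose full image is unique."""
    counts = Counter(padded)
    return sorted(r for r in padded if counts[r] == 1)


def vltrim(vb: bytes) -> bytes:
    """Models CTL2: strip the trailing X'FF' bytes and rebuild the RDW."""
    return to_vb(vb[RDW_LEN:].rstrip(PAD))


def nodups_across_files(*files):
    concatenated = [to_vb(rec) for f in files for rec in f]  # the CON DD
    padded = [vlfill(rec) for rec in concatenated]           # the T1 dataset
    return [vltrim(rec)[RDW_LEN:] for rec in select_nodups(padded)]


infile1 = [
    b"00000000,20010912,00095044,4794A,HE,FRA,16,1618,C,20010912,1,10,5",
    b"00000000,20010913,00095044,4794A,HE,FRA,1,2146,C,20010913,1,,,",
    b"00000000,20010916,00095044,4794A,HE,FRA,16,1804,C,20010916,19,,,,",
]
infile2 = [
    b"00000000,20010912,00095044,4794A,HE,FRA,16,1618,C,20010912,1,10,5",
    b"00000000,20010913,00095044,4794A,HE,FRA,12,2146,C,20010913,12,5,,",
    b"00000000,20010916,00095044,4794A,HE,FRA,16,1804,C,20010916,19,,,,",
]

for rec in nodups_across_files(infile1, infile2):
    print(rec.decode())
```

Because every padded record has the same length, the comparison covers the whole record image, so only the two second records, which differ in content, come through.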

karupps
Beginner

Posted: Mon May 23, 2005 4:18 am    Post subject: Thanks

Hi Frank & Kolusu,

Thank you very much for your help.

My problem is now solved.

Thanks,
Karupps
Mervyn
Moderator


Joined: 02 Dec 2002
Posts: 415
Topics: 6
Location: Hove, England

Posted: Mon Sep 26, 2005 8:52 am

I'm trying to use this code to check some larger files (LRECL 14834). Here's my JCL:

Code:

//STEP001  EXEC PGM=ICETOOL                                 
//TOOLMSG  DD SYSOUT=*                                       
//DFSMSG   DD SYSOUT=*                                       
//CON      DD DISP=SHR,DSN=INPUT.FILE(-1)                   
//         DD DISP=SHR,DSN=INPUT.FILE(0)                     
//OUT      DD DISP=(,CATLG,DELETE),DSN=OUTPUT.FILE,         
//         DATACLAS=DATAM8,SPACE=(CYL,(200,200),RLSE),       
//         DSORG=PS,RECFM=VB,LRECL=14834                     
//T1       DD DSN=&&T1,DISP=(,PASS),                         
//         DATACLAS=DATAM8,SPACE=(CYL,(200,200),RLSE)       
//SYSOUT   DD  SYSOUT=*                                     
//TOOLIN   DD *                                             
* PAD VB RECORDS WITH TRAILING X'FF'S.                       
  COPY FROM(CON) USING(CTL1)                                 
* SELECT NODUPS FOR PADDED RECORDS.                         
  SELECT FROM(T1) TO(OUT) ON(1,14834,BI) NODUPS USING(CTL2) 
/*                                                           
//CTL1CNTL DD *                                             
* PAD VB RECORDS WITH TRAILING X'FF'S.                       
  OUTFIL FNAMES=T1,OUTREC=(1,14834),VLFILL=X'FF'             
/*                                                           
//CTL2CNTL DD *                                             
* REMOVE TRAILING X'FF'S.                                   
  OUTFIL FNAMES=OUT,VLTRIM=X'FF'                             
/*             



I'm getting an error, though, RC=12.

Here's the sysout:

Code:

ICE600I 0 DFSORT ICETOOL UTILITY RUN STARTED

ICE632I 0 SOURCE FOR ICETOOL STATEMENTS: TOOLIN


ICE630I 0 MODE IN EFFECT: STOP

* PAD VB RECORDS WITH TRAILING X'FF'S.
COPY FROM(CON) USING(CTL1)
ICE606I 0 DFSORT CALL 0001 FOR COPY FROM CON TO OUTFIL USING CTL1CNTL COMPLETED
ICE602I 0 OPERATION RETURN CODE: 00

* SELECT NODUPS FOR PADDED RECORDS.
SELECT FROM(T1) TO(OUT) ON(1,14834,BI) NODUPS USING(CTL2)

_________________
The day you stop learning the dinosaur becomes extinct


Last edited by Mervyn on Mon Sep 26, 2005 10:26 am; edited 2 times in total
Phantom
Data Mgmt Moderator

Joined: 07 Jan 2003
Posts: 1056
Topics: 91
Location: The Blue Planet

Posted: Mon Sep 26, 2005 9:01 am

Mervyn,

Your LRECL is the problem. There are upper limits on field length for CH and BI. As far as I know, SyncSort does not support more than 4093 characters for CH and BI. I think the same is true of DFSORT, but I am not sure whether the latest version supports more than this.

Anyway, I don't think any sort product supports a 14,000-byte field at a stretch.

Hope this helps,

Thanks,
Phantom
All times are GMT - 5 Hours
Page 1 of 2

 