Phantom Data Mgmt Moderator
Joined: 07 Jan 2003 Posts: 1056 Topics: 91 Location: The Blue Planet
Posted: Wed Nov 30, 2005 10:11 am Post subject: SORT - How to Eliminate Duplicates ?
How can I eliminate duplicate records from a dataset?
Here are a few simple ways to do this.
Case # 1: Eliminate dups on a key field
Solution # 1: Using SORT, the simplest way
Code: |
//* key_pos - refers to the Key field starting position *
//* key_len - refers to the length of the key field *
//* format - the character format (CH, BI, ZD, ....) *
//* order - the sort order - Ascending / Descending *
//R010 EXEC PGM=SORT
//SORTIN DD DSN=my.input.file,DISP=SHR
//SORTOUT DD DSN=my.output.file.nodups,DISP=OLD
//SYSOUT DD SYSOUT=*
//SYSIN DD *
  SORT FIELDS=(key_pos,key_len,format,order)
  SUM FIELDS=NONE
/*
|
PS: To eliminate dups on the entire record, set key_pos to 1 and key_len to the LRECL (that is, 1,nnnn where nnnn = LRECL).
http://www.mvsforums.com/helpboards/viewtopic.php?t=4817&highlight=eliminate+duplicates
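To make the placeholders in Solution 1 concrete, here is a minimal sketch that assumes a 10-byte character key starting in column 1 (that key layout is an assumption for illustration, not something taken from the job above). OPTION EQUALS is also an addition: it asks the sort to keep records with equal keys in their input order, so the record that survives SUM FIELDS=NONE is the first one read. Without EQUALS in effect, which of the duplicates is kept is not predictable.

```jcl
//R010     EXEC PGM=SORT
//SORTIN   DD DSN=my.input.file,DISP=SHR
//SORTOUT  DD DSN=my.output.file.nodups,DISP=OLD
//SYSOUT   DD SYSOUT=*
//SYSIN    DD *
* Assumed key: 10 bytes, character, starting in column 1.
* For the entire record, use 1,nnnn where nnnn = LRECL.
  OPTION EQUALS
  SORT FIELDS=(1,10,CH,A)
  SUM FIELDS=NONE
/*
```

Note that the control statements start in column 2 or later and contain no blanks inside the parentheses; a blank ends the operand field.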
Solution # 2: Using DFSORT's ICETOOL (change the PGM to SYNCTOOL if you are using Syncsort)
Code: |
//* *
//* This code removes dups based on bytes 1 - 50, keeping the first *
//* *
//R010 EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//IN DD DSN=my.input.file,DISP=SHR
//OUT DD DSN=my.output.file.nodups,DISP=OLD
//TOOLIN DD *
SELECT FROM(IN) TO(OUT) ON(1,50,CH) FIRST
/*
|
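If the key is made up of more than one field, SELECT accepts several ON operands, and records count as duplicates only when all of the ON fields match. A sketch, assuming a hypothetical record layout with a 10-byte character field at position 1 and a 5-byte zoned decimal field at position 21:

```jcl
//R010     EXEC PGM=ICETOOL
//TOOLMSG  DD SYSOUT=*
//DFSMSG   DD SYSOUT=*
//IN       DD DSN=my.input.file,DISP=SHR
//OUT      DD DSN=my.output.file.nodups,DISP=OLD
//TOOLIN   DD *
* Hypothetical layout: CH key in 1-10, ZD key in 21-25.
  SELECT FROM(IN) TO(OUT) ON(1,10,CH) ON(21,5,ZD) FIRST
/*
```

Adjust the positions, lengths and formats to match your own record layout.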
Case # 2: Remove dups and capture the duplicate entries into a different dataset
Solution # 1: Using DFSORT's ICETOOL
Code: |
//R010 EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//IN DD DSN=my.input.file,DISP=SHR
//OUT DD DSN=my.output.file.nodups,DISP=OLD
//SAVEREST DD DSN=my.output.file.dup,DISP=OLD
//TOOLIN DD *
SELECT FROM(IN) TO(OUT) ON(1,50,CH) FIRST DISCARD(SAVEREST)
/*
|
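To see what FIRST and DISCARD do, here is a small self-contained sketch that uses in-stream data and writes both outputs to SYSOUT instead of datasets. The sample records and the 3-byte key in ON(1,3,CH) are made up for illustration.

```jcl
//R010     EXEC PGM=ICETOOL
//TOOLMSG  DD SYSOUT=*
//DFSMSG   DD SYSOUT=*
//IN       DD *
BBB RECORD 1
AAA RECORD 2
BBB RECORD 3
AAA RECORD 4
CCC RECORD 5
/*
//OUT      DD SYSOUT=*
//SAVEREST DD SYSOUT=*
//TOOLIN   DD *
* Made-up test data: the key is the first 3 bytes.
  SELECT FROM(IN) TO(OUT) ON(1,3,CH) FIRST DISCARD(SAVEREST)
/*
```

With this input, OUT should contain AAA RECORD 2, BBB RECORD 1 and CCC RECORD 5 (one record per key, in key order), and SAVEREST should contain AAA RECORD 4 and BBB RECORD 3.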
Solution # 2: Using SYNCSORT - The XSUM keyword captures the rejected duplicate entries into a dataset allocated to DD name "SORTXSUM".
Code: |
//STEP0100 EXEC PGM=SORT
//SORTIN DD DSN=my.input.file,DISP=SHR
//SORTOUT DD DSN=my.output.file.nodups,DISP=OLD
//SORTXSUM DD DSN=my.output.dup,DISP=OLD
//SYSOUT DD SYSOUT=*
//SYSIN DD *
  SORT FIELDS=(1,20,CH,A)
  SUM FIELDS=NONE,XSUM
/*
|
Thanks,
Phantom