MVSFORUMS.com Forum Index MVSFORUMS.com
A Community of and for MVS Professionals
 
 FAQFAQ   SearchSearch   Quick Manuals   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

Delete Duplicates

 
Post new topic   Reply to topic   printer-friendly view    MVSFORUMS.com Forum Index -> Database
View previous topic :: View next topic  
Author Message
vini
Intermediate


Joined: 12 Jan 2004
Posts: 240
Topics: 48
Location: Maryland

PostPosted: Fri Jan 30, 2004 9:05 am    Post subject: Delete Duplicates Reply with quote

Hi ,

Can anyone tell me what will be the SQL like to Delete alll but One Duplicate Records in a Table ? If not SQL , then is there any other method to achieve the same ?

If following is the Data in a Table for Col1, Col2, Col3
x, y, z
x ,y, z
x, y, z
a, b, c
a, b, c
l, m, n

After the Delete it should be
x, y, z
a, b, c
l, m, n

Actually a friend asked me this one ..

Thnks.
Vini.
Back to top
View user's profile Send private message
kolusu
Site Admin
Site Admin


Joined: 26 Nov 2002
Posts: 12369
Topics: 75
Location: San Jose

PostPosted: Fri Jan 30, 2004 9:50 am    Post subject: Reply with quote

Vini,

Deleting the duplicates retaining the first record is not that easy with sql. However there are many other ways to delete the rows.

1. Unload the table and run a utility(SORT) to remove duplicate records and reload it again
2. code a cobol pgm which will open cursor and perform the logic of determing the duplicate and delete the record.
3. Generate sql statements using easytrieve or any other utitlity to delete the duplicate records.

Hope this helps...

Cheers,

Kolusu
_________________
Kolusu
www.linkedin.com/in/kolusu
Back to top
View user's profile Send private message Send e-mail Visit poster's website
kolusu
Site Admin
Site Admin


Joined: 26 Nov 2002
Posts: 12369
Topics: 75
Location: San Jose

PostPosted: Fri Jan 30, 2004 10:17 am    Post subject: Reply with quote

Vini,

Try this sql

Code:

DELETE                                                         
FROM TABLE A                                                     
WHERE (A.COL1,A.COL2,A.COL3) IN (SELECT B.COL1,B.COL2,B.COL3   
                                   FROM TABLE B                       
                                  WHERE A.COL1 = B.COL1             
                                    AND A.COL2 = B.COL2             
                                    AND A.COL3 = B.COL3             
                                  GROUP BY B.COL1, B.COL2, B.COL3   
                                 HAVING COUNT(*) > 1);


Hope this helps...

Cheers

Kolusu
_________________
Kolusu
www.linkedin.com/in/kolusu
Back to top
View user's profile Send private message Send e-mail Visit poster's website
vini
Intermediate


Joined: 12 Jan 2004
Posts: 240
Topics: 48
Location: Maryland

PostPosted: Fri Jan 30, 2004 11:29 am    Post subject: Reply with quote

kolusu ,
I think that SQL will eliminate allll duplicates and only leave those Records which did not have any duplicates ..
i.e it would leave
l,m,n

vini.
Back to top
View user's profile Send private message
kolusu
Site Admin
Site Admin


Joined: 26 Nov 2002
Posts: 12369
Topics: 75
Location: San Jose

PostPosted: Mon Feb 02, 2004 6:49 am    Post subject: Reply with quote

vini,


In your first post you said
Quote:

After the Delete it should be
x, y, z
a, b, c
l, m, n


The query posted above gieves you the exact output as you wanted.

Now in your later post you said

Quote:

I think that SQL will eliminate allll duplicates and only leave those Records which did not have any duplicates ..
i.e it would leave
l,m,n



What exactly do you want? Question Do you want your output as mentioned in the first post or the last post?

Kolusu
_________________
Kolusu
www.linkedin.com/in/kolusu
Back to top
View user's profile Send private message Send e-mail Visit poster's website
vini
Intermediate


Joined: 12 Jan 2004
Posts: 240
Topics: 48
Location: Maryland

PostPosted: Mon Feb 02, 2004 5:16 pm    Post subject: Reply with quote

kolusu,

I wanted to say that I think "l,m,n" would be the result of the Query as suggested by you. The output I want is still as the first post.

However , I could not run that Query for real as I could not find a Table I could easily create some test data in without affecting the other developers testing or involving the dba. On second thoughts I think I can use the DEPT sample db2 table Idea . Most probably noone should have problem with that.

That must work if you have run it at your end. Its just that when I pondered on it .. I could not figure out how that could get the desired result Confused because
HAVING COUNT(*) > 1
should return all Rows which are Duplicates and the Delete would then eliminate all of these... that would leave behind only those rows which originally had no duplciates.

Thnks
Vini
Back to top
View user's profile Send private message
kolusu
Site Admin
Site Admin


Joined: 26 Nov 2002
Posts: 12369
Topics: 75
Location: San Jose

PostPosted: Tue Feb 03, 2004 6:48 am    Post subject: Reply with quote

vini,

The query I posted earlier will give you the following results.
Code:

x, y, z
a, b, c
l, m, n


But if you just wanted l,m,n then the following query will give you the desired results.

Code:

DELETE                                                       
FROM TABLE                                                   
WHERE (COL1,COL2,COL3) IN (SELECT COL1,COL2,COL3       
                             FROM TABLE                       
                             GROUP BY COL1,COL2,COL3         
                             HAVING COUNT(*) > 1)             
                             ;                               


Here is a tip to test the sql queries. Assuming that you have authority to create tables in your assigned tablespace.

Code:

CREATE TABLE VINI                     
     (COL1               CHAR(01),             
      COL2               CHAR(01),             
      COL3               CHAR(01)             
     )                                         
IN XXXXXX.USERTS                             
;                                             
INSERT INTO VINI VALUES('X','Y','Z');         
INSERT INTO VINI VALUES('X','Y','Z');         
INSERT INTO VINI VALUES('X','Y','Z');         
INSERT INTO VINI VALUES('A','B','C');         
INSERT INTO VINI VALUES('A','B','C');         
INSERT INTO VINI VALUES('L','M','N');         
SELECT * FROM VINI;                           


now I do all my testing on this table. Once I am done with testing I just drop the table.

Hope this helps...

Cheers

Kolusu
_________________
Kolusu
www.linkedin.com/in/kolusu


Last edited by kolusu on Tue Feb 03, 2004 5:09 pm; edited 1 time in total
Back to top
View user's profile Send private message Send e-mail Visit poster's website
vini
Intermediate


Joined: 12 Jan 2004
Posts: 240
Topics: 48
Location: Maryland

PostPosted: Tue Feb 03, 2004 4:35 pm    Post subject: Reply with quote

kolusu,

Here developers dont have authority to create tables ..besides even DEPT did not help got -536 .

Finally took a small sized infrequently used table in Test region and both the Queries provided by you worked PERFECT !!!

I still have to figure them out though.. for my own understanding i.e. Smile

Thnks
vini
Back to top
View user's profile Send private message
danm
Intermediate


Joined: 29 Jun 2004
Posts: 170
Topics: 73

PostPosted: Thu Oct 07, 2004 11:13 am    Post subject: Reply with quote

Kolusu,

Your first query did give the correct result. But I need some explanation on how it works.
Code:

I created the VINI Table:

COL1  COL2  COL3
----  ----  ----
X     Y     Z
X     Y     Z
X     Y     Z
A     B     C
A     B     C
L     M     N

I ran the SELECT command to see which rows will be deleted prior to issue the DELETE command:
SELECT * FROM VINI A
WHERE (A.COL1,A.COL2,A.COL3) IN (SELECT B.COL1,B.COL2,B.COL3
                                 FROM VINI B
                                 WHERE A.COL1 = B.COL1
                                    AND A.COL2 = B.COL2
                                    AND A.COL3 = B.COL3
                                 GROUP BY B.COL1, B.COL2, B.COL3
                                 HAVING COUNT(*) > 1);

I got these rows:
COL1  COL2  COL3
----  ----  ----
X     Y     Z
X     Y     Z
X     Y     Z
A     B     C
A     B     C

When I change SELECT to DELETE, I expected the above 5 rows be deleted.

DELETE FROM VINI A
WHERE (A.COL1,A.COL2,A.COL3) IN (SELECT B.COL1,B.COL2,B.COL3
                                 FROM VINI B
                                 WHERE A.COL1 = B.COL1
                                    AND A.COL2 = B.COL2
                                    AND A.COL3 = B.COL3
                                 GROUP BY B.COL1, B.COL2, B.COL3
                                 HAVING COUNT(*) > 1);

3 rows (not 5 rows I expected) are deleted to give the correct result:
COL1  COL2  COL3
----  ----  ----
X     Y     Z
A     B     C
L     M     N

What do I miss?
Back to top
View user's profile Send private message
cp
Beginner


Joined: 21 Oct 2005
Posts: 6
Topics: 1

PostPosted: Tue Oct 25, 2005 6:46 am    Post subject: Reply with quote

Hi Kolusu,

I have tested your query given in Post-3(You have mentioned here that the query will keep one row for each group of duplicates).But it is deleting all the duplicate rows.
can u guide us to retain one row for each group of duplicates.....
Back to top
View user's profile Send private message
chandrankk
Beginner


Joined: 06 Dec 2005
Posts: 8
Topics: 0

PostPosted: Thu Dec 15, 2005 8:25 am    Post subject: Reply with quote

I am just thinking why a simple DISTINCT will not do in this case

Ofcourse, it will not work if you are considering only a partial set as the key
Back to top
View user's profile Send private message
chandrankk
Beginner


Joined: 06 Dec 2005
Posts: 8
Topics: 0

PostPosted: Fri Dec 16, 2005 5:12 am    Post subject: Reply with quote

Sorry, I understood the question wrongly.
I tested the query by Kolusu and am getting the same result cp is getting. That is, it is deleting all the duplicate rows. I don't know what is wrong here.
Back to top
View user's profile Send private message
a_seshu
Beginner


Joined: 30 Sep 2004
Posts: 16
Topics: 4
Location: Chennai

PostPosted: Mon Dec 19, 2005 4:58 pm    Post subject: Delete is a setlevel operation even with correlation.... Reply with quote

Hi Kolusu,

The reason i think all the duplicate rows get deleted even in the case of a correlated query is that the delete operation performs at a set level.

Taking the same data example

x, y, z
x, y, z
x, y, z
a, b, c
a, b, c
l, m, n

The query you posted in the post 3

Code:
DELETE                                                         
FROM TABLE A                                                     
WHERE (A.COL1,A.COL2,A.COL3) IN (SELECT B.COL1,B.COL2,B.COL3   
                                   FROM TABLE B                       
                                  WHERE A.COL1 = B.COL1             
                                    AND A.COL2 = B.COL2             
                                    AND A.COL3 = B.COL3             
                                  GROUP BY B.COL1, B.COL2, B.COL3   
                                 HAVING COUNT(*) > 1);


looks like it should work fine but when it is being processed,

1) First of all db2 picks up the x,y,z record and the correlated subquery returns that the count is 3.

2) The count is > 1 so db2 decides to delete the rows with col1=x, col2=y, and col3=z.

3) As this delete happens, DB2 deletes at a set level all the rows with col1=x, col2=y and col3=z.

4) So the next record db2 would be pointing to would be a,b,c and no longer the second occurrence of x,y,z as it is already deleted.

5) As the row a,b,c is having more than 1 occurence, it deletes all the rows of col1=a, col2=b, and col3=c at a set level.

6) Finally the row left is l,m,n. this row has a count of 1 and as it is not >1, db2 is not deleting the row and finally we are ending up with only the row l,m,n.

Note : If you insert one more l,m,n to the table and retry the query, it will delete even all the occurrences of l,m,n row also.


Only in case of selects, correlation happens at record level ad DB2 virtually processes at row level (I mean DB2 executes the correlated subquery for every row from the main query).

DB2 allows to alter table and add column so you can try the below solution.

Code:


create table prob1(col1 char(1), col2 char(1), col3 char(1)); 
insert into prob1 values ( 'x', 'y', 'z');                     
insert into prob1 values ( 'x', 'y', 'z');                     
insert into prob1 values ( 'x', 'y', 'z');                     
insert into prob1 values ( 'a', 'b', 'c');                     
insert into prob1 values ( 'a', 'b', 'c');                     
insert into prob1 values ( 'l', 'm', 'n');                     

alter table prob1 add rcount integer;                               
insert into prob1                                                   
select col1, col2, col3, count(*) from prob1 group by col1,col2,col3
 ;                                                                 
select * from prob1;                                               
delete from prob1 where rcount is null;   


After the execution, you will be having only one occurrence of every row but the problem is we can not drop a column in db2 Razz this maddens me. so you will be left with the rcount field for ever in the table. unless you drop and recreate the table.

Technically, the best and the fastest solution is unload, sort with xsum and reload.
_________________
I dont think I would ever stop learning. - Seshu.
Back to top
View user's profile Send private message Send e-mail Yahoo Messenger MSN Messenger
Display posts from previous:   
Post new topic   Reply to topic   printer-friendly view    MVSFORUMS.com Forum Index -> Database All times are GMT - 5 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


MVSFORUMS
Powered by phpBB © 2001, 2005 phpBB Group