Posted: Fri Sep 12, 2008 10:11 am Post subject: Why does SUPERCE not show correct results? Please help.
Hi,
I have an input dataset with 23769 records. Its record length is 10, and all 10 bytes form the key of the first file. I have another dataset with 9711 records, which has a 10-byte key field starting in column 73. I compared these two using the JOINKEYS feature of SORT.
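To make the key layout concrete, here is a minimal Python sketch (an illustration only, not Syncsort) of what an inner join on these keys counts. It assumes the file-1 key is columns 1-10 and the file-2 key is columns 73-82, as described above. The helper names `field` and `inner_join_count` and the sample records are hypothetical.

```python
from collections import Counter


def field(record: str, start: int, end: int) -> str:
    """Return columns start..end of a record, using 1-based inclusive columns (e.g. 73:82)."""
    return record[start - 1:end]


def inner_join_count(file1: list[str], file2: list[str]) -> int:
    """Number of pairs an inner join on the key would produce.

    Assumed layout: file 1 key in columns 1-10, file 2 key in columns 73-82.
    Each file-1 record pairs with every file-2 record that has the same key.
    """
    keys1 = Counter(field(r, 1, 10) for r in file1)
    keys2 = Counter(field(r, 73, 82) for r in file2)
    return sum(n * keys2[k] for k, n in keys1.items() if k in keys2)


# Tiny made-up records: in file 2 the 10-byte key starts in column 73.
f1 = ["AAAAAAAAAA", "BBBBBBBBBB", "CCCCCCCCCC"]
f2 = [" " * 72 + "BBBBBBBBBB", " " * 72 + "ZZZZZZZZZZ"]
print(inner_join_count(f1, f2))  # 1
```

Note the off-by-one: column 73 in 1-based terms is index 72 in Python, so columns 73:82 become the slice `[72:82]`.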
The job report showed that 474 matches were found, which was correct. Then, as a cross-check, I ran SUPERCE (3.13) on the two datasets with the column compare facility, using the following process statements:
Code:
CMPCOLMN 1:10
CMPCOLMO 73:82
Surprisingly, it showed only 405 matches between the two files. Even if I take the "FALSE MATCH(S) CORRECTED" count into consideration, the total comes to 477 (405 + 72), not 474.
The final statistics were as given below:
Code:
LINE COMPARE SUMMARY AND STATISTICS
405 NUMBER OF LINE MATCHES 30381 TOTAL CHANGES (PAIRED+NONPAIRED CHNG)
0 REFORMATTED LINES 2289 PAIRED CHANGES (REFM+PAIRED INS/DEL)
23364 NEW FILE LINE INSERTIONS 21075 NON-PAIRED INSERTS
9306 OLD FILE LINE DELETIONS 7017 NON-PAIRED DELETES
23769 NEW FILE LINES PROCESSED
9711 OLD FILE LINES PROCESSED
72 FALSE MATCH(S) CORRECTED
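The summary figures above are at least internally consistent, which suggests SUPERCE processed every line of both files. A quick arithmetic check in Python; the relationships between the counters are my reading of the report labels, not something taken from the SuperC documentation:

```python
# Figures copied from the SUPERCE summary above.
new_lines, old_lines = 23769, 9711
matches = 405
insertions, deletions = 23364, 9306
paired, nonpaired_ins, nonpaired_del = 2289, 21075, 7017
total_changes = 30381
false_matches = 72

# Assumed: every processed line is either matched or reported as an insert/delete.
assert new_lines - matches == insertions
assert old_lines - matches == deletions
# Assumed: paired changes are inserts/deletes that were paired with each other.
assert insertions - nonpaired_ins == paired
assert deletions - nonpaired_del == paired
# Paired + non-paired changes add up to the reported total.
assert paired + nonpaired_ins + nonpaired_del == total_changes

print(matches + false_matches)  # 477
```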
I would like to know whether Syncsort was wrong or SUPERCE did not work properly. Which one should I believe? Do I need to run more tests on this data?
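One thing worth checking before blaming either tool is record order. A line compare such as SUPERCE aligns the two files as sequences, so (assuming it behaves like an ordinary diff) matched lines must appear in the same relative order in both files, while a key join has no such restriction. The Python sketch below approximates an order-preserving compare with a longest-common-subsequence count; the function names and sample keys are hypothetical, and SuperC's real algorithm differs in detail.

```python
def lcs_matches(old_keys: list[str], new_keys: list[str]) -> int:
    """Length of the longest common subsequence: the most lines an
    order-preserving compare (a diff) can pair up."""
    prev = [0] * (len(new_keys) + 1)
    for a in old_keys:
        cur = [0]
        for j, b in enumerate(new_keys, start=1):
            if a == b:
                cur.append(prev[j - 1] + 1)  # extend the match diagonally
            else:
                cur.append(max(prev[j], cur[j - 1]))  # skip a line in one file
        prev = cur
    return prev[-1]


def distinct_key_matches(old_keys: list[str], new_keys: list[str]) -> int:
    """Keys present in both files, regardless of order."""
    return len(set(old_keys) & set(new_keys))


old = ["K3", "K1", "K2"]
new = ["K1", "K2", "K3"]
print(lcs_matches(old, new))           # 2
print(distinct_key_matches(old, new))  # 3
```

With the same keys in different orders, the order-preserving count is lower than the key-match count. The table is O(m*n), so this is only practical on small samples, not on the full 9711 x 23769 files.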
Please help me in this regard. Also, could you please show me other ways of doing the comparison, for example using FILE-AID, or using Syncsort in a different way?
Thanks for your time.
MF
Joined: 02 Dec 2002 Posts: 700 Topics: 63 Location: USA
Posted: Fri Sep 12, 2008 7:53 pm Post subject:
I have never used JOINKEYS, but after going over old posts it looks like JOINKEYS is supposed to produce Cartesian products.
If there are two matching keys in these files and one of them is repeated in the second file, then I think the sort will show three joined records, whereas SUPERCE will show two matches.
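The scenario above can be reproduced with a small Python sketch. Assumptions: the join is an inner join that pairs every matching record (a Cartesian product per key), and the SUPERCE figure is approximated by the number of distinct matching keys. The names `join_pairs` and `distinct_matches` and the data are hypothetical.

```python
from collections import Counter


def join_pairs(keys1: list[str], keys2: list[str]) -> int:
    """Inner-join pair count: each file-1 record pairs with every file-2 record
    that has the same key (m occurrences x n occurrences per key)."""
    c2 = Counter(keys2)
    return sum(c2[k] for k in keys1)


def distinct_matches(keys1: list[str], keys2: list[str]) -> int:
    """Number of distinct keys found in both files."""
    return len(set(keys1) & set(keys2))


file1 = ["K1", "K2"]
file2 = ["K1", "K2", "K2"]  # K2 is repeated in the second file
print(join_pairs(file1, file2))        # 3
print(distinct_matches(file1, file2))  # 2
```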
Posted: Fri Sep 19, 2008 1:22 pm Post subject:
What I meant was to run a different sort instead of the original one and see whether the numbers change. That would confirm whether or not duplicates are causing the problem.
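A complementary duplicate check, sketched in Python under the same assumptions: list the keys that occur more than once in each file and compute how many extra pairs they add to the join. If the excess is 0, duplicates cannot explain the gap. `duplicate_keys` and `extra_join_pairs` are hypothetical helpers.

```python
from collections import Counter


def duplicate_keys(keys: list[str]) -> dict[str, int]:
    """Return keys that occur more than once, mapped to their counts."""
    return {k: n for k, n in Counter(keys).items() if n > 1}


def extra_join_pairs(keys1: list[str], keys2: list[str]) -> int:
    """Join pairs in excess of the number of distinct matching keys.

    A key seen m times in file 1 and n times in file 2 joins into m*n pairs,
    but counts as one distinct match.
    """
    c1, c2 = Counter(keys1), Counter(keys2)
    common = c1.keys() & c2.keys()
    return sum(c1[k] * c2[k] for k in common) - len(common)


file1 = ["K1", "K2"]
file2 = ["K1", "K2", "K2"]
print(duplicate_keys(file1))           # {}
print(duplicate_keys(file2))           # {'K2': 2}
print(extra_join_pairs(file1, file2))  # 1
```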