Phantom Data Mgmt Moderator
Joined: 07 Jan 2003 Posts: 1056 Topics: 91 Location: The Blue Planet
Posted: Wed Aug 04, 2004 1:51 am Post subject: XOR Operation on the records of a file.
Can the following be done efficiently with a utility (SORT, EZTRIEVE, COBOL or ASSEMBLER)?
Code: |
Input File:
ABCDEF12345
123DEEF92030
40VKKKKKKKK
4RBIIOOKKPPP
....
|
I need to read 2 records at a time from the input file, XOR them (records 1 & 2, 3 & 4, 5 & 6, ... until end of file) and store the result in another file.
The requirement is to compare the records in pairs and highlight the positions where the character changes. If the character in a given column is the same in both records, the XOR produces a binary zero (low-value) at that position; if the characters differ, it produces a non-zero (junk) byte.
I did this in REXX (using the BITXOR function), but my input file is very large (millions of records). Is there a better alternative?
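A minimal sketch of the pairwise XOR logic described above, written in Python purely as a language-neutral illustration (it is not the REXX, Easytrieve or SORT code discussed in this thread). It assumes fixed-length records of equal length, as with RECFM=FB, and simply skips an odd trailing record; the function names and sample records are hypothetical.

```python
def xor_pair(rec1: bytes, rec2: bytes) -> bytes:
    """XOR two records byte by byte; equal bytes give 0x00 (low-value)."""
    # Assumption: fixed-length (FB) records, so both records have the same length.
    if len(rec1) != len(rec2):
        raise ValueError("records must have the same length")
    return bytes(a ^ b for a, b in zip(rec1, rec2))


def mask(xored: bytes) -> str:
    """Render 0x00 as ' ' and any non-zero byte as '*'."""
    return "".join(" " if b == 0 else "*" for b in xored)


def xor_records_pairwise(records):
    """Yield one mask per pair (records 1 & 2, 3 & 4, ...)."""
    it = iter(records)
    for first in it:
        second = next(it, None)
        if second is None:
            break  # odd record count: the last record has no partner
        yield mask(xor_pair(first, second))


if __name__ == "__main__":
    # Hypothetical sample records of equal length.
    recs = [b"ABCDEF12345", b"ABCXEF12349", b"40VKKKKKKKK", b"40VKKKKKKKK"]
    for line in xor_records_pairwise(recs):
        print(repr(line))
```

Mapping the XOR result straight to blank/'*' is the same idea as the two TRANSLATE calls in the REXX version posted later in the thread.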
Thanks for your support and guidance,
Phantom
kolusu Site Admin
Joined: 26 Nov 2002 Posts: 12372 Topics: 75 Location: San Jose
Phantom Data Mgmt Moderator
Joined: 07 Jan 2003 Posts: 1056 Topics: 91 Location: The Blue Planet
Posted: Wed Aug 04, 2004 10:02 am
Thanks a lot, Kolusu. I will try the solution and get back to you. Meanwhile, here is my full requirement. Could you please suggest a better way to achieve the end result I'm looking for?
The file I referred to in my earlier post is actually the output of a SORT step (I'm comparing two files using SORT). I want to create a third record below each matching insert/delete pair that highlights the positions of change.
For example:
This is the output of my main SORT (which compares the two files). Records that start with '1-' in position 1 come from the first file, and records that start with '2-' come from the second file.
Code: |
1- ABCD12345
2- CBCE11344
1- ZZYXYSY34
2- XYY2XYX33
...
|
Instead of leaving the data as it is, I want my output to look like this:
Code: |
1- ABCD12345
2- CBCE11344
D- *  * *  *
1- ZZYXYSY34
2- XYY2XYX33
D- ** **** *
....
|
This is my actual requirement. The problem is that the comparison output from SORT contains not only matching insert/delete pairs but also new-line inserts and old-line deletions (i.e. groups of 1- records and groups of 2- records that have no partner). So in this case I can't use the SPLIT parameter of SORT.
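A rough sketch of the desired '1-/2-/D-' layout, again in Python purely for illustration. It assumes (hypothetically) that a '1-' line immediately followed by a '2-' line is a changed pair, and it passes every other line through unchanged as an unmatched insert or delete. That adjacency rule is only an assumption: a run such as '1- 1- 2- 2-' would pair the second '1-' with the first '2-', so a real solution would need the key information from the main SORT step.

```python
def change_marks(old: str, new: str) -> str:
    """'*' where characters differ, ' ' where they match (shorter side padded with blanks)."""
    width = max(len(old), len(new))
    return "".join(" " if a == b else "*"
                   for a, b in zip(old.ljust(width), new.ljust(width)))


def annotate(lines):
    """Add a 'D- ' line after each '1- ' line directly followed by a '2- ' line.

    Any other line (an unmatched insert or delete) is copied through unchanged.
    """
    out = []
    i = 0
    while i < len(lines):
        cur = lines[i]
        nxt = lines[i + 1] if i + 1 < len(lines) else None
        if cur.startswith("1- ") and nxt is not None and nxt.startswith("2- "):
            # Strip the 3-character '1- '/'2- ' prefix before comparing.
            out += [cur, nxt, "D- " + change_marks(cur[3:], nxt[3:])]
            i += 2
        else:
            out.append(cur)
            i += 1
    return out


if __name__ == "__main__":
    sample = ["1- ABCD12345", "2- CBCE11344", "1- ZZYXYSY34", "2- XYY2XYX33"]
    for line in annotate(sample):
        print(line)
```

With the sample records from this post it emits "D- *  * *  *" and "D- ** **** *" as the marker lines.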
Thanks a bunch,
Phantom
NutCracker Beginner
Joined: 13 Dec 2002 Posts: 45 Topics: 3 Location: 3rd Block from the SUN
Posted: Wed Aug 04, 2004 10:51 am
Why don't you use File-Aid Compare instead of the second sort step?
This compare utility gives you the output in the required format. Try it out & let us know.
Phantom Data Mgmt Moderator
Joined: 07 Jan 2003 Posts: 1056 Topics: 91 Location: The Blue Planet
Posted: Wed Aug 04, 2004 10:58 am
NutCracker, unfortunately we don't have File-Aid here; it has been replaced by Insync in our shop. The same thing might be possible in Insync too, but I don't want my solution to depend on these kinds of tools, because our shop replaces them from time to time.
Thanks,
Phantom Data Mgmt Moderator
Joined: 07 Jan 2003 Posts: 1056 Topics: 91 Location: The Blue Planet
Posted: Wed Aug 04, 2004 11:02 am
Kolusu,
I tried the Eztrieve compare solution for two files of LRECL=541 (RECFM=FB), each containing 1,211,244 records. It took the same CPU time and job time as my REXX solution. Moreover, for the Eztrieve solution I had to use SORT to split the input file into two parts, which consumed additional CPU and job time.
Is there a better, more efficient way to achieve the result?
Thanks in advance,
Phantom
kolusu Site Admin
Joined: 26 Nov 2002 Posts: 12372 Topics: 75 Location: San Jose
Posted: Wed Aug 04, 2004 11:46 am
Phantom,
Code: |
I tried the Eztrieve compare solution for two files of LRECL=541 (RECFM=FB) each containing 1,211,244 records
|
Wow! You mean your interpreted REXX ran in the same time as Easytrieve?
Hmm, can I see your code?
Code: |
Is there a better, more efficient way to achieve the result?
|
I wouldn't use SORT to do the comparison, write out all the changed lines, and then compare each and every byte again to see what changed.
I would write one single Easytrieve/COBOL program that compares the files and writes the output in the desired format.
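A sketch of the single-pass idea, making the same positional assumption as the Easytrieve program later in this thread: record n of the first file is compared with record n of the second file. It is written in Python for illustration only, and compare_files is a hypothetical name. Because the pairing is positional, a single inserted or deleted line would misalign every following record; that is why a key-based match is needed when the files can differ by inserts and deletes.

```python
from itertools import zip_longest


def compare_files(old_records, new_records):
    """Single pass over two record streams, paired by position.

    Yields '1-', '2-' and 'D-' lines only for pairs that differ; a record
    with no partner (one file is longer) is yielded on its own.
    """
    for old, new in zip_longest(old_records, new_records):
        if new is None:
            yield "1- " + old
        elif old is None:
            yield "2- " + new
        elif old != new:
            width = max(len(old), len(new))
            marks = "".join(" " if a == b else "*"
                            for a, b in zip(old.ljust(width), new.ljust(width)))
            yield "1- " + old
            yield "2- " + new
            yield "D- " + marks


if __name__ == "__main__":
    # In-memory sample; with real files you could pass
    # (line.rstrip("\n") for line in f) for each open file f.
    for line in compare_files(["AAA", "BBB", "CCC"], ["AAA", "BXB", "CCC", "DDD"]):
        print(line)
```

Unchanged pairs produce no output at all, which mirrors the S-WRITE-OUTPUT switch in the Easytrieve code.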
Kolusu _________________ Kolusu
www.linkedin.com/in/kolusu
Phantom Data Mgmt Moderator
Joined: 07 Jan 2003 Posts: 1056 Topics: 91 Location: The Blue Planet
Posted: Thu Aug 05, 2004 6:19 am
Kolusu,
Code: |
Wow! You mean your interpreted REXX ran in the same time as Easytrieve?
Hmm, can I see your code?
|
Sure. Here is the code. It reads two records at a time and writes the compared output in batches of 1000 records.
Code: |
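/*REXX*/
/* Read DD IN two records at a time, XOR each pair and write a     */
/* change mask to DD OUT: blank = same byte, '*' = different byte. */
COUNT = 0
DO FOREVER
   "EXECIO 2 DISKR IN (STEM DATA."
   IF RC > 2 THEN EXIT RC            /* I/O error                     */
   IF DATA.0 < 2 THEN LEAVE          /* EOF: no complete pair left;   */
                                     /* an odd last record is skipped */
   OUTDATA = BITXOR(DATA.1, DATA.2)  /* equal bytes -> '00'X          */
   COUNT = COUNT + 1
   /* One TRANSLATE: '00'X -> blank, '01'X..'FF'X -> '*'. The output */
   /* table must be as long as the input table, because TRANSLATE    */
   /* pads a short output table with blanks.                         */
   RES.COUNT = TRANSLATE(OUTDATA, ' '||COPIES('*',255), XRANGE('00'X,'FF'X))
   IF COUNT = 1000 THEN DO           /* flush a full batch            */
      "EXECIO 1000 DISKW OUT (STEM RES."
      COUNT = 0
   END
END
"EXECIO "COUNT" DISKW OUT (STEM RES. FINIS"  /* last batch and close */
"EXECIO 0 DISKR IN (FINIS"                   /* close the input      */
EXIT 0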
|
And here is the spool output for the REXX run:
Code: |
04.31.16 JOB15352 ACT001I STEPNAME PROCSTEP PROGRAM CPU-TIME COND
04.31.16 JOB15352 ACT002I STEP1 IEFBR14 00:00:00.00 0000
04.33.11 JOB15352 ACT002I TSOBAT1 IKJEFT1A 00:01:11.50 0000
...
ELAPSED TIME 00:01:54.97
|
And here is the JES output for the Easytrieve solution:
Code: |
11.40.37 JOB19093 ACT002I R020 EZTPA00 00:01:16.80 0000
ELAPSED TIME 00:01:57.54
|
Code: |
I would write one single Easytrieve/COBOL program that compares the files and writes the output in the desired format.
|
I can do that, but the I/O in COBOL/EZTRIEVE is not as efficient as SORT's I/O, right? Instead of processing the entire mass of input records from two files, I thought I could write a SORT step to extract only the changed lines (which in most cases will be fewer than the full input) and then process them some other way.
I have one question regarding sort exits. When we use sort exits, who does the record fetch (I/O): SORT itself, or the routine invoked in the exit? If I can reduce the EXCPs and subsequently the CPU time, that would be very helpful.
Thanks,
Phantom
kolusu Site Admin
Joined: 26 Nov 2002 Posts: 12372 Topics: 75 Location: San Jose
Posted: Thu Aug 05, 2004 7:23 am
Phantom,
I wanted to see your Easytrieve program. I am not an expert in REXX.
Quote: |
I can do that, but the I/O in COBOL/EZTRIEVE is not as efficient as SORT's I/O, right? Instead of processing the entire mass of input records from two files, I thought I could write a SORT step to extract only the changed lines (which in most cases will be fewer than the full input) and then process them some other way.
|
Even though SORT is a highly optimized I/O routine, Easytrieve scores when it comes to matching files. Easytrieve can also handle duplicates in a single pass over the data, whereas with SORT you need at least 3 passes.
However, if you omit certain parameters in Easytrieve, the job will run like a snail. So post your Easytrieve code and also the file statistics from the SYSPRINT of the Easytrieve run.
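As an illustration of a one-pass match that tolerates duplicate keys, here is a Python sketch of a classic match-merge over two inputs already sorted on the same key. It is not Easytrieve's own synchronized-file processing; the duplicate rule used here (pair duplicates in arrival order and report the surplus on either side as unmatched) is one assumed choice among several, and match_merge is a hypothetical name.

```python
from itertools import groupby


def _groups(records, key):
    """Yield (key, [records]) for each run of equal keys in sorted input."""
    for k, grp in groupby(records, key=key):
        yield k, list(grp)


def match_merge(file1, file2, key):
    """One pass over two inputs already sorted ascending on `key`.

    Yields (tag, rec1, rec2): 'M' = matched, '1' = only in file1,
    '2' = only in file2. Duplicate keys are paired in arrival order;
    any surplus duplicates on either side are reported as unmatched.
    """
    g1, g2 = _groups(file1, key), _groups(file2, key)
    a, b = next(g1, None), next(g2, None)
    while a is not None or b is not None:
        if b is None or (a is not None and a[0] < b[0]):
            for r in a[1]:                  # key exists only in file1
                yield ("1", r, None)
            a = next(g1, None)
        elif a is None or b[0] < a[0]:
            for r in b[1]:                  # key exists only in file2
                yield ("2", None, r)
            b = next(g2, None)
        else:                               # same key on both sides
            recs1, recs2 = a[1], b[1]
            for r1, r2 in zip(recs1, recs2):
                yield ("M", r1, r2)
            for r in recs1[len(recs2):]:    # surplus duplicates in file1
                yield ("1", r, None)
            for r in recs2[len(recs1):]:    # surplus duplicates in file2
                yield ("2", None, r)
            a, b = next(g1, None), next(g2, None)


if __name__ == "__main__":
    old = [("01", "a"), ("02", "b"), ("02", "c")]
    new = [("02", "B"), ("03", "x")]
    for tag, r1, r2 in match_merge(old, new, key=lambda r: r[0]):
        print(tag, r1, r2)
```

Each input is read exactly once, which is the property that makes this approach cheaper than several SORT passes.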
Quote: |
I have one question regarding sort exits. When we use sort exits, who does the record fetch (I/O): SORT itself, or the routine invoked in the exit? If I can reduce the EXCPs and subsequently the CPU time, that would be very helpful.
|
Check this link for DFSORT's program phases:
http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/ICE1CA00/4.2?DT=20031124143823
I am not a big fan of exits in SORT. Generally speaking, exits have a bearing on performance. It's more efficient NOT to use exits at all, if possible.
Sometimes you can use a COPY rather than a SORT. Sorting affects performance; a copy operation is much faster than a sort operation. For example, with an E15 exit you could use a COPY, which would most likely be faster than a SORT without an exit, unless the number of records is small.
Hope this helps...
Cheers
Kolusu
Phantom Data Mgmt Moderator
Joined: 07 Jan 2003 Posts: 1056 Topics: 91 Location: The Blue Planet
Posted: Thu Aug 05, 2004 9:13 am
Kolusu,
Here is my Easytrieve code. I just copied your code and changed 3000 to 541, since my LRECL is 541.
Code: |
//SYSIN DD *
FILE INFILE1
IN-REC1 01 01 A OCCURS 541
FILE INFILE2
IN-REC2 01 01 A OCCURS 541
FILE OUTPUT FB(0 0)
OUT-REC 01 01 A OCCURS 541
W-SUB W 04 N
S-WRITE-OUTPUT W 01 A
***********************************************************************
* MAINLINE *
***********************************************************************
JOB INPUT NULL
GET INFILE1
GET INFILE2
DO WHILE INFILE2
S-WRITE-OUTPUT = 'N'
W-SUB = 1
DO UNTIL W-SUB GT 541
IF IN-REC1 (W-SUB) = IN-REC2 (W-SUB)
OUT-REC (W-SUB) = ' '
ELSE
S-WRITE-OUTPUT = 'Y'
OUT-REC(W-SUB) = '*'
END-IF
W-SUB = W-SUB + 1
END-DO
IF S-WRITE-OUTPUT = 'Y'
PUT OUTPUT
END-IF
GET INFILE1
GET INFILE2
END-DO
STOP
|
Sysprint Message:
Code: |
OPTIONS FOR THIS RUN - ABEXIT NO DEBUG (STATE FLDCHK NOXREF) LIST (PARM FILE) PRESIZE 512
SORT (DEVICE SYSDA ALTSEQ NO MSG DEFAULT MEMORY MAX WORK 3) VFM ( 64)
8/04/04 11.38.41 CA-EASYTRIEVE PLUS-6.4 0202 PAGE 2
PERSHING
PROGRAMS AND ALL SUPPORTING MATERIALS COPYRIGHT (C) 1982, 1996 BY COMPUTER ASSOCIATES INTL. INC.
FILE STATISTICS - CA-EASYTRIEVE PLUS 6.4 0202- 8/04/04-11.38-JSN00012
INFILE1 1,211,245 INPUT SAM FIX BLK 541 32460
INFILE2 1,211,244 INPUT SAM FIX BLK 541 32460
OUTPUT 1,211,244 OUTPUT SAM FIX BLK 541 32460
|
Please review the code and let me know if anything could be modified so that it can run faster.
Code: |
Easytrieve can also handle duplicates in a single pass over the data, whereas with SORT you need at least 3 passes.
|
Can you please provide the Easytrieve code that can handle duplicates in the input file? I'm trying to write a tool that is better and more efficient than ISRSUPC. The ISRSUPC compare ran for more than 47 minutes (job time) to compare two files of 2.5 million records each. Please let me know if my requirement is not clear.
Thanks a ton,
Phantom
Frank Yaeger Sort Forum Moderator
Joined: 02 Dec 2002 Posts: 1618 Topics: 31 Location: San Jose
Posted: Thu Aug 05, 2004 9:49 am
Kolusu said Quote: | I am not a big fan of exits in SORT. Generally speaking Exits have a bearing on performance. It's more efficient to NOT use exits at all, if possible. |
Although it's true that exits can have an impact on performance, exits can be more efficient in some cases, such as when using an exit allows only one pass over the data whereas not using an exit requires more than one pass over the data.
Phantom said Quote: | I have one question regarding sort exits. When we use sort exits, who does the record fetch (I/O): SORT itself, or the routine invoked in the exit? If I can reduce the EXCPs and subsequently the CPU time, that would be very helpful. |
With an Assembler exit, DFSORT does the I/O (unless YOU choose to have the exit handle all the I/O). With a COBOL exit, DFSORT does the I/O if COBOL sets FASTSRT in effect, or COBOL does the I/O if COBOL sets NOFASTSRT in effect. _________________ Frank Yaeger - DFSORT Development Team (IBM)
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
Phantom Data Mgmt Moderator
Joined: 07 Jan 2003 Posts: 1056 Topics: 91 Location: The Blue Planet
Posted: Thu Aug 05, 2004 9:56 am
Thanks for the clarification Frank,
Phantom
kolusu Site Admin
Joined: 26 Nov 2002 Posts: 12372 Topics: 75 Location: San Jose
Posted: Thu Aug 05, 2004 10:03 am
Phantom,
Try this code. It uses an index (INDEX IDX2) instead of a subscript. I ran this code and noticed the run time was reduced by 50%. I ran it against a 4-million-record file.
Code: |
FILE INFILE1
IN-REC1 01 01 A OCCURS 541
FILE INFILE2
IN-REC2 01 01 A OCCURS 541 INDEX IDX2
FILE OUTPUT FB(0 0)
OUT-REC 01 01 A OCCURS 541
S-WRITE-OUTPUT W 01 A
*********************************************************
* MAINLINE *
*********************************************************
JOB INPUT NULL
GET INFILE1
GET INFILE2
DO WHILE INFILE2
S-WRITE-OUTPUT = 'N'
IDX2 = 1
DO UNTIL IDX2 GT 541
IF IN-REC1 (IDX2) = IN-REC2 (IDX2)
OUT-REC (IDX2) = ' '
ELSE
S-WRITE-OUTPUT = 'Y'
OUT-REC(IDX2) = '*'
END-IF
IDX2 = IDX2 + 1
END-DO
IF S-WRITE-OUTPUT = 'Y'
PUT OUTPUT
END-IF
GET INFILE1
GET INFILE2
END-DO
STOP
|
Hope this helps...
Cheers
Kolusu
Phantom Data Mgmt Moderator
Joined: 07 Jan 2003 Posts: 1056 Topics: 91 Location: The Blue Planet
Posted: Thu Aug 05, 2004 10:26 am
Wow, Kolusu,
This solution is great. The time is reduced by 50%, just as you said. These are my stats:
Code: |
11.21.19 JOB03828 ACT001I STEPNAME PROCSTEP PROGRAM CPU-TIME COND
11.21.19 JOB03828 ACT002I R020 EZTPA00 00:00:48.50 0000
...
ELAPSED TIME 00:01:24.51
|
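For reference, a small Python calculation of the reduction implied by the two Easytrieve spool excerpts in this thread (the subscript version at 00:01:16.80 CPU and 00:01:57.54 elapsed, and the index version above). It only does arithmetic on the posted figures; Kolusu's 50% came from his own 4-million-record run, which may well behave differently. The helper names are hypothetical.

```python
def seconds(hms: str) -> float:
    """Convert an 'HH:MM:SS.hh' spool time to seconds."""
    h, m, s = hms.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)


def reduction(before: str, after: str) -> float:
    """Percentage reduction going from `before` to `after`."""
    b, a = seconds(before), seconds(after)
    return (b - a) / b * 100


if __name__ == "__main__":
    print(f"CPU:     {reduction('00:01:16.80', '00:00:48.50'):.1f}%")
    print(f"Elapsed: {reduction('00:01:57.54', '00:01:24.51'):.1f}%")
```

With these inputs it prints about 36.8% for CPU time and 28.1% for elapsed time.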
Thanks for the effort, Kolusu.
I will try to modify this code to handle duplicates in the input file. Let's see how far I can get!
Phantom
Phantom Data Mgmt Moderator
Joined: 07 Jan 2003 Posts: 1056 Topics: 91 Location: The Blue Planet
Posted: Wed Oct 06, 2004 12:27 am
Quote: |
With an Assembler exit, DFSORT does the I/O (unless YOU choose to have the exit handle all the I/O). With a COBOL exit, DFSORT does the I/O if COBOL sets FASTSRT in effect, or COBOL does the I/O if COBOL sets NOFASTSRT in effect.
|
Frank/Kolusu,
Could you please provide a simple example of a SORT E35 exit calling a COBOL program (using FASTSRT), i.e. where the I/O is handled by SORT? This would be really helpful for me.
Thanks a Ton in advance,
Phantom