MVSFORUMS.com
A Community of and for MVS Professionals

XOR Operation on the records of a file.

Phantom
Data Mgmt Moderator

Joined: 07 Jan 2003
Posts: 1056
Topics: 91
Location: The Blue Planet

Posted: Wed Aug 04, 2004 1:51 am    Post subject: XOR Operation on the records of a file.

Can the following be done efficiently with a utility or language such as SORT, Easytrieve, COBOL or Assembler?

Code:

Input File:
ABCDEF12345
123DEEF92030
40VKKKKKKKK
4RBIIOOKKPPP
....


I need to read two records at a time from the input file, XOR the two records (records 1 & 2, 3 & 4, 5 & 6, and so on until end of file) and write the result to another file.

The requirement is to compare the records in pairs and highlight the positions where the characters differ. If the character in a given column is the same in both records, the XOR yields a binary zero (low-value) at that position; if the characters differ, it yields some non-zero value.
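
To make the XOR behaviour concrete, here is a minimal Python sketch (Python is used only to illustrate the byte-level logic, it is not a proposed mainframe solution). The function name `xor_records`, the two sample records and the choice of the cp037 EBCDIC code page are assumptions made for the example.

```python
def xor_records(rec1: bytes, rec2: bytes) -> bytes:
    """XOR two records byte by byte, padding the shorter one with X'00'."""
    width = max(len(rec1), len(rec2))
    a = rec1.ljust(width, b"\x00")
    b = rec2.ljust(width, b"\x00")
    return bytes(x ^ y for x, y in zip(a, b))


if __name__ == "__main__":
    # Hypothetical records; cp037 is one common EBCDIC code page (an assumption).
    first = "ABCDEF12345".encode("cp037")
    second = "ABCXEF12945".encode("cp037")
    result = xor_records(first, second)
    print(result.hex(" "))
    # Columns (1-based) whose XOR result is non-zero are the ones that differ.
    print([i + 1 for i, byte in enumerate(result) if byte])
```

Equal bytes always XOR to X'00' regardless of the code page, so the positions of the non-zero bytes are exactly the columns that changed.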

I did this in REXX (using the BITXOR function), but my input file is huge (millions of records). Is there a better alternative?

Thanks for your support and guidance,
Phantom

kolusu
Site Admin

Joined: 26 Nov 2002
Posts: 12372
Topics: 75
Location: San Jose

Posted: Wed Aug 04, 2004 5:49 am

Phantom,

If your intention is to compare byte by byte, you can use the logic shown in the link below. Just split the file into 2 files using the SPLIT parm of sort and then compare byte by byte as shown here:

http://www.mvsforums.com/helpboards/viewtopic.php?t=1436&highlight=compare


Hope this helps...

Cheers

Kolusu
_________________
Kolusu
www.linkedin.com/in/kolusu

Phantom
Data Mgmt Moderator

Posted: Wed Aug 04, 2004 10:02 am

Thanks a lot kolusu, I will try the solution and get back to you. Meanwhile, I'm explaining my full requirement below. Could you please suggest a better way to achieve the end result that I'm looking for?

The file I was referring to in my earlier post is the output of a SORT step (I'm comparing two files using SORT). I want to add a third record below each matching insert/delete pair, highlighting the positions that changed.

For example:
This is the output of my main sort, which compares the two files. Records that start with '1-' in position 1 come from the first file, and records that start with '2-' come from the second file.

Code:

1- ABCD12345
2- CBCE11344
1- ZZYXYSY34
2- XYY2XYX33
...


Instead of leaving the data as it is, I want my output to look like this:
Code:

1- ABCD12345
2- CBCE11344
D- *  * *  *
1- ZZYXYSY34
2- XYY2XYX33
D- ** **** *
....


This is my actual requirement. The problem is that the comparison output from sort contains not only matching insert/delete pairs but also newly inserted lines and deleted old lines (i.e. runs of '1-' records and runs of '2-' records on their own). In this case I can't use the SPLIT parm of sort.
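
As an illustration of the output I'm after, here is a rough Python sketch of the logic (not a mainframe solution). It assumes that a '1-' record immediately followed by a '2-' record is a matched pair and that the data starts in column 4; both are assumptions about the sort output, and the name `add_marker_lines` is made up for the example.

```python
from typing import Iterable, Iterator, Optional


def add_marker_lines(lines: Iterable[str]) -> Iterator[str]:
    """Echo every record and add a 'D-' line after each adjacent 1-/2- pair."""
    pending: Optional[str] = None  # most recent '1-' record still waiting for a '2-'
    for line in lines:
        line = line.rstrip("\n")
        yield line
        if line.startswith("1-"):
            pending = line
        elif line.startswith("2-") and pending is not None:
            old, new = pending[3:], line[3:]  # skip the 3-byte prefix
            width = max(len(old), len(new))
            marks = "".join(
                " " if a == b else "*"
                for a, b in zip(old.ljust(width), new.ljust(width))
            )
            yield ("D- " + marks).rstrip()
            pending = None
        else:
            pending = None  # unpaired '2-' record or anything else


if __name__ == "__main__":
    sample = ["1- ABCD12345", "2- CBCE11344", "1- ONLYOLD",
              "1- ZZYXYSY34", "2- XYY2XYX33", "2- ONLYNEW"]
    print("\n".join(add_marker_lines(sample)))
```

Unpaired '1-' or '2-' records are passed through unchanged, which is why a plain even/odd split of the file does not work here.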

Thanks a bunch,
Phantom

NutCracker
Beginner

Joined: 13 Dec 2002
Posts: 45
Topics: 3
Location: 3rd Block from the SUN

Posted: Wed Aug 04, 2004 10:51 am

Why don't you use File-Aid Compare instead of the second sort step?
This compare utility gives you the output in the required format. Try it out & let us know.

Phantom
Data Mgmt Moderator

Posted: Wed Aug 04, 2004 10:58 am

NutCracker, unfortunately we do not have File-AID here; it has been replaced by Insync in our shop. The same thing could probably be done in Insync, but I don't want my solution to depend on tools of this kind, because in our shop they get replaced from time to time.

Thanks,

Phantom
Data Mgmt Moderator

Posted: Wed Aug 04, 2004 11:02 am

Kolusu,

I tried the Easytrieve compare solution for two files of LRECL=541 (RECFM=FB), each containing 1,211,244 records. It took about the same CPU time and job time as my REXX solution. Moreover, for the Easytrieve solution I had to use SORT to split the input file into two parts, which consumed additional CPU and job time.

Is there a better, more efficient way to achieve the result?

Thanks in advance,
Phantom

kolusu
Site Admin

Posted: Wed Aug 04, 2004 11:46 am

Phantom,

Quote:

I tried the Easytrieve compare solution for two files of LRECL=541 (RECFM=FB), each containing 1,211,244 records


Wow! You mean to say that your interpreted REXX ran for the same time as Easytrieve?

Hmm, can I see your code?

Quote:

Is there a better, more efficient way to achieve the result?


I wouldn't use sort to do the comparison and create an output with all the changed lines, and then compare each and every byte all over again to see what changed.

I would write a single Easytrieve/COBOL program to compare the files and write the output in the desired format.
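
The single-program idea can be sketched as follows. This is an illustrative Python outline only, not Easytrieve or COBOL; it assumes record N of the first file corresponds to record N of the second (no inserted or deleted records), and the name `compare_in_lockstep` is hypothetical.

```python
from itertools import zip_longest
from typing import Iterable, Iterator


def compare_in_lockstep(old: Iterable[str], new: Iterable[str]) -> Iterator[str]:
    """Yield '1-', '2-' and 'D-' lines for every record pair that differs."""
    for rec1, rec2 in zip_longest(old, new, fillvalue=""):
        rec1, rec2 = rec1.rstrip("\n"), rec2.rstrip("\n")
        if rec1 == rec2:
            continue  # identical records produce no output
        width = max(len(rec1), len(rec2))
        marks = "".join(
            " " if a == b else "*"
            for a, b in zip(rec1.ljust(width), rec2.ljust(width))
        )
        yield "1- " + rec1
        yield "2- " + rec2
        yield ("D- " + marks).rstrip()


if __name__ == "__main__":
    file1 = ["ABCD12345", "SAME", "ZZYXYSY34"]
    file2 = ["CBCE11344", "SAME", "XYY2XYX33"]
    print("\n".join(compare_in_lockstep(file1, file2)))
```

Both inputs are read once and the marker line is produced in the same pass, so no intermediate file of changed lines is needed.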

Kolusu

Phantom
Data Mgmt Moderator

Posted: Thu Aug 05, 2004 6:19 am

Kolusu,

Quote:

Wow! You mean to say that your interpreted REXX ran for the same time as Easytrieve?

Hmm, can I see your code?


Sure. Here is the code. It reads two records at a time and writes the comparison output in batches of 1000 records.
Code:

/*REXX*/
COUNT = 0
DROP RES.
RES.0 = 0

 DO FOREVER
  "EXECIO 2 DISKR IN (OPEN STEM DATA."
   IF DATA.0 < 2 THEN LEAVE      /* EOF: NO COMPLETE PAIR LEFT     */

   COUNT     = COUNT + 1
   OUTDATA   = BITXOR(DATA.1, DATA.2)
   /* NON-ZERO BYTES BECOME '*' (TABLEO IS PADDED WITH '*')        */
   OUTDATA   = TRANSLATE(OUTDATA, '', XRANGE('01'X,'FF'X), '*')
   /* ZERO BYTES (EQUAL COLUMNS) BECOME BLANKS                     */
   RES.COUNT = TRANSLATE(OUTDATA, ' ', XRANGE('00'X))
   RES.0     = COUNT

   IF COUNT  = 1000 THEN         /* WRITE OUT A FULL BATCH         */
    DO
        "EXECIO 1000 DISKW OUT (STEM RES."
         COUNT = 0
         DROP RES.
         RES.0 = 0
    END
 END
"EXECIO "COUNT" DISKW OUT (STEM RES. FINIS"
"EXECIO 0 DISKR IN (FINIS"

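For readers who do not know REXX, the BITXOR-then-TRANSLATE idea can be sketched in Python as below. This is only an illustration of the technique, not the job that was timed; the names `MARKER_TABLE` and `mark_differences` are made up for the example.

```python
# 256-entry translate table: X'00' becomes a blank, every other byte value '*'.
MARKER_TABLE = bytes([ord(" ")] + [ord("*")] * 255)


def mark_differences(rec1: bytes, rec2: bytes) -> str:
    """XOR two records and turn the result into a blank/asterisk marker line."""
    width = max(len(rec1), len(rec2))
    xored = bytes(
        x ^ y
        for x, y in zip(rec1.ljust(width, b"\x00"), rec2.ljust(width, b"\x00"))
    )
    return xored.translate(MARKER_TABLE).decode("ascii")


if __name__ == "__main__":
    print(repr(mark_differences(b"ABCD12345", b"CBCE11344")))
```

A single 256-byte table does in one translate step what the REXX does in two.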

And here is the spool output for the REXX run:
Code:

04.31.16 JOB15352  ACT001I    STEPNAME PROCSTEP PROGRAM   CPU-TIME    COND
04.31.16 JOB15352  ACT002I    STEP1             IEFBR14  00:00:00.00  0000
04.33.11 JOB15352  ACT002I    TSOBAT1           IKJEFT1A 00:01:11.50  0000

...
ELAPSED TIME               00:01:54.97


And here is the JES output for the Easytrieve solution:
Code:

11.40.37 JOB19093  ACT002I    R020              EZTPA00  00:01:16.80  0000

ELAPSED TIME               00:01:57.54


Quote:

I would write a single Easytrieve/COBOL program to compare the files and write the output in the desired format.


I can do that, but I/O in COBOL / Easytrieve is not as efficient as in sort, right? Instead of processing the entire mass of input records from the two files, I thought I could write a sort routine to extract only the changed lines (which in most cases will be far fewer than the full input) and then process them by some other means.

I have one question regarding sort exits. When we use sort exits, who does the record fetch (I/O): the sort, or the routine invoked in the exit? If I can reduce the EXCPs and consequently the CPU time, that would be very helpful.

Thanks,
Phantom

kolusu
Site Admin

Posted: Thu Aug 05, 2004 7:23 am

Phantom,

I wanted to see your easytrieve program. I am not an expert in REXX.

Quote:

I can do that, but I/O in COBOL / Easytrieve is not as efficient as in sort, right? Instead of processing the entire mass of input records from the two files, I thought I could write a sort routine to extract only the changed lines (which in most cases will be far fewer than the full input) and then process them by some other means.


Even though sort is highly optimized for I/O, Easytrieve scores when it comes to matching files. Easytrieve can also handle duplicates in a single pass of the data, whereas with sort you need at least 3 passes.

However, if you omit certain parameters in Easytrieve, the job will run like a snail. So post your Easytrieve code and also the file statistics from the SYSPRINT of the Easytrieve run.

Quote:

I have one question regarding sort exits. When we use sort exits, who does the record fetch (I/O): the sort, or the routine invoked in the exit? If I can reduce the EXCPs and consequently the CPU time, that would be very helpful.


Check this link for DFSORT'S program phases.

http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/ICE1CA00/4.2?DT=20031124143823

I am not a big fan of exits in SORT. Generally speaking Exits have a bearing on performance. It's more efficient to NOT use exits at all, if possible.

Sometimes you can use a COPY rather than a sort. Sorting affects performance, and a copy operation is much faster than a sort operation. For example, a copy with an E15 exit would most likely be faster than a sort without an exit, unless the number of records was small.

Hope this helps...

Cheers

Kolusu

Phantom
Data Mgmt Moderator

Posted: Thu Aug 05, 2004 9:13 am

Kolusu,

Here is my Easytrieve code. I copied your code and replaced 3000 with 541, since my LRECL is 541.

Code:

//SYSIN    DD *                                                         
                                                                       
  FILE INFILE1                                                         
       IN-REC1          01 01  A OCCURS 541                             
                                                                       
  FILE INFILE2                                                         
       IN-REC2          01 01  A OCCURS 541                             
                                                                       
                                                                       
  FILE OUTPUT  FB(0 0)                                                 
       OUT-REC          01 01  A OCCURS 541                             
                                                                       
  W-SUB                 W 04 N                                         
  S-WRITE-OUTPUT        W 01 A                                         
                                                                       
***********************************************************************
* MAINLINE                                                            *
***********************************************************************
                                                                       
JOB INPUT NULL                                                         
                                                                       
   GET INFILE1                                                         
   GET INFILE2                                                         
                                                                       
   DO WHILE INFILE2                                                     
      S-WRITE-OUTPUT  = 'N'                                             
      W-SUB           = 1                                               
                                                                       
      DO UNTIL W-SUB GT 541                                             
         IF IN-REC1 (W-SUB) = IN-REC2 (W-SUB)                           
            OUT-REC (W-SUB) = ' '                                       
         ELSE                                                           
            S-WRITE-OUTPUT  = 'Y'                                       
            OUT-REC(W-SUB)  = '*'                                       
         END-IF                                                         
         W-SUB              = W-SUB + 1                                 
      END-DO                                                           
                                                                       
      IF S-WRITE-OUTPUT     = 'Y'                                       
         PUT OUTPUT                                                     
      END-IF                                                           
                                                               
      GET INFILE1                                               
      GET INFILE2                                               
   END-DO                                                       
                                                               
   STOP                                                         


Sysprint Message:
Code:

OPTIONS FOR THIS RUN - ABEXIT NO  DEBUG (STATE FLDCHK NOXREF)  LIST (PARM FILE)   PRESIZE   512     
SORT (DEVICE SYSDA  ALTSEQ NO  MSG DEFAULT  MEMORY MAX   WORK   3)  VFM (   64)
 8/04/04 11.38.41                             CA-EASYTRIEVE PLUS-6.4 0202         PAGE    2         
                                                       PERSHING                 
PROGRAMS AND ALL SUPPORTING MATERIALS COPYRIGHT (C) 1982, 1996 BY COMPUTER ASSOCIATES INTL. INC.
FILE STATISTICS - CA-EASYTRIEVE PLUS 6.4 0202- 8/04/04-11.38-JSN00012           
INFILE1       1,211,245    INPUT        SAM  FIX   BLK                  541    32460
INFILE2       1,211,244    INPUT        SAM  FIX   BLK                  541    32460
OUTPUT        1,211,244   OUTPUT        SAM  FIX   BLK                  541    32460


Please review the code and let me know if anything could be modified so that it can run faster.

Quote:

Easytrieve can also handle duplicates in a single pass of the data, whereas with sort you need at least 3 passes.


Can you please provide the Easytrieve code that can handle duplicates in the input file? I'm trying to write a tool that is more efficient than ISRSUPC. The ISRSUPC compare ran for more than 47 minutes (job time) to compare two files of 2.5 million records each. Please let me know if my requirement is not clear.

Thanks a ton,
Phantom

Frank Yaeger
Sort Forum Moderator

Joined: 02 Dec 2002
Posts: 1618
Topics: 31
Location: San Jose

Posted: Thu Aug 05, 2004 9:49 am

Kolusu said
Quote:
I am not a big fan of exits in SORT. Generally speaking Exits have a bearing on performance. It's more efficient to NOT use exits at all, if possible.


Although it's true that exits can have an impact on performance, exits can be more efficient in some cases, such as when using an exit allows only one pass over the data, whereas not using an exit requires more than one pass over the data.

Phantom said
Quote:
I have one question regarding sort exits. When we use sort exits, who does the record fetch (I/O): the sort, or the routine invoked in the exit? If I can reduce the EXCPs and consequently the CPU time, that would be very helpful.


With an Assembler exit, DFSORT does the I/O (unless YOU choose to have the exit handle all the I/O). With a COBOL exit, DFSORT does the I/O if COBOL sets FASTSRT in effect, or COBOL does the I/O if COBOL sets NOFASTSRT in effect.
_________________
Frank Yaeger - DFSORT Development Team (IBM)
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at:
www.ibm.com/storage/dfsort

Phantom
Data Mgmt Moderator

Posted: Thu Aug 05, 2004 9:56 am

Thanks for the clarification, Frank.

Phantom

kolusu
Site Admin

Posted: Thu Aug 05, 2004 10:03 am

Phantom,

Try this code. It uses an index instead of a subscript. I ran it against a file of 4 million records and the run time was reduced by 50%.


Code:

FILE INFILE1                                           
     IN-REC1          01 01  A OCCURS 541               
                                                       
FILE INFILE2                                           
     IN-REC2          01 01  A OCCURS 541 INDEX IDX2   
                                                       
                                                       
FILE OUTPUT  FB(0 0)                                   
     OUT-REC          01 01  A OCCURS 541               
                                                       
S-WRITE-OUTPUT        W 01 A                           

*********************************************************
* MAINLINE                                              *
*********************************************************
                                                         
JOB INPUT NULL                                           
                                                         
   GET INFILE1                                           
   GET INFILE2                                           
                                                         
   DO WHILE INFILE2                                     
      S-WRITE-OUTPUT  = 'N'                             
      IDX2            = 1                               
                                                         
      DO UNTIL IDX2  GT 541                             
         IF IN-REC1 (IDX2) = IN-REC2 (IDX2)             
            OUT-REC (IDX2) = ' '                         
         ELSE                                           
            S-WRITE-OUTPUT  = 'Y'                       
            OUT-REC(IDX2)   = '*'                       
         END-IF                                         
         IDX2               = IDX2 + 1                   
      END-DO                                             
                                                       
                                       
      IF S-WRITE-OUTPUT     = 'Y'       
         PUT OUTPUT                     
      END-IF                           
                                       
      GET INFILE1                       
      GET INFILE2                       
   END-DO                               
                                       
   STOP                                 


Hope this helps...

Cheers

Kolusu

Phantom
Data Mgmt Moderator

Posted: Thu Aug 05, 2004 10:26 am

Wow, Kolusu,

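This solution is great. The run time dropped considerably: CPU time went from 00:01:16.80 to 00:00:48.50 (roughly 37% less) and elapsed time from 00:01:57.54 to 00:01:24.51. These are my stats: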

Code:

11.21.19 JOB03828  ACT001I    STEPNAME PROCSTEP PROGRAM   CPU-TIME    COND
11.21.19 JOB03828  ACT002I    R020              EZTPA00  00:00:48.50  0000

...
ELAPSED TIME               00:01:24.51


Thanks for your efforts, kolusu.

I will try to modify this code to handle duplicates in the input file. Let's see how far I get!

Phantom

Phantom
Data Mgmt Moderator

Posted: Wed Oct 06, 2004 12:27 am

Quote:

With an Assembler exit, DFSORT does the I/O (unless YOU choose to have the exit handle all the I/O). With a COBOL exit, DFSORT does the I/O if COBOL sets FASTSRT in effect, or COBOL does the I/O if COBOL sets NOFASTSRT in effect.


Frank/Kolusu,
Could you please provide a simple example of a sort E35 exit calling a COBOL program (using FASTSRT), i.e. with the I/O handled by SORT? This would be really helpful for me.

Thanks a Ton in advance,
Phantom

All times are GMT - 5 Hours