MVSFORUMS.com Forum Index MVSFORUMS.com
A Community of and for MVS Professionals
 

Sort/split of an XML file
misi01
Advanced


Joined: 02 Dec 2002
Posts: 629
Topics: 176
Location: Stockholm, Sweden

Posted: Sat Feb 27, 2016 2:24 am    Post subject: Sort/split of an XML file

I don't actually have a need for a sort solution as yet, but the final results will (I'm sure) need it at a later date.

My thought was to explain the problem we have and hopefully, with your help come up with a solution that will work easily using DFSORT.

Okay, background.

We have a program that creates a file (the yearly bank details for all our customers).
The first 50 bytes of each record in this file contain details that are used to sort it and then split it into smaller files.
These details include information such as the zip code, how many pages are needed for each letter, etc.
All of this is so that we save money with the post office if the letters are sorted in zip-code order.
In addition, the number of pages in each letter determines the post-processing in the automated "kuvertering" (Swedish for envelope insertion, i.e. automatically putting the letters into envelopes).

Present day.

The program that creates the first file above has been converted to create an XML file instead (on the mainframe).

Problem.

The XML file (at the moment) obviously doesn't have these 50 bytes in it.
We need a way of facilitating the sort and split of the file.

Obviously, we could pre-pend those 50 bytes to each record in the XML file, but this has the following problems:-

1 The XML file is no longer readable using standard XML s/w (such as XML notepad)
2 The XML file contains x leading records with namespace tags and any split of the file would need to include those first records in each output file.
3 The XML file also contains the closing tag for the "major" tag - this record would need to be appended to each output file (though I imagine any DFSORT solution could include some hard-coded value for this closing tag)

Another option might be to create a second file that only contains the equivalent of the leading 50 bytes for each record and then use DFSORT to merge and split the files based on this second file (somehow or other).

Another option would be to create the XML file as it is at the moment and also create the same file with the 50 bytes pre-pended for each record (the first file is then readable as XML, the second file is used for the sort and split)

Another option would be .......

Any thoughts/suggestions would be gratefully appreciated.
_________________
Michael
William Collins
Supermod


Joined: 03 Jun 2012
Posts: 437
Topics: 0

Posted: Sat Feb 27, 2016 3:52 am

Is your XML "horizontal" (one long record) or "vertical" (lots of physical records for one logical record)?

How about just adding the 50 bytes as a new XML element? A bit like using the FILLER at the end of a record (except you'd prefer this one to be first, so that it is in a fixed location) it should/may not affect other processing, as everything should be extracted by name, rather than fixed or relative position.
misi01

Posted: Sat Feb 27, 2016 7:04 am    Post subject: Can be either

When creating the file, we set a flag to indicate whether the file should be written with one tag per record, or with as many tags as will fit into 256 bytes.

Your idea is interesting. Basically (?) create a dummy tag containing the 50-byte sort information.

DFSORT would then have to sort the file based on this tag (keeping with it all the tags that follow, up to the next 50-byte tag). Although writing the sort parms is outside my competence, I would be prepared to Google and experiment on how to do it.

The advantage of your idea is that the XML file is still readable (albeit with an extra "weird" tag).

I'll certainly look into it. Thanx for the idea.
_________________
Michael
misi01

Posted: Sat Feb 27, 2016 7:07 am

BTW, can you please give me the DFSORT keyword I need to look at in order to split the file based on this tag?

(Note, I'm not asking for a solution, only a pointer as to where to start looking)
_________________
Michael
William Collins

Posted: Sat Feb 27, 2016 10:33 am

To create multiple files, you use multiple OUTFIL statements. On each you can code INCLUDE= or OMIT=, and on one of them SAVE, which picks up the records not selected by any other OUTFIL. OUTFIL's INCLUDE= and OMIT= are similar to the INCLUDE/OMIT COND= statements, except that they work on the final output data.
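
As a rough, untested sketch of that idea: the control statements below copy the input and route records to different output files by a key. The ddnames OUT1, OUT2 and REST, the key position (5, length 5) and the key values are all hypothetical placeholders; each ddname would need a matching DD statement in the step.

```jcl
  OPTION COPY
* Hypothetical key at 5,5: all-zero keys go to OUT1
  OUTFIL FNAMES=OUT1,INCLUDE=(5,5,CH,EQ,C'00000')
* Another hypothetical key value goes to OUT2
  OUTFIL FNAMES=OUT2,INCLUDE=(5,5,CH,EQ,C'58221')
* SAVE collects every record not selected by any other OUTFIL
  OUTFIL FNAMES=REST,SAVE
```

The same OUTFIL statements could follow a SORT statement instead of OPTION COPY, in which case the split is applied to the sorted records.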
misi01

Posted: Sat Feb 27, 2016 10:37 am    Post subject: Thanks William, but my bad

I should have asked for the DFSORT keywords to sort the actual file

Googling, I'm guessing I need some sort of combination of

.. IFTHEN=(WHEN=GROUP

or similar
_________________
Michael
William Collins

Posted: Sat Feb 27, 2016 11:49 am

Yes, if you have multiple physical records, you're going to need WHEN=GROUP to get the records to sort together by PUSHing the key to a temporary extension. After the SORT, in OUTREC or OUTFIL, you can cut the records back down to the original data with BUILD or with IFOUTLEN if you have further handy IFTHENs.
misi01

Posted: Sun Feb 28, 2016 1:53 am    Post subject: Thanks again

I'll try looking at it this coming week. For those who "speak" fluent DFSORT, the solution will probably be pretty easy, but if I get it working, I'll post the solution here anyway
_________________
Michael
misi01

Posted: Mon Feb 29, 2016 7:28 am    Post subject: Okay, already running into what should be a trivial problem

Here's my file (the first few records)
Quote:

<?xml version="1.0" encoding="iso-8859-1"?>
<ROOT>
<START>
<SYSTEM>PEKO</SYSTEM>
<DDNAME>Q236201</DDNAME>
<DATE>2015-01-22</DATE>
<ENVIRONMENT>FT </ENVIRONMENT>
</START>
<SORT_ID><!--58221 000705670200908192922001004168000300
<DOCUMENT>
<CHANNEL>BATCH</CHANNEL>
<DEST/>
<DOCUMENT_DATE>2027-12-22</DOCUMENT_DATE>
<POSTTYP>B</POSTTYP>
<CLNR>6600</CLNR>
<DOCUMENT_TYPE>ÅRSBESKED </DOCUMENT_TYPE>
<SYSTEMCUSTOM>


The important record is the one starting with <SORT_ID><!-- ; it will be placed before every <DOCUMENT> tag, which marks the start of a new customer.

The input file is VB with LRECL=128. I've reviewed the example on page 48 of Smart DFSORT Tricks and have created the following JCL

Code:

//IMSDEL   EXEC PGM=IDCAMS                                     
//*                                                             
//SYSPRINT DD SYSOUT=*                                         
//*                                                             
//SYSIN    DD *                                                 
  DELETE MISI01.XML.SORTED                                     
//*                                                             
//S2       EXEC PGM=ICEMAN                                     
//SYSOUT   DD SYSOUT=*                                         
//SORTIN   DD DISP=SHR,DSN=MISI01.XML.UNSORTED                 
//SORTOUT  DD DSN=MISI01.XML.SORTED,DISP=(,CATLG),             
//            RECFM=VB,LRECL=136                               
//SYSIN    DD *                                                 
  INREC IFTHEN=(WHEN=INIT,BUILD=(1,4,5:9X,128:5)),             
    IFTHEN=(WHEN=GROUP,BEGIN=(14,13,CH,EQ,C'<SORT_ID><!--'),   
        PUSH=(5:ID=9))                                         
  SORT FIELDS=(5,9,ZD,A)                                       
/*                                                             


I have deliberately made the inserted group ID 9 digits rather than 8, so that the IFTHEN line doesn't read BEGIN=(13,13,...
(the SORT_ID string is 13 characters long, which would leave you wondering whether a given 13 was the position or the length of the GROUP delimiter).

Trouble is, when I submit this job, it fails with abend 217 and the nearest I can find to an error message is the following:-

ICE217A 3 170 BYTE VARIABLE RECORD IS LONGER THAN 136 BYTE MAXIMUM FOR SORTOUT
_________________
Michael
misi01

Posted: Mon Feb 29, 2016 8:03 am

Okay, I got the group ID indicators in the file using
Code:

//SYSIN    DD *                                               
  INREC IFTHEN=(WHEN=INIT,BUILD=(1,4,5:9X,14:5)),             
    IFTHEN=(WHEN=GROUP,BEGIN=(14,13,CH,EQ,C'<SORT_ID><!--'), 
        PUSH=(5:ID=9))                                       
  SORT FIELDS=(5,9,ZD,A)                                     
/*                                                           
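
One plausible reading of the earlier ICE217A failure, offered as a sketch rather than a confirmed diagnosis: in a BUILD, an item written as c:p places input position p onwards starting at output column c. So 128:5 starts the original data at column 128, while 14:5 starts it at column 14, directly after the RDW and the 9-byte ID area. Assuming the first record (the <?xml ...?> line) carries 43 bytes of data, the arithmetic would be:

```jcl
* BUILD=(1,4,5:9X,128:5)  data starts at col 128: 127 + 43 = 170 bytes
* BUILD=(1,4,5:9X,14:5)   data starts at col 14:   13 + 43 =  56 bytes
```

The first figure matches the 170-byte record reported in the message, which is longer than the 136-byte LRECL coded for SORTOUT.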


My trouble now is that I need to sort all the groups based on the record containing the string '<SORT_ID><!--' but not within each group ID - those records should be left as-is

This means, for example, that, if the SORT_ID rows contain
Quote:

<SORT_ID><!--58221 000705670200908192922001004168000300
- - - - - - - - - - - - - - - 294 Line(s) no
<SORT_ID><!--58221 000705670100908192922001004168000300
- - - - - - - - - - - - - - - 410 Line(s) no
<SORT_ID><!--00000 000705670100908192922001004168000300
- - - - - - - - - - - - - - - 428 Line(s) no
<SORT_ID><!--00000 000505670100908192922001004168000300

the GROUPS with 00000 should come first, then the one with 0007056701 and finally the one with 0007056702.

Hope this makes sense.
_________________
Michael
misi01

Posted: Mon Feb 29, 2016 9:43 am

After a lot of experimenting, I arrived at the following solution (this doesn't include the BUILD, but that's the last, simple part).
Code:

//STEP040   EXEC PGM=SORT                                       
//* Start by copying the VB file to an FB one. This based on comments I found
//* at http://www.mvsforums.com/helpboards/viewtopic.php?t=2267&sid=7bb7b5ad2739f86fee68c0ec2b0764a7
//* which seemed to indicate that what I REALLY want to do can't be done
//* with a VB file
//SYSOUT    DD SYSOUT=*                                         
//SORTIN    DD DISP=SHR,DSN=MISI01.XML.UNSORTED                 
//SORTOUT   DD DSN=MISI01.XML.SORTED.FIXED,DISP=(,CATLG)       
//SYSIN     DD  *                                               
  OPTION COPY                                                   
  OUTFIL VTOF,BUILD=(5,124)                                     
/*                                                             
//*                                                             
//S2       EXEC PGM=ICEMAN                                     
//SYSOUT   DD SYSOUT=*                                         
//SORTIN   DD DISP=SHR,DSN=MISI01.XML.SORTED.FIXED             
//SORTOUT  DD DSN=MISI01.XML.SORTED,DISP=(,CATLG)               
//*SORTOUT DD SYSOUT=*                                         
//SYSIN    DD * 
* specifies that the original sequence must be preserved.
  OPTION EQUALS                                                 
* Each bunch of records starting with <SORT_ID><!-- will be grouped
* together, and the whole SORT_ID sequence is placed in column
* 129 of the temp file
  INREC IFTHEN=(WHEN=GROUP,BEGIN=(1,13,CH,EQ,C'<SORT_ID><!--'),
                PUSH=(129:1,87)),                               
* .... however, when we reach the final closing tag (</ROOT>), we append a
* "SORT_ID" of <ZZZZZZZZZZZ  so it gets sorted last of all records
        IFTHEN=(WHEN=(1,7,CH,EQ,C'</ROOT>'),                   
                OVERLAY=(129:C'<ZZZZZZZZZZZZZZZZZZZZZZZZZZ'))   
* Okay, sort based on the sort sequence records
  SORT FIELDS=(129,87,CH,A)                         
* ... and create the output file (WITH the extra sort sequence just for clarity)           
  OUTFIL BUILD=(1,215)                                         
/*                                                             


Is it correct that I have to copy the VB file to an FB one in order to avoid error ICE218A?
_________________
Michael
William Collins

Posted: Mon Feb 29, 2016 10:09 am

If your input is variable-length, then you should extend at the beginning of the record.

Code:
INREC IFTHEN=(WHEN=INIT,
            BUILD=(1,4, ...the RDW
                   87X, ...blanks to be overlaid by the key
                   5), ...the original data from the variable-length record

Then PUSH at 5:

SORT at 5.

OUTREC BUILD=(1,4, ...the RDW
              92) ...the original data, which starts after the 87-byte key (5+87=92)
kolusu
Site Admin


Joined: 26 Nov 2002
Posts: 12378
Topics: 75
Location: San Jose

Posted: Mon Feb 29, 2016 10:15 am

misi01,

As William pointed out, you just need to copy the key you want to sort on to a position immediately after the RDW.

Assuming your sort key starts at position 18 for a length of 16 (that is, 58221 0007056702 in the first SORT_ID record below)

All you need to do is push that key using WHEN=GROUP and then remove it after sorting.

Quote:

<SORT_ID><!--58221 000705670200908192922001004168000300
- - - - - - - - - - - - - - - 294 Line(s) no
<SORT_ID><!--58221 000705670100908192922001004168000300
- - - - - - - - - - - - - - - 410 Line(s) no
<SORT_ID><!--00000 000705670100908192922001004168000300
- - - - - - - - - - - - - - - 428 Line(s) no
<SORT_ID><!--00000 000505670100908192922001004168000300



Something like this

Code:

//SYSIN    DD *                                                   
  INREC IFTHEN=(WHEN=INIT,BUILD=(1,4,16X,5)),                     
        IFTHEN=(WHEN=GROUP,BEGIN=(21,13,CH,EQ,C'<SORT_ID><!--'), 
          PUSH=(5:34,16))                                         
                                                                 
  SORT FIELDS=(5,16,CH,A),EQUALS                                 
                                                                 
  OUTREC BUILD=(1,4,21)                                           
/*


It is as simple as that.
_________________
Kolusu
www.linkedin.com/in/kolusu
misi01

Posted: Mon Feb 29, 2016 10:51 am

William, Kolusu. I think I've understood what you meant. Here's my code (I've added comments so there should be no doubt as to what I think I'm doing)
Code:

  OPTION EQUALS     
* Insert 87 blanks immediately after the RDW                                             
  INREC IFTHEN=(WHEN=INIT,BUILD=(1,4,87X,5)),                     
* For each change of SORT_ID, "overlay" the SORT_ID in positions 5 (just after
* the RDW)
        IFTHEN=(WHEN=GROUP,BEGIN=(5,13,CH,EQ,C'<SORT_ID><!--'),   
                PUSH=(5:5,87)),                                   
* Final record should result in <ZZZZZZZ being "overlaid"  in pos 5
        IFTHEN=(WHEN=(5,7,CH,EQ,C'</ROOT>'),                     
                OVERLAY=(5:C'<ZZZZZZZZZZZZZZZZZZZZZZZZZZ'))
* Sort 87 bytes starting just after the RDW       
  SORT FIELDS=(5,87,CH,A)   
* Build the record, omitting the temp SORT_ID keys                                     
  OUTREC BUILD=(1,4,92)                                           


Trouble is, when I run this, I don't see the file being sorted based on the SORT_ID fields.

Kolusu - in your example, did you really mean
Code:

  PUSH=(5:34,16))     

rather than
Code:

  PUSH=(5:18,16))     Your example has the key field in position 18 

_________________
Michael
misi01

Posted: Mon Feb 29, 2016 10:56 am

Okay, got it !!!!
Code:

  OPTION EQUALS                                                   
  INREC IFTHEN=(WHEN=INIT,BUILD=(1,4,87X,5)),                     
        IFTHEN=(WHEN=GROUP,BEGIN=(92,13,CH,EQ,C'<SORT_ID><!--'),   
                PUSH=(5:92,87)),                                   
        IFTHEN=(WHEN=(92,7,CH,EQ,C'</ROOT>'),                     
                OVERLAY=(5:C'<ZZZZZZZZZZZZZZZZZZZZZZZZZZ'))       
  SORT FIELDS=(5,87,CH,A)                                         


It didn't occur to me that every position referenced in the later IFTHEN clauses has to allow for the 87 bytes that WHEN=INIT has already "inserted" in front of the data. So what was position 5 (the first byte after the RDW) is now 5+87 = 92.

As you can see from the example above, I didn't include the OUTREC statement. I like to add that last, simply so I can see whether my first statements do what I expect/want them to.
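
For completeness, a sketch of how the full set of control statements might look once the final step is added. It reuses the OUTREC BUILD=(1,4,92) from the earlier attempt, which appears to be correct now that the data really does start at position 92; this combination is untested here and assumes the same VB input as above.

```jcl
  OPTION EQUALS
* RDW, 87 blanks reserved for the key, then the original data (now at 92)
  INREC IFTHEN=(WHEN=INIT,BUILD=(1,4,87X,5)),
        IFTHEN=(WHEN=GROUP,BEGIN=(92,13,CH,EQ,C'<SORT_ID><!--'),
                PUSH=(5:92,87)),
        IFTHEN=(WHEN=(92,7,CH,EQ,C'</ROOT>'),
                OVERLAY=(5:C'<ZZZZZZZZZZZZZZZZZZZZZZZZZZ'))
  SORT FIELDS=(5,87,CH,A)
* Drop the 87-byte key: keep the RDW and everything from 92 onwards
  OUTREC BUILD=(1,4,92)
```

The leading records before the first SORT_ID keep a blank key, so they should sort ahead of every group, and OPTION EQUALS keeps the records within each group in their original order.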

Thanks to both of you.
_________________
Michael