Joined: 02 Dec 2002 Posts: 629 Topics: 176 Location: Stockholm, Sweden
Posted: Sat Feb 27, 2016 2:24 am Post subject: Sort/split of an XML file
I don't actually have a need for a sort solution as yet, but the final results will (I'm sure) need it at a later date.
My thought was to explain the problem we have and, with your help, hopefully come up with a solution that will work easily using DFSORT.
Okay, background.
We have a program that creates a file (the yearly bank details for all our customers).
The first 50 bytes of each record in this file contain details that are used to sort it and then split it into smaller files.
These details contain information such as zip code, how many pages are needed for each letter etc.
All of this is so that we save money with the post office, which we do if the letters are sorted in zip-code order.
In addition, the number of pages in each letter determines the post-processing in the automated "kuvertering" (envelope inserting, that is, automatically putting the letters into envelopes).
Present day.
The program that creates the first file above has been converted to create an XML file instead (on the mainframe).
Problem.
The XML file (at the moment) obviously doesn't have these 50 bytes in it.
We need a way of facilitating the sort and split of the file.
Obviously, we could pre-pend those 50 bytes to each record in the XML file, but this has the following problems:-
1 The XML file is no longer readable using standard XML s/w (such as XML notepad)
2 The XML file contains x leading records with namespace tags and any split of the file would need to include those first records in each output file.
3 The XML file also contains the closing tag for the "major" tag - this record would need to be appended to each output file (though I imagine any DFSORT solution could include some hard-coded value for this closing tag)
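The "hard-coded value" idea in points 2 and 3 could perhaps be sketched with the OUTFIL HEADER1 and TRAILER1 parameters, which write fixed text before and after the data records of an output file. This is only a sketch under assumptions: OUT1 is a hypothetical DD name, and '<ROOT>' is a placeholder for the real leading records, which are not shown in this thread.

```
  OPTION COPY
* OUT1 is a hypothetical DD name and '<ROOT>' stands in for the
* real leading records. REMOVECC suppresses the carriage control
* character that OUTFIL reports would otherwise add.
  OUTFIL FNAMES=OUT1,REMOVECC,
    HEADER1=(C'<ROOT>'),
    TRAILER1=(C'</ROOT>')
```

The original leading and closing records in the input would then have to be kept out of the data written to each output file, which is not shown here.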
Another option might be to create a second file that only contains the equivalent of the leading 50 bytes for each record and then use DFSORT to merge and split the files based on this second file (somehow or other).
Another option would be to create the XML file as it is at the moment and also create the same file with the 50 bytes pre-pended for each record (the first file is then readable as XML, the second file is used for the sort and split)
Another option would be .......
Any thoughts/suggestions would be gratefully appreciated. _________________ Michael
Is your XML "horizontal" (one long record) or "vertical" (lots of physical records for one logical record)?
How about just adding the 50 bytes as a new XML element? It's a bit like using the FILLER at the end of a record (except you'd prefer this one to be first, so that it is in a fixed location). It should not, or at least may not, affect other processing, as everything should be extracted by name rather than by fixed or relative position.
Posted: Sat Feb 27, 2016 7:04 am Post subject: Can be either
When creating the file, we set a flag to indicate whether the file should be split with one tag per record, or created with as many tags as will fit into 256 bytes.
Your idea is interesting. Basically (?) create a dummy tag containing the 50-byte sort information.
DFSORT would then have to sort the file based on this tag (as well as all tags following it up to the next 50-byte tag). Although writing the sort parms is outside my competence, I would be prepared to Google and experiment on how to do it.
The advantage of your idea is that the XML file is still readable (albeit with an extra "weird" tag).
I'll certainly look into it. Thanx for the idea. _________________ Michael
To create multiple files, you use multiple OUTFIL statements. On each you can use INCLUDE= or OMIT=, and on one of them SAVE, which selects the records not written by any of the other OUTFILs. The INCLUDE= and OMIT= are similar to INCLUDE/OMIT COND=, except that they work on the final output data.
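As a rough sketch of the multiple-OUTFIL idea, assume (purely for illustration) a fixed-length file with a five-digit zip code in positions 1-5. The DD names OUT1, OUT2 and OUT3 and the zip-code boundaries are invented for the example and are not from this thread.

```
  OPTION COPY
* Zip codes below 20000 go to OUT1
  OUTFIL FNAMES=OUT1,INCLUDE=(1,5,CH,LT,C'20000')
* Zip codes from 20000 up to 49999 go to OUT2
  OUTFIL FNAMES=OUT2,
    INCLUDE=(1,5,CH,GE,C'20000',AND,1,5,CH,LT,C'50000')
* SAVE picks up every record not written by the other OUTFILs
  OUTFIL FNAMES=OUT3,SAVE
```

If the input is not already in the required order, OPTION COPY would be replaced by a SORT statement; the OUTFIL statements stay the same because they work on the records after the sort.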
Yes, if you have multiple physical records, you're going to need WHEN=GROUP to get the records to sort together by PUSHing the key to a temporary extension. After the SORT, in OUTREC or OUTFIL, you can cut the records back down to the original data with BUILD or with IFOUTLEN if you have further handy IFTHENs.
Posted: Sun Feb 28, 2016 1:53 am Post subject: Thanks again
I'll try looking at it this coming week. For those who "speak" fluent DFSORT, the solution will probably be pretty easy, but if I get it working, I'll post the solution here anyway _________________ Michael
I have deliberately changed the size of the inserted group ID values to 9 rather than 8, so that the IFTHEN line does not contain the string BEGIN(13,13....
(the SORT_ID string is 13 characters, which would leave you wondering whether 13 referred to the position or the length of the GROUP delimiter).
Trouble is, when I submit this job, it fails with abend 217 and the nearest I can find to an error message is the following:-
ICE217A 3 170 BYTE VARIABLE RECORD IS LONGER THAN 136 BYTE MAXIMUM FOR SORTOUT _________________ Michael
My trouble now is that I need to sort all the groups based on the record containing the string '<SORT_ID><!--', but without sorting the records within each group ID: those records should be left as-is.
This means, for example, that, if the SORT_ID rows contain
Posted: Mon Feb 29, 2016 9:43 am Post subject:
After a lot of experimenting, I arrived at the following solution (this doesn't include the BUILD, but that's the last, simple part).
Code:
//STEP040 EXEC PGM=SORT
//* Start by copying the VB file to an FB one. This based on comments I found
//* at http://www.mvsforums.com/helpboards/viewtopic.php?t=2267&sid=7bb7b5ad2739f86fee68c0ec2b0764a7
//* which seemed to indicate that what I REALLY want to do can't be done
//* with a VB file
//SYSOUT DD SYSOUT=*
//SORTIN DD DISP=SHR,DSN=MISI01.XML.UNSORTED
//SORTOUT DD DSN=MISI01.XML.SORTED.FIXED,DISP=(,CATLG)
//SYSIN DD *
OPTION COPY
OUTFIL VTOF,BUILD=(5,124)
/*
//*
//S2 EXEC PGM=ICEMAN
//SYSOUT DD SYSOUT=*
//SORTIN DD DISP=SHR,DSN=MISI01.XML.SORTED.FIXED
//SORTOUT DD DSN=MISI01.XML.SORTED,DISP=(,CATLG)
//*SORTOUT DD SYSOUT=*
//SYSIN DD *
* specifies that the original sequence must be preserved.
OPTION EQUALS
* Each bunch of records starting with <SORT_ID><!-- will be grouped
* together, and the whole SORT_ID sequence is placed in column
* 129 of the temp file
INREC IFTHEN=(WHEN=GROUP,BEGIN=(1,13,CH,EQ,C'<SORT_ID><!--'),
PUSH=(129:1,87)),
* .... however, when we reach the final closing tag (</ROOT>), we append a
* "SORT_ID" of <ZZZZZZZZZZZ so it gets sorted last of all records
IFTHEN=(WHEN=(1,7,CH,EQ,C'</ROOT>'),
OVERLAY=(129:C'<ZZZZZZZZZZZZZZZZZZZZZZZZZZ'))
* Okay, sort based on the sort sequence records
SORT FIELDS=(129,87,CH,A)
* ... and create the output file (WITH the extra sort sequence just for clarity)
OUTFIL BUILD=(1,215)
/*
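For the last part mentioned above (the BUILD that is not included), a minimal, untested sketch: the VTOF copy in the first step builds 124-byte records (BUILD=(5,124)), so the OUTFIL in the second step could presumably cut each record back to those 124 bytes. This line would take the place of OUTFIL BUILD=(1,215).

```
* Keep only the original 124 data bytes; the temporary sort key
* pushed to positions 129-215 is dropped
  OUTFIL BUILD=(1,124)
```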
Is it correct that I have to copy the VB file to an FB one in order to avoid error ICE218A? _________________ Michael
Posted: Mon Feb 29, 2016 10:51 am Post subject:
William, Kolusu. I think I've understood what you meant. Here's my code (I've added comments so there should be no doubt as to what I think I'm doing)
Code:
OPTION EQUALS
* Insert 87 blanks immediately after the RDW
INREC IFTHEN=(WHEN=INIT,BUILD=(1,4,87X,5)),
* For each change of SORT_ID, "overlay" the SORT_ID in position 5 (just after
* the RDW)
IFTHEN=(WHEN=GROUP,BEGIN=(5,13,CH,EQ,C'<SORT_ID><!--'),
PUSH=(5:5,87)),
* Final record should result in <ZZZZZZZ being "overlaid" in pos 5
IFTHEN=(WHEN=(5,7,CH,EQ,C'</ROOT>'),
OVERLAY=(5:C'<ZZZZZZZZZZZZZZZZZZZZZZZZZZ'))
* Sort 87 bytes starting just after the RDW
SORT FIELDS=(5,87,CH,A)
* Build the record, omitting the temp SORT_ID keys
OUTREC BUILD=(1,4,92)
Trouble is, when I run this, I don't see the file being sorted based on the SORT_ID fields.
Kolusu - in your example, did you really mean
Code:
PUSH=(5:34,16))
rather than
Code:
PUSH=(5:18,16))
Your example has the key field in position 18.
It didn't occur to me that any references to positions in the IFTHEN clauses had to take into account that 87 bytes had been "inserted" into each record ahead of the original data (so what I thought was pos 5, just after the RDW, is now 5 + 87 = 92).
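Following that reasoning, the VB statements posted earlier could be adjusted roughly as below. This is an untested sketch: the only change is that the tests and the PUSH source now refer to position 92 instead of 5. Records shorter than the fields referenced may need extra handling, which is not shown.

```
  OPTION EQUALS
* WHEN=INIT inserts 87 blanks after the RDW, so the original data
* moves from position 5 to position 92 (5 + 87).
* WHEN=GROUP therefore tests for the SORT_ID record at position 92
* and pushes its first 87 bytes into the blank key area at 5.
* The closing tag (also at 92) gets a high key so it sorts last.
  INREC IFTHEN=(WHEN=INIT,BUILD=(1,4,87X,5)),
        IFTHEN=(WHEN=GROUP,BEGIN=(92,13,CH,EQ,C'<SORT_ID><!--'),
                PUSH=(5:92,87)),
        IFTHEN=(WHEN=(92,7,CH,EQ,C'</ROOT>'),
                OVERLAY=(5:C'<ZZZZZZZZZZZZZZZZZZZZZZZZZZ'))
* Sort on the 87-byte key just after the RDW
  SORT FIELDS=(5,87,CH,A)
* Drop the temporary key: keep the RDW and the data from 92 on
  OUTREC BUILD=(1,4,92)
```

Any leading records before the first SORT_ID record keep a blank key and so sort ahead of all the groups, while OPTION EQUALS keeps the records inside each group in their original order.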
As you can see from the example above, I didn't include the OUTREC statement. I like to add that last, simply so I can see whether my first statements do what I expect/want them to.