Posted: Wed Apr 16, 2014 1:59 pm Post subject: Dynamic String Search and Replace using Syncsort
My Input file contains
1234,2345,BLAH-BLAH-BLAH,@@@@,REST OF THE RECORD
3456,4567,BLAH,####,BLAH-BLAH
0000,1111,THIS RECORD DOES NOT CONTAIN @ OR #
I need to find any occurrence of the characters @@@@ and replace them with the data at position 1, length 4, from the same record. I also want to replace #### with the data at position 6, length 4. If neither search string is found, then output the record as is (and of course I want to strip off the first 10 bytes from all records).
Expected result:
BLAH-BLAH-BLAH,1234,REST OF THE RECORD
BLAH,4567,BLAH-BLAH
THIS RECORD DOES NOT CONTAIN @ OR #
The characters @@@@ and #### can appear anywhere on the record (except in columns 1 thru 10). For simplicity when I build the input, I can restrict that a record contains only one (or zero) occurrence of @@@@ or #### (that is – there will not be @@@@ and #### both present on the same record).
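To pin down the requirement before looking at sort syntax, here is a minimal Python sketch of the transformation described above. It is only a reference for the expected behaviour, not a sort solution; the function name `replace_markers` is made up for illustration, and it assumes at most one marker per record, as stated.

```python
def replace_markers(record: str) -> str:
    """Replace @@@@ / #### with fields from the same record, then drop columns 1-10."""
    low = record[0:4]    # columns 1-4, used for @@@@
    high = record[5:9]   # columns 6-9, used for ####
    body = record[10:]   # everything from column 11 onward
    if "@@@@" in body:
        return body.replace("@@@@", low, 1)
    if "####" in body:
        return body.replace("####", high, 1)
    return body          # neither marker found: output the rest as is


samples = [
    "1234,2345,BLAH-BLAH-BLAH,@@@@,REST OF THE RECORD",
    "3456,4567,BLAH,####,BLAH-BLAH",
    "0000,1111,THIS RECORD DOES NOT CONTAIN @ OR #",
]
for rec in samples:
    print(replace_markers(rec))
```

Run against the three sample records, this should reproduce the expected result listed above.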
I don’t have DFSORT so I need syntax compatible with SYNCSORT.
Any ideas?
I can always code COBOL to do this but these things should be doable in SORT.
I have tried various flavors of the PARSE statement, but the difficulty with this statement is that you cannot parse and extract a variable-length string. I was hoping this would work (but it doesn't, as STARTBEF and STARTAFT require the FIXLEN parameter).
Joined: 26 Nov 2002 Posts: 12377 Topics: 75 Location: San Jose
Posted: Wed Apr 16, 2014 2:42 pm Post subject:
marshed,
Since you are a Syncsort user, I am going to give you a hint as to how to do it.
1. Use INREC IFTHEN looking for C'@@@@,' with SS format, and parse %01 with ABSPOS=11 and ENDBEFR=C'@@@@,' with a FIXLEN of lrecl-10, then the rest of the record as %02 with a FIXLEN of lrecl-10, and BUILD the record with %01,1,5,%02
2. Use another IFTHEN looking for C'####,' with SS format, and parse %03 with ABSPOS=11 and ENDBEFR=C'####,' with a FIXLEN of lrecl-10, then the rest of the record as %04 with a FIXLEN of lrecl-10, and BUILD the record with %03,6,5,%04
3. Use another IFTHEN=(WHEN=NONE) and BUILD the record from 11 thru the end of the record.
4. Use OUTREC BUILD with SQZ to combine the parsed values as a single string with an override of LENGTH=LRECL
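As a rough illustration of what those four steps do to a record, here is a Python emulation. It is not sort syntax and only approximates the behaviour: `FIXLEN = 30` is a made-up stand-in for lrecl-10, and the plain squeeze is assumed to remove every blank.

```python
FIXLEN = 30  # hypothetical stand-in for lrecl-10


def parse_fixed(body: str, marker: str, fixlen: int = FIXLEN):
    """Split at the marker and pad each piece to a fixed length, like %01/%02."""
    before, _, after = body.partition(marker)
    return before.ljust(fixlen)[:fixlen], after.ljust(fixlen)[:fixlen]


def squeeze(text: str) -> str:
    """Remove every blank, which is what a plain squeeze is assumed to do here."""
    return text.replace(" ", "")


record = "1234,2345,BLAH-BLAH-BLAH,@@@@,REST OF THE RECORD"
p1, p2 = parse_fixed(record[10:], "@@@@,")  # body starts at column 11
built = p1 + record[0:5] + p2               # %01,1,5,%02
result = squeeze(built)
print(result)
```

The padding from the fixed-length pieces disappears, but so do the blanks inside the data itself, so "REST OF THE RECORD" comes out run together.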
Posted: Thu Apr 17, 2014 1:52 pm Post subject:
marshed,
Your problem is that your second IFTHEN=(WHEN=INIT is working off the output built by the first INIT statement. You need to read my comments once again. I said to use an IFTHEN condition for the variables to check.
Oh well, here are the control cards. These control cards squeeze out the spaces in between, so you may need to work on that, and I will leave it for you.
Thanks! It works well - except all the spaces have been squeezed out. I want to retain the basic structure of the input record. Can't shift anything around.
By the way, thanks a lot for educating me on this feature of sort. I certainly learned something new today.
The trouble is FIXLEN. I want a variable length string parsed into %n and output the parsed variables serially. I don't think such a thing exists in sort.
Posted: Thu Apr 17, 2014 4:08 pm Post subject:
marshed wrote:
The trouble is FIXLEN. I want a variable length string parsed into %n and output the parsed variables serially. I don't think such a thing exists in sort.
Well, it exists in DFSORT land and it is quite easy. You can generate the strings as is, without squeezing out the spaces in between:
1. When BUILDing the records from each IFTHEN statement, use JFY on the parsed variables, shifting left with LEAD=C'"' and TRAIL=C'"', and increase the length to 64, so that each string will be enclosed in quotes.
2. On the OUTREC, use SQZ with PAIR=QUOTE, which will retain the contents in between the quotes AS IS while the rest of the spaces are removed.
3. Use FINDREP to remove the additional double quotes we put in. _________________ Kolusu
www.linkedin.com/in/kolusu
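The three steps above can be emulated in Python to see why the embedded blanks survive. This is only a sketch of the idea, not DFSORT or Syncsort syntax; the helper names `jfy_quoted` and `squeeze_outside_quotes` are made up, and it assumes the data contains no double quotes of its own.

```python
def jfy_quoted(field: str, length: int = 64) -> str:
    """Left-justify, wrap in quotes, pad to a fixed length (JFY with LEAD/TRAIL)."""
    return ('"' + field.strip() + '"').ljust(length)


def squeeze_outside_quotes(text: str) -> str:
    """Drop blanks outside quoted strings, keep quoted content as is (SQZ with PAIR=QUOTE)."""
    out = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif ch == " " and not in_quotes:
            continue  # padding between the quoted pieces is removed
        else:
            out.append(ch)
    return "".join(out)


record = "1234,2345,BLAH-BLAH-BLAH,@@@@,REST OF THE RECORD"
before, _, after = record[10:].partition("@@@@,")
built = jfy_quoted(before) + record[0:5] + jfy_quoted(after)
squeezed = squeeze_outside_quotes(built)
result = squeezed.replace('"', "")  # the FINDREP step: remove the added quotes
print(result)
```

If the data can itself contain double quotes, the last step would strip those too, so a different pairing character would be needed.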
marshed, you were supposed to find this yourself, by the way. And an alternative if your data contains quotes. If it contains both, look at what FINDREP can do for you before putting the quotes or the other thing on, and then reversing that afterwards. Thinking in these cases should always be accompanied by reading the manuals.
Bill, everything can be looked up in a manual, so really there is no need for a forum like this. I am a new member of this board but have followed it for a number of years. I have found a lot of useful information in other members' queries. Unfortunately, I have also seen a lot (I mean - A LOT) of "RYFM" (read your #$%^& manual) advice being dished out. If that's all one wishes to contribute to an inquiry, or if a question is beneath one's dignity to be bothered with, then one might as well shut up.
Kolusu - thanks again for being so helpful. I do consider myself to be an experienced sort user, but I find sort to be extremely difficult to master. I don't mean to offend you or other IBM fans, but IBM manuals are bad! And the Sort manual tops it all. At least these days they are not printed in line printer font. But a little bit of color and other fonts and styles (and of course, lucid language) would certainly help. If you read the USGA/R&A Rules of Golf from the sixties you will know what I mean.
By the way, Kolusu - when is IBM going to introduce SQL-like syntax for sort? It will take all the mystery out of sort.
Posted: Fri Apr 18, 2014 4:27 pm Post subject:
marshed wrote:
Bill, everything can be looked up in a manual, so really there is no need for a forum like this. I am a new member of this board but have followed it for a number of years. I have found a lot of useful information in other members' queries. Unfortunately, I have also seen a lot (I mean - A LOT) of "RYFM" (read your #$%^& manual) advice being dished out. If that's all one wishes to contribute to an inquiry, or if a question is beneath one's dignity to be bothered with, then one might as well shut up.
Marshed,
Bill is one of the most valuable contributors on this site as well as on many other mainframe-related help boards. You need to understand his POV: there are several newbie posters who pretty much rely on just using CTRL+C and CTRL+V rather than putting in the effort to learn or look up the manuals. So it takes a few posts to understand the full potential of the posters seeking help. Not many people read the manuals or follow directions, as they are pretty much relying on someone else doing their work.
So please don't take the suggestions of "RTFM" as rude advice. People such as Bill love to help in depth when posters show signs of working on the clues and arriving at a solution.
marshed wrote:
Kolusu - thanks again for being so helpful.
Thanks. Please post the solution you arrive at, as it will help other posters who are looking at a similar problem. Or we may even fine-tune it.
marshed wrote:
I do consider myself to be an experienced sort user, but I find sort to be extremely difficult to master. I don't mean to offend you or other IBM fans, but IBM manuals are bad! And the Sort manual tops it all.
Well, I DO consider the DFSORT manuals to be of top-notch quality, and they explain every feature in detail. Frank Yaeger has put a lot of effort into the DFSORT manuals over the years. Having said that, I guess we are talking about the SYNCSORT manuals here, and I can't say if they are on par with DFSORT.
marshed wrote:
At least these days they are not printed in line printer font. But a little bit of color and other fonts and styles (and of course, lucid language) would certainly help. If you read the USGA/R&A Rules of Golf from the sixties you will know what I mean.
I wasn't even born in the sixties, so I wouldn't know.
marshed wrote:
By the way, Kolusu - when is IBM going to introduce SQL-like syntax for sort? It will take all the mystery out of sort.
SQL-like syntax for files? Or did you mean SQL support in DFSORT? If it is the latter, then there are already IBM utilities (High Performance Unload, DSNTIAUL and DSNTEP2..) which do that. So there is no need for DFSORT to get into that area. _________________ Kolusu
I agree with you that people post simple or silly questions on the board, but this is true everywhere else on the internet. It's the price to be paid on an open-to-all platform. People want easy answers (including me). I gave up on this after researching the manual for hours and decided to seek help.
And that's where I have a problem with IBM manuals. Their guides are written like reference manuals. They are good if you already know that a feature exists and where to look for it, but it is not that easy to learn something from scratch.
Once I figure out a solution I will share it with you guys. But the real problem has more to it than what I posted originally.
I am talking about SQL-like syntax for sort control statements.
For example:
Code:
SELECT [1,8],[15,20]...
ORDER BY [1,8,C,A]
Would be same as
SORT FIELDS=(1,8,C,A)
OUTREC FIELDS=(....)
Or
SELECT [FIELD LIST], SUM[] (represents OUTREC and SUM FIELDS)
FROM F1 {inner,left/right/full} JOIN F2 (to indicate a join or join unpaired)
WHERE (represents inrec and JOINKEYS)
ORDER BY (represents sort statement)
If anything you can write in SORT to manipulate files can be done in SQL to manipulate a database, then it must be possible to write SQL to manipulate files (barring minor syntax differences).
It's still a work in progress (I am not finding as much time to devote to this), but here's the real problem I am trying to solve.
Dynamically generate SORT cards or SQL predicates for other jobs that process partitioned tables and require limit keys in various flavors (for example: sort statements to split data file into multiple files by limit key ranges).
Code:
//STEP010 EXEC PGM=SYNCSORT,COND=(0,NE)
//SYSOUT DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//*THIS FILE CONTAINS LIMIT KEYS FROM SYSIBM.SYSTABLEPART
//SORTJNF1 DD *
001,00000000,48486183
002,48486183,55996407
003,55996407,73117768
004,73117768,73257853
.... all partitions
//SORTJNF2 DD * USER'S INPUT TEMPLATE
01,000, OUTFIL FILES=01,
01,001, INCLUDE=(15,08,CH,GT,C'@@@@@@@@',AND,
01,008, 15,08,CH,LE,C'########')
01,000, OUTFIL FILES=02,
01,009, INCLUDE=(15,08,CH,GT,C'@@@@@@@@',AND,
01,016, 15,08,CH,LE,C'########')
01,000, OUTFIL FILES=03,
01,017, INCLUDE=(15,08,CH,GT,C'@@@@@@@@',AND,
01,024, 15,08,CH,LE,C'########')
01,000, OUTFIL FILES=04,
01,025, INCLUDE=(15,08,CH,GT,C'@@@@@@@@',AND,
01,032, 15,08,CH,LE,C'########')
//SYSIN DD *
* COMMON SORT TO GENERATE SORT CONDITIONS OR SQL PREDICATES BASED ON
* LIMIT KEY OF PARTITIONED TABLES.
* THIS SORT REQUIRES A LIMIT KEY FILE (SORTJNF1) WHICH WILL BE JOINED
* TO A TEMPLATE FILE (WITH A SPECIFIC FORMAT) TO PRODUCE THE DESIRED
* OUTPUT.
* THE LIMIT KEY FILE HAS ALL THE PARTITIONS OF A TABLE IN THE FOLLOWING
* FORMAT (AND IT IS UNLOADED AT THE BEGINNING OF THE JOB).
* 1,3 = PARTITION NUMBER
* 4,1 = COMMA SEPARATOR
* 5,8 = LOW LIMIT KEY OF THE PARTITION
* 13,1 = COMMA SEPARATOR
* 15,8 = HIGH LIMIT KEY OF THE PARTITION
*
* USER'S TEMPLATE MUST FOLLOW THIS FORMAT:
* OFFSET CONTENTS
* ====== ==========================================================
* 1,1 = COMMENT LINE - IGNORED
* 2,2 = SORTOF FILE NUMBER (CONVERTED OUTPUT WRITTEN HERE)
* 3,1 = COMMA SEPARATOR (IGNORED)
* 5,3 = PART NUMBER, IF BLANK THE INPUT IS WRITTEN OUT AS IS
* 8,1 = COMMA SEPARATOR (IGNORED)
* 9,71 = REST OF THE RECORD OPTIONALLY CONTAINING PLACE HOLDER FOR
* LOW OR HIGH LIMIT KEY. THE PLACE HOLDER WILL BE REPLACED WITH
* THE ACCOUNT NUMBER (LIMIT KEY) OF THE PART NUM FROM POS 5,3
* LOW LIMIT KEY IS REPRESENTED BY @@@@@@@@
* HIGH LIMIT KEY IS REPRESENTED BY ########
*
* ===> ONLY ONE PLACE HOLDER PER LINE, PLEASE!
*
JOIN UNPAIRED,F2 ALL ROWS FROM TEMPLATE
JOINKEYS FILES=F1,FIELDS=(5,3,A),SORTED PART # FROM SYSPART
JOINKEYS FILES=F2,FIELDS=(4,3,A), PART # FROM TEMPLATE
INCLUDE=(1,1,CH,NE,C'*') REMOVE COMMENT
REFORMAT FIELDS=(F1:10,8, LO-ACCT FROM SYSPART
F1:19,8, HI-ACCT FROM SYSPART
F2:1,72) ENTIRE TEMPLATE RECORD
INREC IFTHEN=(WHEN=(1,88,SS,EQ,C'@@@@@@@@'),
PARSE=(%1=(ENDBEFR=C'@@@@@@@@',FIXLEN=88),
%2=(FIXLEN=88)),
BUILD=(C'"',%1,C'"',1,8,C'"',%2,C'"')),
IFTHEN=(WHEN=(1,88,SS,EQ,C'########'),
PARSE=(%3=(ENDBEFR=C'########',FIXLEN=88),
%4=(FIXLEN=88)),
BUILD=(C'"',%3,C'"',1,8,C'"',%4,C'"')),
IFTHEN=(WHEN=NONE,
BUILD=(C'"',1,88,C'"'))
SORT FIELDS=COPY
OUTFIL FILES=01,
INCLUDE=(18,2,CH,EQ,C'01'),
OUTREC=(25,159,SQZ=(SHIFT=LEFT,LENGTH=80,PAIR=QUOTE))
Or I may want to generate WHERE conditions in 4 separate files based on this template:
Code:
* THIS TEMPLATE IS FOR AN SQL PREDICATE SPLIT INTO 4 OUTPUT FILES
01,001, (KEY_COLUMN > '@@@@@@@@' AND
01,008, KEY_COLUMN <= '########')
02,009, (KEY_COLUMN > '@@@@@@@@' AND
02,016, KEY_COLUMN <= '########')
03,017, (KEY_COLUMN > '@@@@@@@@' AND
03,024, KEY_COLUMN <= '########')
04,025, (KEY_COLUMN > '@@@@@@@@' AND
04,032, KEY_COLUMN <= '########')
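For readers who want the intent of the job without decoding the control cards, here is a small Python sketch of the template expansion described in the comments above. It is an illustration only, not the sort job itself: `expand_template` is a made-up name, the two-partition template is hypothetical (cut down so that it only uses limit keys shown in the sample SORTJNF1 data), and it assumes the template layout is file number, comma, part number, comma, text.

```python
# Partition number -> (low limit key, high limit key), from the sample SORTJNF1 data.
limit_keys = {
    "001": ("00000000", "48486183"),
    "002": ("48486183", "55996407"),
}

# Hypothetical cut-down template: file number, part number, then the card text.
template = [
    "* COMMENT LINES ARE IGNORED",
    "01,000, OUTFIL FILES=01,",
    "01,001, INCLUDE=(15,08,CH,GT,C'@@@@@@@@',AND,",
    "01,002, 15,08,CH,LE,C'########')",
]


def expand_template(template_lines, keys):
    """Replace the low/high placeholders with the limit keys of the named partition."""
    expanded = []
    for line in template_lines:
        if line.startswith("*"):
            continue  # comment line
        file_no, part, text = line.split(",", 2)
        if part in keys:  # unmatched part numbers are written out as is
            low, high = keys[part]
            text = text.replace("@@@@@@@@", low).replace("########", high)
        expanded.append((file_no, text))
    return expanded


for file_no, text in expand_template(template, limit_keys):
    print(file_no, text)
```

Each output pair is the target file number and the finished card text, which is the same split the OUTFIL FILES=01 INCLUDE in the job is aiming for.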
Here's why I made that comment (my emphasis added, just in case it is required):
Kolusu wrote:
These control cards squeeze out the spaces in between. So you may need to work on that and I will leave it for you.
marshed wrote:
Thanks! It works well - except all the spaces have been squeezed out. I want to retain the basic structure of the input record. Can't shift anything around.
You see how it makes it look like you didn't even read the whole of Kolusu's post, and didn't try any resolution for yourself? It is not difficult to look at SQZ and read the options which it has.
With no knowledge of the data you are trying to process, I had a concern that you'd come back and say "now my quotes are going missing". With a resolution given to you for that, you'd come back and say "now my apostrophes are missing". Why I was concerned that that would happen is because it often does happen like that, and because you'd already seemingly started on that route (not reading the whole of Kolusu's post, but just going with what was given).
Your comments were so irrelevant to what I was attempting that they were no more bother to me than a waste of time reading. If it helped your conscience, somehow, to write them, fair enough, but don't expect positive effect beyond that.
Anyway, now we know what you are processing, you should be fine with QUOTE anyway.
We give our time here, so prefer it when 100% of what is said is taken note of. When not, we do hope that other readers still benefit. We don't then feel so much that our time is wasted.
However, another way that time is wasted is if full knowledge of the actual task is not given first. Ironically, it is often people who are trying the "give me some clues and I'll do the rest myself" who lead us into this trap.
Now that we can see what you want to do, there may well be a much simpler way to do it (and two versions possible of that).
Since you use SyncSort, and Kolusu, DFSort Developer, has already given you time on this, I suspect you're now stuck with me and others from the board who may care to contribute.
So, have you ever looked at using symbols in SyncSort? What are symbols? They are in the manual.
What version of SyncSort do you have? Current versions even have support now for JPn symbols, which allow data to be passed from the PARM on the EXEC to the sort control cards.
You would also want to include //SYMNOUT DD SYSOUT=(whatever you use for //SYSOUT).
This is what your OUTFIL statements could look like, with the values for JP1 through JP4 being supplied on the PARM of the EXEC, where they can also be resolved through JCL symbol substitution.
There are only 10 JPn available, JP0-JP9.
More flexible would be simple symbols, which you define:
OUTFIL FILES=02,
INCLUDE=((KEY-FOR-PARTITION,
GT,
PARTITION1-END),
AND,
(KEY-FOR-PARTITION,
LE,
PARTITION2-END))
OUTFIL FILES=03,
INCLUDE=((KEY-FOR-PARTITION,
GT,
PARTITION2-END),
AND,
(KEY-FOR-PARTITION,
LE,
PARTITION3-END))
OUTFIL FILES=04,SAVE
You'll have to check (you know where) what the SAVE does.
Now you can have the same sort control cards in many steps. The values are resolved at run-time (symbol values on SYMNOUT, resolution of control cards on your normal sysout for the step).
You don't keep the symbols on DD *, of course, but in however many members of a library that you need. You can have different members for key positions (if needed), different members for different ranges of values (if needed).
The initial library members can be generated, or coded by hand. Concatenation on the //SYMNAMES will give you the variations you may need for any given step.
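As one way of generating such a member, here is a Python sketch that writes symbol definitions for the names used in the OUTFIL statements above. Treat it as an assumption-laden illustration: the `symbol,position,length,format` and `symbol,C'value'` line shapes follow the DFSORT-style SYMNAMES convention as I understand it, so check the exact form your SyncSort release accepts; the function name `symnames_lines` is made up, and the key position 15,8,CH is taken from the template earlier in the thread.

```python
def symnames_lines(key_field, partition_ends):
    """Build SYMNAMES-style lines for the key field and each partition's end value."""
    pos, length, fmt = key_field
    lines = [f"KEY-FOR-PARTITION,{pos},{length},{fmt}"]
    for n, end in enumerate(partition_ends, start=1):
        lines.append(f"PARTITION{n}-END,C'{end}'")
    return lines


# High limit keys of the first two partitions from the sample data in this thread.
member = symnames_lines((15, 8, "CH"), ["48486183", "55996407"])
for line in member:
    print(line)
```

The generated lines would go into a library member that is then concatenated on //SYMNAMES for the step.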
You could use different complete sets of sort control cards, or you can have a library member containing the OUTFILs and concatenate that to the cards which specify the SORT, COPY, MERGE or JOINKEYS part of the sort control deck.
Symbols are equally at home in SyncTool.
You will notice that the sort deck no longer looks so traditional. It is not "new", I'd guess SyncSort has had symbols for over 10 years. But symbols can give you significant documentary value, easing maintenance, and ease coding, easing maintenance, and reduce replication, thus reducing potential for typos, easing development, testing and maintenance. Makes things easier.
You can of course combine the JPn symbols with user-defined symbols. There are also system symbols (not sure of this for SyncSort, but you could check the manual, or try some, as some things in SyncSort are not documented, which is why I've told you to try it if you can't find it in the manual).
To your comments on manuals in general, if there is something which, after I've read it, I don't understand, I try some experiments. Then I read the manual again. If I still don't understand it, I try some more. I repeat until understood.
The same with any examples I find in manuals, or elsewhere. If I don't understand it, I put in the work to understand it. Looking at examples from Kolusu and Frank Yaeger, and making the effort to understand them where I don't first time out, is how I come to be able to show you the above. And reading the manuals. And making sure I understand what I read.
The tricky one is SPLICE, if you want something to avoid until needed. Otherwise, you should be able to crack it with sufficient effort, which I feel is a real investment. It gets easier each time, and pays you back many times over.
Bill,
Appreciate most of what you have said. You (and Kolusu) have given me a new outlook towards sort. And point noted that today's sort looks quite different from traditional sort. The reason I did not originally post the real problem is that it would simply be a distraction.
I somehow feel that there is an inner teacher in you. And I must say - a teacher who is well versed with the subject matter. But please stay a teacher and don't tread on "the Headmaster" territory. Your students will love you even more. Secondly, when you suspect your time is being wasted, you are the one wasting it.
But I (too) have been judgmental and, if it means anything to you, I owe you an apology for being an a$$. If I haven’t already made too many enemies, the apology also goes to your friends and admirers who value your contributions to this board.
But I think it is worth mentioning where I come from. In the years gone by, I was a regular visitor at MVSFORUM. At times, I would read just for fun and learn something while killing time. During those years though, some things about this forum always upset me. Many “senior” members were habitually downright rude to other members. Many posts were replied with RTFM replies. At times, I even thought there were racist undertones behind some members’ comments. Perhaps it was during the phase when the western world was about to be taken over by the “outsourcing” storm. Whatever the case may be, as an outsider, I kind of got a feeling this forum was some kind of an old boys club. When I signed up, it took a lot of courage on my part to enter this slaughter house and it almost felt like déjà vu on day 2. But I KNOW that it’s just my perception and at times, perceptions have little to do with reality.
Another point – some of you guys either support IBM products or are at a caliber that you very well could. The average Joe that posts questions is an application programmer. Unless you do a lot of ad-hoc inquiry/reporting based work, you have little opportunity or reason to be a SORT expert. I know you are thinking – “and it shows in your posts".
But your point is well taken, I will RTFM when a hint is provided by a member before asking a follow up question.