MVSFORUMS.com
A Community of and for MVS Professionals
 

Removing records with junk chars

 
cobcurious
Beginner


Joined: 04 Oct 2003
Posts: 68
Topics: 25

Posted: Thu Sep 23, 2010 5:22 am    Post subject: Removing records with junk chars

Hi Experts,

I have a fixed-length file with more than a million records and some free-form text in columns 10 through 246. I need to identify the records that contain junk characters (that is, characters other than A-Z and 0-9), preferably with DFSORT.

So far I have tried the SS (substring search) option, but I could only use it to filter out low-values, not other junk characters.
I used:
Code:

  INCLUDE COND=(10,237,SS,EQ,X'00')


I have found posts that mention ALTSEQ, but I think that function converts one character to another, and I do not want to convert any data.
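An SS comparison only asks whether one specific constant occurs somewhere in the field, which is why it catches X'00' but cannot express "anything other than A-Z and 0-9". The Python model below illustrates that; it is a sketch, not DFSORT itself. It assumes EBCDIC code page 037 (Python's `cp037` codec) and uses a length of 237 for columns 10 through 246; `ss_eq`, `make_record` and the sample keys are hypothetical.

```python
def ss_eq(record: bytes, pos: int, length: int, const: bytes) -> bool:
    """Rough model of INCLUDE COND=(pos,length,SS,EQ,const).

    pos is 1-based, as in DFSORT. Returns True if const occurs anywhere
    in the field that starts at pos and is length bytes long.
    """
    field = record[pos - 1 : pos - 1 + length]
    return const in field


def make_record(text: str, lrecl: int = 350) -> bytes:
    """Hypothetical helper: an FB record in EBCDIC, blank-padded to LRECL."""
    return text.ljust(lrecl).encode("cp037")


# A 9-byte key, so the free-form text starts in column 10.
low_values = make_record("KEY000001" + "ABC\x00DEF")
hash_junk = make_record("KEY000002" + "ABC#DEF")

print(ss_eq(low_values, 10, 237, b"\x00"))  # True: X'00' is present
print(ss_eq(hash_junk, 10, 237, b"\x00"))   # False: '#' is junk, but it is not X'00'
```

To catch every junk byte this way you would need one SS test per possible invalid character, which is what the FINDREP approach in the replies avoids.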

Please let me know if you need more information.
kolusu
Site Admin


Joined: 26 Nov 2002
Posts: 12378
Topics: 75
Location: San Jose

Posted: Thu Sep 23, 2010 12:39 pm

cobcurious,

What is the LRECL and RECFM of the input file?
_________________
Kolusu
www.linkedin.com/in/kolusu
cobcurious

Posted: Fri Sep 24, 2010 12:13 am

Hello,

LRECL=350 and RECFM=FB, but the data to be validated is within columns 10 through 246.
kolusu

Posted: Fri Sep 24, 2010 10:25 am

cobcurious,

Assuming space (X'40') is also a valid character, the following DFSORT JCL will give you the desired results.

Code:

//STEP0100 EXEC PGM=SORT                                     
//SYSOUT   DD SYSOUT=*                                       
//SORTIN   DD DSN=Your input FB 350 byte file,DISP=SHR
//SORTOUT  DD SYSOUT=*                                       
//SYSIN    DD *                                             
  SORT FIELDS=COPY                                           
  INREC IFTHEN=(WHEN=INIT,OVERLAY=(351:10,237)),             
  IFTHEN=(WHEN=INIT,FINDREP=(STARTPOS=351,OUT=C'',           
          IN=(C'A',C'B',C'C',C'D',C'E',C'F',C'G',C'H',C'I', 
              C'J',C'K',C'L',C'M',C'N',C'O',C'P',C'Q',C'R', 
              C'S',C'T',C'U',C'V',C'W',C'X',C'Y',C'Z',C'0', 
              C'1',C'2',C'3',C'4',C'5',C'6',C'7',C'8',C'9')))
                                                             
  OUTFIL BUILD=(1,350),INCLUDE=(351,1,CH,NE,C' ')     
//*
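The job copies columns 10 through 246 to a scratch area starting at position 351 (one byte past the 350-byte record), deletes every valid character there with FINDREP, and keeps the record only if the first scratch byte is not blank; OUTFIL BUILD=(1,350) then drops the scratch area again. The Python sketch below models those steps under two assumptions: EBCDIC code page 037, and FINDREP shifting the remaining bytes left and filling the vacated end with blanks. It is an illustration, not DFSORT; `has_junk`, `make_record` and the sample records are hypothetical.

```python
BLANK = " ".encode("cp037")[0]  # X'40' in EBCDIC code page 037
ALNUM = frozenset("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789".encode("cp037"))


def make_record(text: str, lrecl: int = 350) -> bytes:
    """Hypothetical helper: an FB record in EBCDIC, blank-padded to LRECL."""
    return text.ljust(lrecl).encode("cp037")


def has_junk(record: bytes, start: int, length: int, valid: frozenset) -> bool:
    # OVERLAY=(351:10,237): copy the field into a scratch area.
    field = record[start - 1 : start - 1 + length]
    # FINDREP ... OUT=C'': delete every valid byte; what is left shifts
    # left and the vacated bytes at the end are filled with blanks.
    leftover = bytes(b for b in field if b not in valid)
    scratch = leftover.ljust(length, bytes([BLANK]))
    # INCLUDE=(351,1,CH,NE,C' '): keep the record if scratch byte 1 is not blank.
    return scratch[0] != BLANK


clean = make_record("KEY000001" + "HELLO WORLD 42")
junk = make_record("KEY000002" + "HELLO#WORLD")
masked = make_record("KEY000003" + "HELLO #WORLD")

print(has_junk(clean, 10, 237, ALNUM))             # False
print(has_junk(junk, 10, 237, ALNUM))              # True
print(has_junk(masked, 10, 237, ALNUM))            # False: the blank before '#' lands in byte 1
print(has_junk(masked, 10, 237, ALNUM | {BLANK}))  # True once blanks are deleted as well
```

In this model a blank that precedes the first junk character ends up in the first scratch byte, so `HELLO #WORLD` is only flagged when the blank is also treated as a deletable valid character. If the real FINDREP behaves the same way, adding C' ' to the IN list would be the corresponding change; that is worth verifying on test data before relying on it.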

cobcurious

Posted: Sat Sep 25, 2010 10:59 am

Hi,

Thanks for the solution. I tried the same to process my junk data. I see a small issue. I have shown the Input data here and output data below it. As we observe that all the records in the input are junk and hence all these should be reported in the output but this is not happening.

Also, a small correction: the record length is 272 and the column range is 10 through 254. Accordingly, I modified the code as follows:

Code:


//STEP0100 EXEC PGM=SORT                                     
//SYSOUT   DD SYSOUT=*                                       
//SORTIN   DD DSN=Your input FB 272 byte file,DISP=SHR
//SORTOUT  DD SYSOUT=*                                       
//SYSIN    DD *                                             
  SORT FIELDS=COPY                                           
  INREC IFTHEN=(WHEN=INIT,OVERLAY=(272:10,244)),             
  IFTHEN=(WHEN=INIT,FINDREP=(STARTPOS=351,OUT=C'',           
          IN=(C'A',C'B',C'C',C'D',C'E',C'F',C'G',C'H',C'I', 
              C'J',C'K',C'L',C'M',C'N',C'O',C'P',C'Q',C'R', 
              C'S',C'T',C'U',C'V',C'W',C'X',C'Y',C'Z',C'0', 
              C'1',C'2',C'3',C'4',C'5',C'6',C'7',C'8',C'9')))
                                                             
  OUTFIL BUILD=(1,272),INCLUDE=(273,1,CH,NE,C' ')     
//*


Note that the pipe "|" is used as a column delimiter and its positions are fixed, though it may not appear so, especially for the boundary between the second and third columns.

Input
03XXX1XX| .
kolusu

Posted: Sat Sep 25, 2010 2:27 pm

Quote:
I see a small issue. I have shown the Input data here and output data below it. As we observe that all the records in the input are junk and hence all these should be reported in the output but this is not happening.


It is NOT my fault that you couldn't modify the job I gave you as per your requirements and then point fingers at me that the proposed solution does NOT work.

Problems with your code:

1. You copied the data to be validated to position 272 using WHEN=INIT. By doing so, you overlaid the last character of your record with that data. It should be overlaid at position 273.

2. You mention that your data runs from position 10 through 254, both positions inclusive. That is a total of 245 bytes (254 - 10 + 1), not 244.

3. The next FINDREP looks for valid characters starting at position 351, so the copied bytes in positions 272 through 350 are never examined and any junk characters there are ignored. STARTPOS should be 273.

4. If you want to treat the pipe | as a valid character, add it to the FINDREP IN list as well.
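Points 1 to 3 come down to two calculations: the field length is last column minus first column plus 1, and the scratch area starts one byte past LRECL. The Python helper below is a hypothetical generator (not part of DFSORT) that applies those two rules and prints control cards in the same layout as the earlier job; the `extra` parameter is an assumed way to add characters such as the pipe from point 4.

```python
def findrep_cards(lrecl: int, first_col: int, last_col: int, extra: str = "") -> str:
    """Build control cards for a field in columns first_col..last_col
    (both inclusive) of a fixed-length record of lrecl bytes."""
    length = last_col - first_col + 1  # e.g. 254 - 10 + 1 = 245
    scratch = lrecl + 1                # first byte past the record
    chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789" + extra
    consts = [f"C'{c}'" for c in chars]
    # Nine constants per continuation line, as in the original job.
    rows = [",".join(consts[i:i + 9]) for i in range(0, len(consts), 9)]
    in_list = ",\n              ".join(rows)
    return (
        "  SORT FIELDS=COPY\n"
        f"  INREC IFTHEN=(WHEN=INIT,OVERLAY=({scratch}:{first_col},{length})),\n"
        f"  IFTHEN=(WHEN=INIT,FINDREP=(STARTPOS={scratch},OUT=C'',\n"
        f"          IN=({in_list})))\n"
        f"  OUTFIL BUILD=(1,{lrecl}),INCLUDE=({scratch},1,CH,NE,C' ')\n"
    )


print(findrep_cards(272, 10, 254, extra="|"))
```

For LRECL=272 and columns 10 through 254 this produces OVERLAY=(273:10,245), STARTPOS=273 and OUTFIL BUILD=(1,272),INCLUDE=(273,1,CH,NE,C' '); for the original 350-byte file, findrep_cards(350, 10, 246) reproduces OVERLAY=(351:10,237) and STARTPOS=351.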

Understand the job I gave you and make changes accordingly, or better yet, provide complete details when asked and use the control cards as given.

Read this document for a better understanding of FINDREP

http://www.ibm.com/support/docview.wss?rs=114&uid=isg3T7000085
cobcurious

Posted: Sun Sep 26, 2010 1:43 am

Kolusu wrote:
It is NOT my fault that you couldn't modify the job I gave you as per your requirements and then point fingers at me that the proposed solution does NOT work....


Hi Kolusu,

Please do not be offended; I did not mean to offend you. What I meant was that I saw a small issue with the code I had written, not with the code you provided.

I will certainly apply your suggestions, rerun the modified code, and share the results with everyone.
All times are GMT - 5 Hours