Posted: Tue May 18, 2004 6:41 pm Post subject: Efficiently Counting Records
I have a job that I currently run in SAS that opens about 300 flat mainframe files (FB, lrecl 700 to 1000), each having up to 500K records. It doesn't actually read any fields, but just loops through the file so that I can get a record count. That's all I need to know...the exact number of records in each file.
This approach does work, but is quite slow. I've got to think there's a quicker way, but after many attempts, I have yet to figure out a more efficient method. The job runs overnight, so memory usage is not really the issue; it's more the time constraints on behalf of the other jobs waiting for this information.
JFCB and other control blocks seem to only tell me space requested, not used, and I need an exact count.
DFSORT does give promising results, spitting out the record count into a file (using OUTFIL with a TRAILER) but trying to run that 300 times and get everything fed into one data set is not working so well.
Any ideas on a way to speed up this process? REXX? ICEMAN?
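For reference, the per-file DFSORT approach described above looks roughly like this. This is a minimal sketch; the dataset names are placeholders, and the OUT dataset attributes would need to match your shop's conventions:

```
//COUNT   EXEC PGM=ICEMAN
//SYSOUT  DD SYSOUT=*
//SORTIN  DD DISP=SHR,DSN=YOUR.INPUT.FILE
//OUT     DD DSN=YOUR.COUNT.FILE,DISP=(MOD,CATLG),
//           UNIT=SYSDA,SPACE=(TRK,(1,1),RLSE)
//SYSIN   DD *
* Copy without sorting; write only a trailer record with the count
  OPTION COPY
  OUTFIL FNAMES=OUT,NODETAIL,REMOVECC,
    TRAILER1=(COUNT)
/*
```

With DISP=MOD on OUT, each run appends one count record, which is what makes collecting 300 results into one dataset awkward when the jobs are built by hand.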
With the number of files you are reading and the number of records in some of them, I was drawn towards developing a solution that involves a measure of parallelism.
I developed a controlling job which:
1. Reads in a list of files
2. Executes a REXX exec which FTINCLs a JCL skeleton for every file in the list
3. Submits the temporary dataset created by the file tailoring process, which contains one job for every file in the list, i.e. in your case there would be 300 jobs in the dataset.
I pass a parameter to the controlling job which generates all the other jobs with a jobname controlled by a counter. Whatever the value of the counter is, that is the number of different jobnames that will be generated, i.e. if I pass 6 then the first job generated will be called COUNT1, the next COUNT2, and so on.
That way, when they all get submitted you will not flood the system with 300 jobs; rather you will have 300 jobs queued under 6 different jobnames, spread evenly across all names.
The job that is generated has two steps.
Step 1 uses ICETOOL to count the records.
Step 2 uses SORT to read the ICETOOL messages and reformat the ICE628I message, writing out the name of the dataset (which is passed to the OUTREC statement via the skeleton) and its record count to a catalogued dataset.
This dataset is allocated MOD, so the first job that runs creates it and every job after that appends to it. Contention on this dataset is minimal, as it is only held for as long as it takes to write one record.
The end result is a file which contains the names of all the files and their record counts.
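The two steps above might be sketched as follows. This is an assumption-laden outline, not the poster's actual skeleton: the dataset names are placeholders, and the column positions used to pick the count out of the ICE628I message line are illustrative and would need checking against real TOOLMSG output:

```
//* Step 1: ICETOOL counts the records in one input file;
//*         the messages (including ICE628I) go to a passed dataset
//STEP1   EXEC PGM=ICETOOL
//TOOLMSG  DD DSN=&&TOOLMSG,DISP=(,PASS),UNIT=SYSDA,SPACE=(TRK,(1,1))
//DFSMSG   DD SYSOUT=*
//IN       DD DISP=SHR,DSN=INPUT.FILE.FROM.SKELETON
//TOOLIN   DD *
  COUNT FROM(IN)
/*
//* Step 2: SORT picks the ICE628I record-count message out of TOOLMSG
//*         and appends "dsname count" to the shared MOD dataset
//STEP2   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=&&TOOLMSG,DISP=(OLD,DELETE)
//SORTOUT  DD DSN=ALL.COUNTS,DISP=MOD
//SYSIN    DD *
  SORT FIELDS=COPY
  INCLUDE COND=(1,7,CH,EQ,C'ICE628I')
  OUTREC FIELDS=(C'INPUT.FILE.FROM.SKELETON',X,28,15)
/*
```

Because SORTOUT is DISP=MOD, the 300 generated jobs each append one line, and the brief enqueue on ALL.COUNTS is what keeps contention low.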
I have tested this and it ran pretty quickly...hint hint. _________________ My opinions are exactly that.
Thanks for the two suggested solutions. I opted for Kolusu's, as my REXX experience is limited at best.
I wrote a program to dynamically create a file with the JCL code as suggested by Kolusu (using the second option, inserting the actual DSN into the trailer record using SAS macro variables), but I have now run out of Task I/O Table space:
IEF240I KJK4Z STEP00B - TASK I/O TABLE EXCEEDS TIOT LIMIT OF 0032K
IEF272I KJK4Z STEP00B - STEP WAS NOT EXECUTED.
The program was running extremely well against a subset of the files while I was testing, but the system message documentation doesn't offer much hope of clearing this problem up. I ended up with more files than I expected...over 450 now.
I guess I can just split this into two parts, and hope it doesn't grow much more.
Posted: Wed May 19, 2004 7:48 pm Post subject:
k2,
The error you are getting is due to the excessive number of DD statements per job. The maximum number of DD statements per job step is 3273, based on the number of single DD statements allowed for a TIOT (task input/output table) control block size of 64K. This limit can differ depending on the installation-defined TIOT size. The IBM-supplied default TIOT size is 32K.
The following table shows the relationship between the size of the TIOT and the maximum number of DDs allowed:
Code:
SIZE Value                 Maximum number    Maximum number of DDs
Dec (Hex)   Size of TIOT   of single DDs     allowed when every DD
                           allowed           requests the maximum
                                             number of units (59)
=========   ============   ==============    =====================
 16 (10)    16384 (16K)          819                  64
 24 (18)    24576 (24K)         1226                  97
 32 (20)    32768 (32K)         1635                 129
 40 (28)    40960 (40K)         2045                 162
 48 (30)    49152 (48K)         2454                 194
 56 (38)    57344 (56K)         2864                 227
 64 (40)    65536 (64K)         3273                 259
Your shop uses the default TIOT size of 32K, which allows only 1635 DD statements per job step.
Since you are dynamically generating the JCL to get the counts, why not submit it as a separate job? The total number of DD statements would then be 1352:
1352 = 450 (input) + 1 (output) + 1 (TOOLIN) + 450 (control card 1) + 450 (control card 2, for the trailer to have the DSN name)
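The generated deck from that approach would look roughly like this, with one INnnn DD, one COPY operator, and one CnnnCNTL DD per input file. The dataset names here are illustrative:

```
//COUNTS   EXEC PGM=ICETOOL
//TOOLMSG  DD SYSOUT=*
//DFSMSG   DD SYSOUT=*
//OUT      DD DSN=ALL.COUNTS,DISP=(MOD,CATLG),
//            UNIT=SYSDA,SPACE=(TRK,(5,5),RLSE)
//IN001    DD DISP=SHR,DSN=FIRST.INPUT.FILE
//IN002    DD DISP=SHR,DSN=SECOND.INPUT.FILE
//TOOLIN   DD *
  COPY FROM(IN001) USING(C001)
  COPY FROM(IN002) USING(C002)
/*
//C001CNTL DD *
* No data records are written; the trailer carries the DSN and count
  OUTFIL FNAMES=OUT,NODETAIL,REMOVECC,
    TRAILER1=(2:'FIRST.INPUT.FILE',26:COUNT)
/*
//C002CNTL DD *
  OUTFIL FNAMES=OUT,NODETAIL,REMOVECC,
    TRAILER1=(2:'SECOND.INPUT.FILE',26:COUNT)
/*
```

Every FROM dataset, every CnnnCNTL card, plus OUT and TOOLIN each consume a DD entry, which is how 450 files reach the 1352 total and why a 32K TIOT overflows at the original 450-files-plus-overhead-in-one-step attempt.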
Thanks, Kolusu, for the additional information about TIOT size. I ended up submitting separate jobs, as I'm not certain how much larger this project may grow. For posterity, here's the SAS Code I used to generate the JCL Code that I then submitted to the Internal Reader; perhaps it will be of use to someone else.
DATA _NULL_;                                /* generate one IN DD per file */
  LENGTH FILE $ 22;
  SET ALLFILES END=LAST;
  WHERE YEAR = &Y;
  FILE FLEXCODE MOD;                        /* append JCL lines to FLEXCODE */
  FILE = FN;
  NUM = PUT(_N_,Z3.);
  PUT "//IN" NUM " DD DISP=SHR,DSN=" FILE;
  CALL SYMPUT("FILE"||NUM,TRIM(FILE));      /* save each DSN for the trailer cards */
  IF LAST THEN DO;
    PUT "//TOOLIN DD *";
    CALL SYMPUT("NOBS",_N_);                /* remember the number of files */
  END;
RUN;

%MACRO COPYLOOP;                            /* one COPY operator per file */
  %DO I = 1 %TO &NOBS;
    %LET Z = %SYSFUNC(PUTN(&I,Z3.));
    PUT " COPY FROM(IN&Z) USING(C&Z)";
  %END;
%MEND COPYLOOP;

%MACRO CNTLLOOP;                            /* one CnnnCNTL DD with a TRAILER1 per file */
  %DO I = 1 %TO &NOBS;
    %LET Z = %SYSFUNC(PUTN(&I,Z3.));
    PUT "//C&Z.CNTL DD *";
    PUT " OUTFIL FNAMES=OUT,NODETAIL,REMOVECC,";
    PUT " TRAILER1=(2:'%STR(&&FILE&Z)',26:COUNT)";
  %END;
%MEND CNTLLOOP;

DATA _NULL_;
  FILE FLEXCODE MOD;
  %COPYLOOP
  %CNTLLOOP
RUN;
ICE000I 0 - CONTROL STATEMENTS FOR 5694-A01, Z/OS DFSORT V1R5 - 09:12 ON TUE AP
OUTREC FIELDS=(1,80)
OUTFIL FNAMES=OUT,NODETAIL,REMOVECC,
TRAILER1=('TOTAL NO: OF RECORDS IN THE FILE 002 :',COUNT,80:X)
ICE146I 0 END OF STATEMENTS FROM C002CNTL - PARAMETER LIST STATEMENTS FOLLOW
DEBUG NOABEND,ESTAE
OPTION MSGDDN=DFSMSG,LIST,MSGPRT=ALL,RESINV=0,SORTDD=C002,SORTIN=IN00
DYNALLOC
SORT FIELDS=COPY
ICE201I 0 RECORD TYPE IS F - DATA STARTS IN POSITION 1
ICE027A 3 END OF FIELD BEYOND MAXIMUM RECORD LENGTH
ICE751I 0 C5-K05352 C6-Q95214 C7-K90000 C8-K05352 E9-K06751 E7-K90000
ICE052I 3 END OF DFSORT
Please help me get the correct output file. _________________ Thanks
Madhu Sudhan
Posted: Tue Apr 01, 2008 8:47 am Post subject:
There are always solutions to problems created by bad design.
If the record count is so important, why waste cycles counting the records after the fact? Proper design/redesign/re-engineering would lead one to modify the jobs that create the files to output a record count.
If these flat files are coming from remote sources, you need a high-speed record counter.
Why are these record counts so important? _________________ Dick Brenholtz
American living in Varel, Germany
Per Kolusu's example, you must use a MOD data set for OUT so all of the counts will be appended to it. Without MOD, the last count will be the only one you see. _________________ Frank Yaeger - DFSORT Development Team (IBM)
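The MOD requirement boils down to the disposition on the OUT DD. A minimal sketch, with placeholder dataset name and attributes:

```
//* DISP=MOD appends each job's trailer record instead of
//* rewriting the dataset from the start
//OUT  DD DSN=ALL.COUNTS,DISP=(MOD,CATLG,CATLG),
//        UNIT=SYSDA,SPACE=(TRK,(1,1),RLSE),
//        DCB=(RECFM=FB,LRECL=80)
```

With DISP=OLD or DISP=(NEW,...) instead, each OUTFIL would start writing at the beginning of the dataset, so only the last count written would survive.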
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
Thanks Frank. The solution given by you is working.
Kolusu,
Quote:
OUTREC OVERLAY=(80:X)
I had tried the above code, but it gave the same problem again. I will go ahead with Frank's code. Thanks for your fast responses. _________________ Thanks
Madhu Sudhan