MVSFORUMS.com Forum Index MVSFORUMS.com
A Community of and for MVS Professionals
 

Performance improvements to a batch COBOL/DB2 insert pgm
 
Author Message
geetha001
Beginner


Joined: 22 Jun 2005
Posts: 41
Topics: 14

Posted: Sun May 23, 2010 6:15 pm    Post subject: Performance improvements to a batch COBOL/DB2 insert pgm

I have a batch COBOL program which performs slowly. Current information about the program is as follows:
    Performs multi-row inserts (100 rows at a time)
    Inserts into two tables, a parent and a child table
    The primary key of each table is a number (a random number generated in the COBOL program)
    The average number of inserts into each of the parent and child tables is 100,000 records per day
    The parent and child have a one-to-one relationship and are connected through the ID number
    Each of these tables has millions of records
    The parent table is not partitioned, but the child table is partitioned on the ID number by range

LOAD would be a good option, but since the random number has to be generated in the COBOL program, the LOAD option is difficult. Also, GET DIAGNOSTICS is performed if a row insert fails due to a duplicate key; the random number is regenerated a limited number of times and the insert is retried.
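For illustration, here is a rough Python sketch of the flow described above (the batch size is from the post, but the retry limit and function names are my assumptions; the real program is COBOL using multi-row INSERT and GET DIAGNOSTICS):

```python
import random

BATCH_SIZE = 100        # rows per multi-row INSERT, per the post
MAX_RETRIES = 3         # assumed; the actual retry limit is not stated

def generate_key():
    # stand-in for the program's timestamp-based 10-digit random key
    return random.randrange(1, 10**10)

def insert_batch(rows, existing_keys):
    """Mimic the described flow: for each row, generate a random key,
    and on a duplicate-key failure regenerate a limited number of times."""
    inserted = 0
    for _row in rows:
        for _attempt in range(MAX_RETRIES + 1):
            key = generate_key()
            if key not in existing_keys:   # the real check is GET DIAGNOSTICS
                existing_keys.add(key)     # parent insert; child shares the ID
                inserted += 1
                break
        else:
            raise RuntimeError("retry limit exhausted")
    return inserted
```

Each retry is a wasted round trip to DB2, which is why the duplicate rate matters so much later in the thread.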

Can anybody help me with suggestions on how to improve the performance of this insert job?
papadi
Supermod


Joined: 20 Oct 2009
Posts: 594
Topics: 1

Posted: Sun May 23, 2010 6:39 pm

Quote:
have a batch cobol program which performs slowly
Compared to what? Another task that performs the same work but uses fewer resources? Someone's opinion?

100k is not a huge volume to insert. . .

How much data is read from these and other tables in an execution?

Until more is posted about the "insert" job it will be quite difficult to suggest things that will improve it.

Have you identified where the lost time is going?

Have you talked with the dba to see if this has come to their attention?
_________________
All the best,

di
geetha001
Beginner


Joined: 22 Jun 2005
Posts: 41
Topics: 14

Posted: Sun May 23, 2010 8:42 pm

In general the batch program takes a long time to run; the average run time is over an hour. I am looking to optimize this run time as we may have a much higher volume going forward.

Quote:
How much data is read from these and other tables in an execution?


This program does not read any data from these tables, but only does an insert into both the parent and child tables.
kolusu
Site Admin
Site Admin


Joined: 26 Nov 2002
Posts: 12372
Topics: 75
Location: San Jose

Posted: Sun May 23, 2010 8:50 pm

geetha001,

Did you run any kind of analysis as to where the program is spending its time? Is it in the insert phase or the data-preparation phase? Did you run an EXPLAIN on the SQL queries? How is the random number generated? Do you have the option of changing the DB2 table to have an identity column for the random-number key? Or how about adding a timestamp field to the DB2 table which would contain the time the key is inserted?

Check this link for running EXPLAIN:

http://www.mvsforums.com/helpboards/viewtopic.php?t=215&highlight=explain

Kolusu
papadi
Supermod


Joined: 20 Oct 2009
Posts: 594
Topics: 1

Posted: Sun May 23, 2010 9:01 pm

If you run the code as-is, with the INSERTs commented (so none are actually done) how long does the program run?
_________________
All the best,

di
geetha001
Beginner


Joined: 22 Jun 2005
Posts: 41
Topics: 14

Posted: Mon May 24, 2010 7:20 pm

Kolusu,

I have the results of the EXPLAIN plan from production. I personally have not run any EXPLAINs in the test region yet. From my understanding of the plan table, I could not see anything alarming. Most of the plan table columns do not give much information for the insert. I do see that the lock mode used during the insert is IX, though.

However, I would like to know if there is anything specific that I should check in the plan table for the inserts.

The 10-digit random number is generated from the current timestamp value.

Can you please clarify/expand on the below?
Quote:

How about adding a timestamp field to the DB2 which would contain the time the key is inserted.


Thanks
papadi
Supermod


Joined: 20 Oct 2009
Posts: 594
Topics: 1

Posted: Mon May 24, 2010 9:03 pm

Quote:
If you run the code as-is, with the INSERTs commented (so none are actually done) how long does the program run?
What is the result of running this. . .?

Quote:
Can you please clarify/expand the below
The suggestion is to add a timestamp so you can see the time it takes between inserts.

Quote:
However I would like to know if there is anything specific that I might have to check in the plan table for the inserts.
Suggest you first verify that the "lost time" is due to the inserts. I suspect it may not. . .
_________________
All the best,

di
jsharon1248
Intermediate


Joined: 08 Aug 2007
Posts: 291
Topics: 2
Location: Chicago

Posted: Tue May 25, 2010 7:58 am

Quote:
Also get diagnostics is performed if the row insert fails due to duplicate key, random number is regenerated a limited number of times and the insert is retried.


Huh? You're generating what's supposed to be a unique key, issuing the insert, and then if the key is already in use, you start all over, and you're wondering why you're only getting 55 inserts a second? Come on. Have you bothered to count the failures? I'll bet you're really issuing 100+ inserts/second but you're only successful on half of them. Also, how often do you commit? Just curious.
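To put numbers on that: if the keys really were drawn uniformly from the whole key space, the expected duplicate rate can be estimated from the space size, the rows already present, and the rows being inserted. The figures below are illustrative assumptions (not from the thread), sketched in Python:

```python
def expected_duplicates(n_inserts, n_existing, key_space):
    """Expected duplicate-key hits when drawing keys uniformly at random:
    collisions against existing rows plus collisions within the batch."""
    against_existing = n_inserts * n_existing / key_space
    within_batch = n_inserts * (n_inserts - 1) / (2 * key_space)
    return against_existing + within_batch

# Assumed figures: 5 million existing rows, 100k inserts, 10-digit key space.
dupes = expected_duplicates(100_000, 5_000_000, 10**10)
print(round(dupes, 1))  # -> 50.5
```

A truly uniform 10-digit key would thus collide on well under 0.1% of inserts; duplicate rates in the 10-200% range point at the generator not covering the key space uniformly.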
geetha001
Beginner


Joined: 22 Jun 2005
Posts: 41
Topics: 14

Posted: Mon Jun 14, 2010 5:52 pm

After a lot of research, I found that there is an issue with the way the random number is generated. The current module that generates the random number is based on the current timestamp, and since the ID number is defined as an integer (leading zeroes suppressed), the generated value can end up with anywhere from 1 to 10 digits (depending on the number of leading zeroes in the random number).

As mentioned by jsharon1248, there are a lot of duplicates generated and retried (I guess that is obvious by now from the paragraph above). Usually it is 10%, but some days the duplicate-retry rate is 200%.

However, now that the issue has been narrowed down to the random number generator, I am looking for help on generating a unique, at least 15-digit random number for every row that is inserted. Although there are some common modules already available to do it, I was wondering if any one of you had a good idea for generating such a number quickly (and uniquely), keeping in mind that there will be a huge number of records to be inserted.
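One way to get uniqueness by construction rather than by retry (sketched in Python; the production version would be a COBOL routine, and everything here is my assumption, not from the thread) is to derive the key from a microsecond timestamp plus an in-run sequence counter:

```python
import itertools
import time

_seq = itertools.count()   # per-run sequence counter

def next_key():
    """Return a 15-digit key in [10**14, 10**15), unique within a run.

    micros * 10_000 + sequence is strictly increasing (assuming fewer
    than 10,000 calls per microsecond), and a window of increasing
    values smaller than 9*10**14 maps injectively under the modulus.
    Cross-run duplicates would still be caught by the existing
    GET DIAGNOSTICS retry logic.
    """
    micros = time.time_ns() // 1_000
    raw = micros * 10_000 + next(_seq) % 10_000
    return 10**14 + raw % (9 * 10**14)
```

Because the result is pinned into [10**14, 10**15), every key has exactly 15 digits, sidestepping the leading-zero digit-count problem described above.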

I will continue my research... in the meantime if any of you have tips let me know.

Thanks
Dibakar
Advanced


Joined: 02 Dec 2002
Posts: 700
Topics: 63
Location: USA

Posted: Mon Jun 14, 2010 6:50 pm

Random numbers are generated for some specific purpose; why do you need one?

All you need is a unique key, and that can be achieved by adding 1 to the max of the existing keys.
_________________
Regards,
Diba
dbzTHEdinosauer
Supermod


Joined: 20 Oct 2006
Posts: 1411
Topics: 26
Location: germany

Posted: Mon Jun 14, 2010 6:54 pm

the suggestion for adding a timestamp
in an attempt to measure INSERT speed
will not work,
because the timestamp generated for 100 rows inserted during a mass insert will be the same.

as far as speeding up the process:

why the random number?
you are forcing your INSERTs all over the place;
the effect on the index build is what is taking a lot of time,
and unless you are on an old version of DB2 (<6)
(or old DASD - can't remember the reason)
you don't need to space out your INSERTs:
contiguous inserts are considered more efficient.

also, the random approach requires continuous REORGs.

question:

within a 100-row insert, is there a combination of parent and related child rows,
or do you insert all 100 parent rows first and then the corresponding 100 into the related table?

as an experiment,
before you change any code,
run a test - create a benchmark.
then turn off the RI on the tables and run the test again;
that will tell you what the RI is costing you.

Here is a link to a DB2 guru that explains more about analyzing RI, especially supporting indexes
_________________
Dick Brenholtz
American living in Varel, Germany
geetha001
Beginner


Joined: 22 Jun 2005
Posts: 41
Topics: 14

Posted: Mon Jun 14, 2010 7:39 pm

Quote:

why the random number?


For whatever reason this table was designed to use a random number. We are only now seeing the issue with this and trying to find easy and simple ways to fix it (i.e., generate the random number in a better way).

Quote:

contiguous inserts are considered more efficient.


I guess you are talking about the APPEND YES option in the newer DB2 version. I have one question here: will there be the overhead of a REORG every day if APPEND YES is used? (This is a daily job.)

Quote:

within a 100-row insert, are there a combination of parent and related child
or do you insert all 100 first and then the corresponding 100 to the related table?


The parent and child have a one-to-one relationship.
Yes, the 100 parent rows are inserted first and then the corresponding 100 child rows next.

As a simpler change, one of the options was to look at how the random number can be generated uniquely, because using an identity key would involve a lot more effort in converting the existing data, etc.

I appreciate all your inputs and thanks to all of you.

I would appreciate it if I could get more inputs on the random number generator.

Thanks
papadi
Supermod


Joined: 20 Oct 2009
Posts: 594
Topics: 1

Posted: Mon Jun 14, 2010 10:23 pm

There is a difference between a unique number and a random number. . .

Might you consider a unique number?
_________________
All the best,

di
dbzTHEdinosauer
Supermod


Joined: 20 Oct 2006
Posts: 1411
Topics: 26
Location: germany

Posted: Tue Jun 15, 2010 8:41 am

the basic problem of a random number generator that must also generate a unique number
is that the two processes
(generating a random number & generating a unique number)
are not coordinated.

the random number is created by your little routine, and the unique attribute is proved or disproved by the DB2 INSERT.

unfortunately, there is no easy min(unused number) function,
so I suggest you determine which unique numbers are not already in use
(dump all the key numbers, pass thru, and
create another table/file of these numbers not in use).

Then use the unused numbers for your inserts.
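The pre-computed "unused numbers" idea can be sketched like this (illustrative Python; in practice the key list would come from an unload of the table, and the function name is mine):

```python
def unused_ranges(existing_keys, lo, hi):
    """Yield (start, end) half-open ranges of keys in [lo, hi)
    that are not present in existing_keys."""
    prev = lo
    for k in sorted(existing_keys):
        if k > prev:
            yield (prev, min(k, hi))   # gap before this used key
        prev = max(prev, k + 1)
        if prev >= hi:
            return
    if prev < hi:
        yield (prev, hi)               # tail gap after the last used key

# Example: keys 3, 4, and 7 are taken in the space [1, 10).
print(list(unused_ranges([3, 4, 7], 1, 10)))  # -> [(1, 3), (5, 7), (8, 10)]
```

Storing gaps as ranges rather than individual numbers keeps the "unused" table small even when the key space is sparse.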

until you give up this old/outdated methodology of skipping around to do inserts
(this philosophy was dreamed up due to many tasks making inserts near-simultaneously -
you only have one task making inserts, so there is no need to skip around),
you will continually have this problem.

use the file/table containing unused numbers,
build a load file and use db2 load to INSERT the rows.

dbz wrote:

Quote:

contiguous inserts are considered more efficient.


geetha001 wrote:

I guess you are talking about the Append 'YES' option here with the newer DB2 version. I have one question here, will there be an overhead of Reorg every day if Append 'YES' is used? (because daily job)



your lack of understanding obviously makes you worry about changing this process.

at present, you have an unacceptable methodology - 200%, or even 20%, duplicate retries is unacceptable.

your goal should be to ensure that your inserts succeed 100% of the time - 0% duplicates.

logic would conclude that if you only have to read contiguous data, the reorg process would
be faster than having to jump around or sort everything.
_________________
Dick Brenholtz
American living in Varel, Germany
NASCAR9
Intermediate


Joined: 08 Oct 2004
Posts: 274
Topics: 52
Location: California

Posted: Tue Jun 15, 2010 6:31 pm

Could this be incorporated in the query?

SELECT RAND()
FROM SYSIBM.SYSDUMMY1;
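RAND() returns a DOUBLE in [0, 1), so it would still have to be scaled into the key range, either in SQL or in the program. A sketch of the equivalent scaling (in Python for illustration; the function name is mine):

```python
import random

def fifteen_digit_key():
    """Scale a uniform [0, 1) value into [10**14, 10**15), so the key
    always has exactly 15 digits and avoids the leading-zero digit-count
    problem described earlier with the 10-digit integer key."""
    return 10**14 + int(random.random() * 9 * 10**14)
```

Note that this only enlarges the key space; the keys are still random, so the duplicate-key retry logic is still needed.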
_________________
Thanks,
NASCAR9
All times are GMT - 5 Hours
Page 1 of 2

 


MVSFORUMS
Powered by phpBB © 2001, 2005 phpBB Group