geetha001 Beginner
Joined: 22 Jun 2005 Posts: 41 Topics: 14
|
Posted: Sun May 23, 2010 6:15 pm Post subject: Performance improvements to a batch COBOL/DB2 insert pgm |
|
|
I have a batch COBOL program that performs slowly. Current information about the program:
Performs multi-row inserts (100 rows at a time)
Inserts into two tables, a parent and a child table
The primary key of each table is a number (a random number generated in the COBOL program)
The average number of inserts into each of the parent and child tables is 100,000 records per day
The parent and child have a one-to-one relationship and are connected through the ID number
Each of these tables has millions of records.
The parent table is not partitioned, but the child table is partitioned on the ID number by range.
LOAD would be a good option, but since the random number has to be generated in the COBOL program, the LOAD option is difficult. Also, GET DIAGNOSTICS is performed if a row insert fails due to a duplicate key; the random number is regenerated a limited number of times and the insert is retried.
Can anybody help me with suggestions on how to improve the performance of this insert job? |
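The generate-insert-retry pattern described above can be modeled outside COBOL. This is a minimal Python sketch, not the actual program: the retry limit, names, and key space are all assumptions (the post only says the number is regenerated "a limited number of times").

```python
import random

MAX_RETRIES = 5  # assumed limit; the post only says "a limited number of times"

def insert_with_retry(existing_keys, gen_key, max_retries=MAX_RETRIES):
    """Generate a key and 'insert' it; on a duplicate, regenerate and retry."""
    for attempt in range(max_retries):
        key = gen_key()
        if key not in existing_keys:   # stands in for the DB2 duplicate-key check
            existing_keys.add(key)
            return key, attempt        # attempt = number of wasted tries
    raise RuntimeError("could not generate a unique key")

# In the real program every failed attempt is a full INSERT round trip to
# DB2, so the retry count is a direct measure of wasted work.
used = set(range(1000))                # pretend keys 0..999 already exist
key, retries = insert_with_retry(used, lambda: random.randrange(10**6))
```

Each retry repeats the whole round trip, so the duplicate rate multiplies the insert cost directly.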
|
Back to top |
|
|
papadi Supermod
Joined: 20 Oct 2009 Posts: 594 Topics: 1
|
Posted: Sun May 23, 2010 6:39 pm Post subject: |
|
|
Quote: | have a batch cobol program which performs slowly | Compared to what? Another task that performs the same work but uses fewer resources? Someone's opinion?
100k is not a huge volume to insert. . .
How much data is read from these and other tables in an execution?
Until more is posted about the "insert" job, it will be quite difficult to suggest things that will improve it.
Have you identified where the lost time is going?
Have you talked with the DBA to see if this has come to their attention? _________________ All the best,
di |
|
Back to top |
|
|
geetha001 Beginner
Joined: 22 Jun 2005 Posts: 41 Topics: 14
|
Posted: Sun May 23, 2010 8:42 pm Post subject: |
|
|
In general the batch program takes a long time to run. The average run time is over an hour. I am looking to optimize this run time as we may have a much higher volume going forward.
Quote: | How much data is read from these and other tables in an execution?
|
This program does not read any data from these tables, but only does an insert into both the parent and child tables. |
|
Back to top |
|
|
kolusu Site Admin
Joined: 26 Nov 2002 Posts: 12375 Topics: 75 Location: San Jose
|
Posted: Sun May 23, 2010 8:50 pm Post subject: |
|
|
geetha001,
Did you run any kind of analysis as to where the program is spending time? Is it in the insert phase or the data preparation phase? Did you run an EXPLAIN on the SQL queries? How is the random number generated? Do you have the option of changing the DB2 table to have an identity column for the random-numbered key? Or how about adding a timestamp field to the DB2 table which would contain the time the key is inserted?
Check this link for running EXPLAIN:
http://www.mvsforums.com/helpboards/viewtopic.php?t=215&highlight=explain
Kolusu |
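For reference, the identity-column suggestion would look roughly like the DDL below. The table and column names are hypothetical (the thread never shows the real schema); the point is that GENERATED ALWAYS AS IDENTITY makes DB2 assign the key itself, which would eliminate both the random-number routine and the duplicate-key retries.

```sql
-- Hypothetical sketch only: real table/column names are not shown in the thread.
CREATE TABLE PARENT_TBL
      (ID         BIGINT NOT NULL
                  GENERATED ALWAYS AS IDENTITY
                  (START WITH 1, INCREMENT BY 1)
      ,CREATED_TS TIMESTAMP NOT NULL WITH DEFAULT   -- when the row was inserted
      -- ... remaining columns ...
      ,PRIMARY KEY (ID)
      )
```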
|
Back to top |
|
|
papadi Supermod
Joined: 20 Oct 2009 Posts: 594 Topics: 1
|
Posted: Sun May 23, 2010 9:01 pm Post subject: |
|
|
If you run the code as-is, with the INSERTs commented (so none are actually done) how long does the program run? _________________ All the best,
di |
|
Back to top |
|
|
geetha001 Beginner
Joined: 22 Jun 2005 Posts: 41 Topics: 14
|
Posted: Mon May 24, 2010 7:20 pm Post subject: |
|
|
Kolusu,
I have the results of the EXPLAIN plan from production. I personally have not run any EXPLAINs in the test region yet. From my understanding of the plan table, I could not see anything alarming. Most of the plan table columns do not give much information for the insert. I do see that the lock mode used during the insert is IX, though.
However, I would like to know if there is anything specific that I should check in the plan table for the inserts.
The 10-digit random number is generated from the current timestamp value.
Can you please clarify/expand on the below
Quote: |
How about adding a timestamp field to the DB2 which would contain the time the key is inserted.
|
Thanks |
|
Back to top |
|
|
papadi Supermod
Joined: 20 Oct 2009 Posts: 594 Topics: 1
|
Posted: Mon May 24, 2010 9:03 pm Post subject: |
|
|
Quote: | If you run the code as-is, with the INSERTs commented (so none are actually done) how long does the program run? | What is the result of running this. . .?
Quote: | Can you please clarify/expand the below | The suggestion is to add a timestamp so you can see the time it takes between inserts.
Quote: | However I would like to know if there is anything specific that I might have to check in the plan table for the inserts. | Suggest you first verify that the "lost time" is due to the inserts. I suspect it may not. . . _________________ All the best,
di |
|
Back to top |
|
|
jsharon1248 Intermediate
Joined: 08 Aug 2007 Posts: 291 Topics: 2 Location: Chicago
|
Posted: Tue May 25, 2010 7:58 am Post subject: |
|
|
Quote: | Also get diagnostics is performed if the row insert fails due to duplicate key, random number is regenerated a limited number of times and the insert is retried. |
Huh? You're generating what's supposed to be a unique key, issuing the insert, and then if the key is already in use, you start all over, and you're wondering why you're only getting 55 inserts a second? Come on. Have you bothered to count the failures? I'll bet you're really issuing 100+ inserts/second but are only successful on half of them. Also, how often do you commit? Just curious. |
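The "55 inserts a second" figure can be checked against the numbers earlier in the thread, assuming the quoted run time of roughly an hour is spent mostly in the inserts:

```python
# 100,000 rows into each of two tables (parent + child) in roughly an hour.
parent_rows = 100_000
child_rows = 100_000
run_seconds = 3_600      # "average run time is over an hour" per the thread

rate = (parent_rows + child_rows) / run_seconds   # successful inserts/second
```

200,000 / 3,600 is about 55.6, which matches the figure in the post; any duplicate-key retries are issued on top of that.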
|
Back to top |
|
|
geetha001 Beginner
Joined: 22 Jun 2005 Posts: 41 Topics: 14
|
Posted: Mon Jun 14, 2010 5:52 pm Post subject: |
|
|
After a lot of research, I found that there is an issue with the way the random number is generated. The current module that generates the random number is based on the current timestamp, and since the ID number is defined as an integer (leading zeroes suppressed), the resulting number can range anywhere from 1 to 10 digits in length (depending on the number of leading zeroes in the random number).
As mentioned by jsharon1248, there are a lot of duplicates generated and retried (I guess that is obvious by now from the above paragraph). Usually it is 10%, but on some days the duplicate-retry rate is 200%.
Now that the issue has been traced to the random number generator, I am looking for help on generating a unique random number of at least 15 digits for every row that is inserted. Although there are some common modules already available to do it, I was wondering if any of you had a good idea for generating such a number quickly (and uniquely), keeping in mind that there will be a huge number of records to insert.
I will continue my research... in the meantime, if any of you have tips, let me know.
Thanks |
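One hedged sketch of a 15-digit key that is unique within a run (the thread did not settle on a method, so this is only an illustration): prefix epoch seconds with an in-run sequence number. The result is unique and roughly time-ordered, but it is no longer random, it assumes fewer than 100,000 keys per second, and a second concurrently running task would still need its own tiebreaker.

```python
import itertools
import time

_seq = itertools.count()          # per-run sequence number

def unique_15_digit_key():
    """10 digits of epoch seconds + 5 digits of in-run sequence = 15 digits."""
    seconds = int(time.time())    # currently a 10-digit number (until year 2286)
    seq = next(_seq) % 100_000    # wraps after 100,000 keys - assume fewer per second
    return seconds * 100_000 + seq

keys = [unique_15_digit_key() for _ in range(1_000)]
```

Because the sequence part only repeats after 100,000 calls, by which time the seconds part has moved on, keys stay unique for any realistic single-task insert rate.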
|
Back to top |
|
|
Dibakar Advanced
Joined: 02 Dec 2002 Posts: 700 Topics: 63 Location: USA
|
Posted: Mon Jun 14, 2010 6:50 pm Post subject: |
|
|
Random numbers are generated for some specific purpose; why do you need one?
All you need is a unique key, and that can be achieved by adding 1 to the max of the existing keys. _________________ Regards,
Diba |
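The MAX+1 idea, sketched with in-memory SQLite standing in for DB2 (an assumption purely for the sake of a runnable example). In DB2 the same effect is safer via an identity column or a SEQUENCE, because SELECT MAX()+1 can hand two concurrent tasks the same value; with this thread's single batch task, though, the idea holds up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO parent VALUES (?, ?)",
                 [(7, "a"), (42, "b")])        # pre-existing rows

# Next key = highest existing key + 1 (COALESCE covers the empty-table case).
next_id = conn.execute(
    "SELECT COALESCE(MAX(id), 0) + 1 FROM parent").fetchone()[0]
conn.execute("INSERT INTO parent VALUES (?, ?)", (next_id, "c"))
```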
|
Back to top |
|
|
dbzTHEdinosauer Supermod
Joined: 20 Oct 2006 Posts: 1411 Topics: 26 Location: germany
|
Posted: Mon Jun 14, 2010 6:54 pm Post subject: |
|
|
the suggestion for adding a timestamp
in an attempt to measure INSERT speed
will not work,
because the timestamp generated for 100 rows inserted during a mass insert will be the same.
as far as speeding up the process:
why the random number?
you are forcing your INSERTs all over the place;
the effect on index build is what is taking a lot of time,
and unless you are on an old version of DB2 (<6)
(or old DASD - I can't remember the reason)
you don't need to space out your INSERTs:
contiguous inserts are considered more efficient.
also, scattered inserts require continuous reorgs.
question:
within a 100-row insert, is there a combination of parent and related child rows,
or do you insert all 100 parent rows first and then the corresponding 100 into the related table?
as an experiment,
before you change any code,
run a test - create a benchmark.
then turn off the RI on the tables and run a test;
that will tell you what the RI is costing you.
Here is a link to a DB2 guru that explains more about analyzing RI, especially supporting indexes _________________ Dick Brenholtz
American living in Varel, Germany |
|
Back to top |
|
|
geetha001 Beginner
Joined: 22 Jun 2005 Posts: 41 Topics: 14
|
Posted: Mon Jun 14, 2010 7:39 pm Post subject: |
|
|
Quote: |
why the random number?
|
For whatever reason, this table was designed to use a random number. Only now are we seeing the issue with this, and I am trying to find an easy and simple way to fix it (generate the random number in a better way).
Quote: |
contiguous inserts are considered more efficient.
|
I guess you are talking about the APPEND YES option in the newer DB2 version. I have one question here: will there be the overhead of a REORG every day if APPEND YES is used? (Because this is a daily job.)
Quote: |
within a 100-row insert, are there a combination of parent and related child
or do you insert all 100 first and then the corresponding 100 to the related table?
|
The parent and child have a one-to-one relationship.
Yes, the 100 parent rows are inserted first and then the corresponding 100 child rows.
For a simpler change, one of the options was to see how the random number could be generated uniquely, because using an identity key would involve a lot more effort in converting the existing data, etc.
I appreciate all your inputs; thanks to all of you.
I would appreciate any more input on the random number generator.
Thanks |
|
Back to top |
|
|
papadi Supermod
Joined: 20 Oct 2009 Posts: 594 Topics: 1
|
Posted: Mon Jun 14, 2010 10:23 pm Post subject: |
|
|
There is a difference between a unique number and a random number. . .
Might you consider a unique number? _________________ All the best,
di |
|
Back to top |
|
|
dbzTHEdinosauer Supermod
Joined: 20 Oct 2006 Posts: 1411 Topics: 26 Location: germany
|
Posted: Tue Jun 15, 2010 8:41 am Post subject: |
|
|
the basic problem of a random number generator that must also generate a unique number
is that the two processes
(generating a random number & generating a unique number)
are not in sync.
the random number is created by your little routine, and the unique attribute is proved or disproved by the db2 INSERT.
unfortunately, there is no easy min(unused number) function,
so I suggest you determine which unique numbers are not already in use
(dump all the key numbers, pass thru, and
create another table/file of the numbers not in use).
Then use the unused numbers for your inserts.
Until you give up this old/out-dated methodology of skipping around to do inserts
(this philosophy was dreamed up due to many tasks making inserts near simultaneously -
you only have one task making inserts - there is no need to skip around)
you will continually have this problem.
use the file/table containing unused numbers,
build a load file, and use db2 LOAD to INSERT the rows.
dbz wrote: |
Quote:
contiguous inserts are considered more efficient. |
geetha001 wrote: |
I guess you are talking about the Append 'YES' option here with the newer DB2 version. I have one question here, will there be an overhead of Reorg every day if Append 'YES' is used? (because daily job) |
your lack of understanding obviously makes you worry about changing this process.
at present, you have an unacceptable methodology - 200% - even 20% duplicate inserts is unacceptable.
your goal should be to ensure that your inserts are 100% successful - 0% duplicates.
logic would conclude that if you only have to read contiguous data, the reorg process would
be faster than having to jump around or sort everything. _________________ Dick Brenholtz
American living in Varel, Germany |
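The "unused numbers" pass can be sketched as a generator: dump the existing keys (here just an in-memory set, purely an assumption; the real dump would be a sorted file), then hand out numbers that are not already taken.

```python
from itertools import count

def unused_numbers(used, start=1):
    """Yield candidate keys from `start` upward, skipping any already taken."""
    for n in count(start):
        if n not in used:
            yield n

used = {1, 2, 4, 5, 9}                        # keys dumped from the table
gen = unused_numbers(used)
first_three = [next(gen) for _ in range(3)]   # -> [3, 6, 7]
```

With the unused numbers pre-assigned, every insert (or LOAD record) is guaranteed collision-free, so the retry logic disappears entirely.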
|
Back to top |
|
|
NASCAR9 Intermediate
Joined: 08 Oct 2004 Posts: 274 Topics: 52 Location: California
|
Posted: Tue Jun 15, 2010 6:31 pm Post subject: |
|
|
Could this be incorporated in the query?
SELECT RAND()
FROM SYSIBM.SYSDUMMY1; _________________ Thanks,
NASCAR9 |
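On the RAND() suggestion: DB2's RAND() returns a floating-point value in [0, 1), so it would still have to be scaled to an n-digit integer, and like the COBOL routine it guarantees nothing about uniqueness. A sketch of the scaling, with Python standing in for the SQL arithmetic:

```python
def scaled_key(r, digits=10):
    """Map a uniform value r in [0, 1) to an integer in [0, 10**digits)."""
    return int(r * 10**digits)

# Two different draws can still scale to the same key, so the duplicate
# check and retry would remain.
low = scaled_key(0.0)    # 0
mid = scaled_key(0.5)    # 5_000_000_000
```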
|
Back to top |
|
|