View previous topic :: View next topic |
Author |
Message |
rsantosh Beginner
Joined: 20 Dec 2014 Posts: 38 Topics: 9
|
Posted: Sat Jun 20, 2020 12:35 pm Post subject: Same query different CPU time with different host value |
|
|
Hello,
I have a SP which performance differently depending on host variable value.
e.g
Code: |
SELECT BRANCH,ACCOUNT,SECURITY
FROM TABLE1
WHERE SECURITY LIKE :HV1
AND BRANCH BETWEEN '300' AND '600'
|
Index IX1 has two columns SECURITY & BRANCH(duplicate Index). SECURITY is CHAR(4)
When HV1 is say "ABCD" query is very fast but when HV1 is say "ADEF" query is taking considerably high time.
Why is there such a big difference? What can be done to overcome this issue?
Actual query is little complex but EXPLAIN does show it is using the Index scan.
Regards,
Santosh |
|
Back to top |
|
|
haatvedt Beginner
Joined: 14 Nov 2003 Posts: 66 Topics: 0 Location: St Cloud, Minnesota USA
|
Posted: Sat Jun 20, 2020 1:08 pm Post subject: |
|
|
Santosh,
I suspect that the data is skewed and does not have a uniform distribution. I have a few questions.
How many rows exist on the table with SECURITY = "ABCD" ?
How many rows exist on the table with SECURITY = "ADEF" ?
Does the program ever use a "%" in the host variable HV1 ? if not then change the predicate to use an equal condition as follows
WHERE SECURITY = :HV1
AND BRANCH BETWEEN '300' AND '600' _________________ Chuck Haatvedt
email --> clastnameatcharterdotnet
(replace lastname, at, dot with appropriate
characters) |
|
Back to top |
|
|
rsantosh Beginner
Joined: 20 Dec 2014 Posts: 38 Topics: 9
|
Posted: Sun Jun 21, 2020 6:55 am Post subject: |
|
|
Hi Chuck,
Program doesn't use %. Table has two index,Unique IX1 on SECURITY,BRANCH,ACCOUNT+some more columns and Unique IX2 on BRANCH,ACCOUNT,SECURITY+some more columns.
When we used SECURITY = :HV1, explain showed very high cost(500+) and use of Index IX1 so we replaced "="(equal) with "LIKE" to force db2 to go for INDEX IX2(explain showed very low cost 0.8).
But with SPUFI(same query) what we have observed is sometimes "LIKE" works better and sometimes "=".
Table has 80M rows.
Regards,
Santosh |
|
Back to top |
|
|
haatvedt Beginner
Joined: 14 Nov 2003 Posts: 66 Topics: 0 Location: St Cloud, Minnesota USA
|
Posted: Sun Jun 21, 2020 5:54 pm Post subject: |
|
|
Santosh, optimizing SQL queries is not trivial, if you want to get more help on this perhaps you could send me an email (see my signature).
remember that the explain cost is an ESTIMATE based on the statistics that DB2 has available to it. it may be worthwhile to check and see how current the stats are on the table and indexes. the DBA should be able to help with this if you don't have access to that information.
it's difficult to determine the optimal access path without seeing the DB2 stats and also doing some queries to determine the distribution of the key values. also it would be helpful to know the cluster ratio of the 2 indexes you referred to.
it would appear that using SECURITY = :HV1 should be most efficient as it is matching on 2 columns on the X1 index (note it has an equal predicate on the first column and would be scanning on the second column). for X2 index it would only have a 1 column match and that would be scanning on BRANCH.
using SPUFI to do any type of performance testing is not reliable.
the best way to test the performance is to generate a file containing the values to be used in your SQL query. the file should contain at least a few thousand records with randomly generated values.
what i usually do is to create a small program which contain both version of the SQL statement that I am testing with a runtime switch which will determine which version of the SQL statement to execute.
then create a batch job with 2 steps both executing the test program with the same run time switch and input file. the create a second job with the same conditions other than the run time switch which should execute the other version of the SQL statement.
then run both jobs on the same LPAR and compare the cpu time of the second steps of the jobs.
the reason to use two steps is that the cpu time is highly dependent on whether the DB2 data is in memory or not. By running the steps twice in the same job, the second step of both jobs should have nearly the same benefit of having the data in memory as the first step would access the same data.
did you run a SELECT COUNT(*) on the two values of the SECURITY to see if the data is skewed ? _________________ Chuck Haatvedt
email --> clastnameatcharterdotnet
(replace lastname, at, dot with appropriate
characters) |
|
Back to top |
|
|
rsantosh Beginner
Joined: 20 Dec 2014 Posts: 38 Topics: 9
|
Posted: Tue Jun 23, 2020 11:34 am Post subject: |
|
|
Hi Chuck,
Thank you for the detailed info.
Today when I ran EXPLAIN on the query I saw it using same INDEX with both "LIKE" and "=(equal)" unlike the other day.
LIKE :HV1 vs = :HV1
I am puzzled now.
I am trying to see if I can get the EXPLAIN result after masking the data. I understand without the full result it would be difficult for anyone to provide solution.
Regards,
Santosh |
|
Back to top |
|
|
rsantosh Beginner
Joined: 20 Dec 2014 Posts: 38 Topics: 9
|
Posted: Tue Jun 23, 2020 11:50 am Post subject: |
|
|
Just to add, the table has 80M rows as of today and everyday 20K+ rows get added to it. Would it be better to use REOPT option to get the access determined during run time?
Regards,Santosh |
|
Back to top |
|
|
rsantosh Beginner
Joined: 20 Dec 2014 Posts: 38 Topics: 9
|
Posted: Wed Jun 24, 2020 12:02 pm Post subject: |
|
|
What are the cons of using bind option REOPT(ALWAYS)? Is it advisable to use this option?
Regards,
Santosh |
|
Back to top |
|
|
haatvedt Beginner
Joined: 14 Nov 2003 Posts: 66 Topics: 0 Location: St Cloud, Minnesota USA
|
Posted: Wed Jun 24, 2020 3:58 pm Post subject: |
|
|
Santosh,
REOPT(ALWAYS) is something that I've seen used when the data is skewed. So far I don't know if you have shown that the data distribution is skewed. If the data is not skewed or you are access the more frequently occurring SECURITY & BRANCH ranges, then the extra overhead used to reoptimize the SQL statement would not result in any saving and would actually increase the cost. Again the way to determine this is to execute the SQL statement many times....
when looking at the explain data, did you make note of the number of matching columns ?
it would be helpful to know the cardinality of the SECURITY and BRANCH columns.
have you written a small batch program to test the performance of different versions of the query ?
remember that EXPLAIN only provides an ESTIMATE of the cost, the best way to test the performance of an SQL statement is to actually EXECUTE it many times to get an average cost. The query below will give you a list of the worst performing combinations of SECURITY / BRANCH combinations. This will highlight issues with skewed data which I suspect is contributing to the performance issues.
execute the following query to generate the test data
SELECT SECURITY, BRANCH, COUNT(*)
FROM TABLE1
GROUP BY SECURITY, BRANCH
ORDER BY 3 DESC
FETCH FIRST 500 ROWS ONLY
WITH UR;
then execute your test program in batch with the output of the query above concatenating the file from above 4 times so it will execute your SQL statement 2,000 times. _________________ Chuck Haatvedt
email --> clastnameatcharterdotnet
(replace lastname, at, dot with appropriate
characters) |
|
Back to top |
|
|
|
|