MVSFORUMS.com

rsantosh · Beginner Joined: 20 Dec 2014 Posts: 47 Topics: 11

Hello,

I have a SP which performance differently depending on host variable value.

e.g

haatvedt · Posted: Sat Jun 20, 2020 1:08 pm Post subject:

Santosh,

I suspect that the data is skewed and does not have a uniform distribution. I have a few questions.

How many rows exist on the table with SECURITY = "ABCD" ?

How many rows exist on the table with SECURITY = "ADEF" ?

Does the program ever use a "%" in the host variable HV1 ? if not then change the predicate to use an equal condition as follows

WHERE SECURITY = :HV1
AND BRANCH BETWEEN '300' AND '600'
_________________
Chuck Haatvedt

email --> clastnameatcharterdotnet

(replace lastname, at, dot with appropriate
characters)

rsantosh · Beginner Joined: 20 Dec 2014 Posts: 47 Topics: 11

Hi Chuck,

Program doesn't use %. Table has two index,Unique IX1 on SECURITY,BRANCH,ACCOUNT+some more columns and Unique IX2 on BRANCH,ACCOUNT,SECURITY+some more columns.

When we used SECURITY = :HV1, explain showed very high cost(500+) and use of Index IX1 so we replaced "="(equal) with "LIKE" to force db2 to go for INDEX IX2(explain showed very low cost 0.8).

But with SPUFI(same query) what we have observed is sometimes "LIKE" works better and sometimes "=".

Table has 80M rows.

Regards,
Santosh

haatvedt · Posted: Sun Jun 21, 2020 5:54 pm Post subject:

Santosh, optimizing SQL queries is not trivial, if you want to get more help on this perhaps you could send me an email (see my signature).

remember that the explain cost is an ESTIMATE based on the statistics that DB2 has available to it. it may be worthwhile to check and see how current the stats are on the table and indexes. the DBA should be able to help with this if you don't have access to that information.

it's difficult to determine the optimal access path without seeing the DB2 stats and also doing some queries to determine the distribution of the key values. also it would be helpful to know the cluster ratio of the 2 indexes you referred to.

it would appear that using SECURITY = :HV1 should be most efficient as it is matching on 2 columns on the X1 index (note it has an equal predicate on the first column and would be scanning on the second column). for X2 index it would only have a 1 column match and that would be scanning on BRANCH.

using SPUFI to do any type of performance testing is not reliable.

the best way to test the performance is to generate a file containing the values to be used in your SQL query. the file should contain at least a few thousand records with randomly generated values.

what i usually do is to create a small program which contain both version of the SQL statement that I am testing with a runtime switch which will determine which version of the SQL statement to execute.

then create a batch job with 2 steps both executing the test program with the same run time switch and input file. the create a second job with the same conditions other than the run time switch which should execute the other version of the SQL statement.

then run both jobs on the same LPAR and compare the cpu time of the second steps of the jobs.

the reason to use two steps is that the cpu time is highly dependent on whether the DB2 data is in memory or not. By running the steps twice in the same job, the second step of both jobs should have nearly the same benefit of having the data in memory as the first step would access the same data.

did you run a SELECT COUNT(*) on the two values of the SECURITY to see if the data is skewed ?
_________________
Chuck Haatvedt

email --> clastnameatcharterdotnet

(replace lastname, at, dot with appropriate
characters)

rsantosh · Beginner Joined: 20 Dec 2014 Posts: 47 Topics: 11

Hi Chuck,

Thank you for the detailed info.

Today when I ran EXPLAIN on the query I saw it using same INDEX with both "LIKE" and "=(equal)" unlike the other day.

LIKE :HV1 vs = :HV1

I am puzzled now.

I am trying to see if I can get the EXPLAIN result after masking the data. I understand without the full result it would be difficult for anyone to provide solution.

Regards,
Santosh
_________________
Regards, Santosh

rsantosh · Beginner Joined: 20 Dec 2014 Posts: 47 Topics: 11

Just to add, the table has 80M rows as of today and everyday 20K+ rows get added to it. Would it be better to use REOPT option to get the access determined during run time?

Regards,Santosh
_________________
Regards, Santosh

rsantosh · Beginner Joined: 20 Dec 2014 Posts: 47 Topics: 11

What are the cons of using bind option REOPT(ALWAYS)? Is it advisable to use this option?

Regards,
Santosh
_________________
Regards, Santosh

haatvedt · Posted: Wed Jun 24, 2020 3:58 pm Post subject:

Santosh,

REOPT(ALWAYS) is something that I've seen used when the data is skewed. So far I don't know if you have shown that the data distribution is skewed. If the data is not skewed or you are access the more frequently occurring SECURITY & BRANCH ranges, then the extra overhead used to reoptimize the SQL statement would not result in any saving and would actually increase the cost. Again the way to determine this is to execute the SQL statement many times....

when looking at the explain data, did you make note of the number of matching columns ?

it would be helpful to know the cardinality of the SECURITY and BRANCH columns.

have you written a small batch program to test the performance of different versions of the query ?

remember that EXPLAIN only provides an ESTIMATE of the cost, the best way to test the performance of an SQL statement is to actually EXECUTE it many times to get an average cost. The query below will give you a list of the worst performing combinations of SECURITY / BRANCH combinations. This will highlight issues with skewed data which I suspect is contributing to the performance issues.

execute the following query to generate the test data

SELECT SECURITY, BRANCH, COUNT(*)
FROM TABLE1
GROUP BY SECURITY, BRANCH
ORDER BY 3 DESC
FETCH FIRST 500 ROWS ONLY
WITH UR;

then execute your test program in batch with the output of the query above concatenating the file from above 4 times so it will execute your SQL statement 2,000 times.
_________________
Chuck Haatvedt

email --> clastnameatcharterdotnet

(replace lastname, at, dot with appropriate
characters)