A case study: analyzing an abnormal MySQL query

Questions

Question from a user ticket: two statements are identical except for the row count in the final limit clause. Strangely, the statement with limit 10 runs about 10 times slower than the statement with limit 100.

With the user's table details hidden, the statements and results are as follows:

Execution time 3 min 3.65 sec

Execution time 1.24 sec

The performance gap is huge!

Analysis
Tips: The most common way to trace how a statement is executed is to look at its execution plan with explain.

More striking still, after narrowing the range we find that, on this data, the execution plans for limit 67 and limit 68 are very different.

Two execution plans:

As you can see, the execution plans of the two statements are different: the indexes used are different.

MySQL Tips: In the explain result, key indicates the index that is finally used, and rows indicates the number of rows that must be scanned when using this index. The rows value is an estimate.

In the table, index A is defined as (f3, f4, f1, f2, f5); index B is defined as (f1, f2, f3);

A confirmation

Although rows is only an estimate, it is the basis for choosing an index. Since the limit 68 plan reaches rows: 67586, the same value should also have been among the candidates the optimizer considered for the first statement. So why was index A not chosen?
First, let us confirm the conclusion above.

MySQL Tips: The force index syntax can be used in MySQL to force the optimizer to use a particular index.

Incidentally, because we specified force index, the optimizer does not consider other indexes, so only A appears in possible_keys. Our focus is on rows: 67586. This shows that in the limit 67 statement, using index A would also scan fewer rows.

MySQL Tips: The MySQL optimizer calculates the query cost for each index listed in possible_keys and chooses the query plan with the lowest cost.

At this point, we can probably guess that this should be a bug in the MySQL implementation: the appropriate index was not selected, resulting in the use of an obviously wrong execution plan.

MySQL Tips: The MySQL optimizer needs to rely on the statistics of the table during execution, and the statistics are estimated values, so the execution plan obtained may be non-optimal.

Note, however, that the limitation described in the tip above is caused by objective conditions and is acceptable. This example is different: the optimizer actually had the data (the rows value) needed to make the right choice, and still chose wrongly.

Cause Analysis

The MySQL optimizer decides which index to use based on an estimate of the query cost. That estimate is essentially determined by estimating the number of rows to be scanned.

MySQL Tips: In versions 5.1 and 5.5, the mainstream versions currently used by the group, MySQL can only use the leftmost prefix of a composite index.

Therefore, only field f3 can be used with index A, and only field f1 can be used with index B. The rows value is the estimated number of data rows that must be scanned after the index has been used to find the upper and lower bounds.

The statement above uses group by and order by, so Using temporary; Using filesort appears in the execution plan.
During optimization, the query cost of using index A is calculated first.

The query costs of the other possible_keys are then calculated in turn. Because sorting is required, once a tentative result has been obtained the optimizer must check whether a cheaper way of ordering exists (test_if_cheaper_ordering).
As before, this cost is also calculated by estimating the number of rows scanned.

The implementation of this logic contains a bug: when estimating the selectivity of the current index, it does not take into account that only a prefix of the index can be used.

That is, suppose the table has 500,000 rows and index B is (f1, f2, f3). The selectivity of the index must be calculated from the prefix that can actually be used. For example, if f1 has 1,000 distinct values, each key value has 500 records on average. If (f1, f2) has 10,000 distinct values, each combined key has 50 records on average. If (f1, f2, f3) has 500,000 distinct values, each combined key has 1 record on average.

MySQL Tips: The fewer records there are per key, the more efficient it is to query with this index. This corresponds to a larger Cardinality value in the output of show index from tbl.

In this case, index B can only be used through its prefix f1, yet the average number of rows per key is calculated over the full (f1, f2, f3). As a result, the cost of index B is underestimated, which leads to the wrong choice.
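The arithmetic in this example can be checked with a short Python sketch. The row count and distinct-value counts are the illustrative figures from the paragraph above; records_per_key is a hypothetical helper name used only for illustration, not a MySQL function.

```python
# Illustrative figures from the text: a 500,000-row table with index B (f1, f2, f3).
TOTAL_ROWS = 500_000

# Number of distinct values for each leftmost prefix of index B.
DISTINCT_BY_PREFIX = {
    ("f1",): 1_000,
    ("f1", "f2"): 10_000,
    ("f1", "f2", "f3"): 500_000,
}


def records_per_key(total_rows: int, distinct_values: int) -> float:
    """Average number of rows sharing one key value (hypothetical helper)."""
    return total_rows / distinct_values


# Average rows per key for every usable prefix length.
AVERAGES = {
    prefix: records_per_key(TOTAL_ROWS, distinct)
    for prefix, distinct in DISTINCT_BY_PREFIX.items()
}

for prefix, average in AVERAGES.items():
    print(prefix, average)
```

The longer the prefix that can be used, the fewer rows share each key, which is why it matters which prefix length the estimate is based on.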
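To get a feel for the size of the error, here is a minimal Python sketch that reuses the illustrative figures from the earlier example (500,000 rows, 1,000 distinct f1 values, 500,000 distinct (f1, f2, f3) values). The variable names are assumptions made for illustration; this is not the optimizer's actual code.

```python
# Illustrative figures from the text, not measurements from the user's table.
TOTAL_ROWS = 500_000
DISTINCT_F1 = 1_000            # usable prefix of index B: (f1)
DISTINCT_F1_F2_F3 = 500_000    # whole index B: (f1, f2, f3)

# What the estimate should be based on: only f1 can be used.
correct_rows_per_key = TOTAL_ROWS / DISTINCT_F1

# What the buggy logic is described as using: the full index definition.
buggy_rows_per_key = TOTAL_ROWS / DISTINCT_F1_F2_F3

# How many times too small the per-key row estimate becomes.
underestimate_factor = correct_rows_per_key / buggy_rows_per_key

print("rows per key using prefix f1:", correct_rows_per_key)
print("rows per key using full index:", buggy_rows_per_key)
print("underestimated by a factor of:", underestimate_factor)
```

With these figures the per-key row count is underestimated by a factor of 500, so index B looks far cheaper than it really is.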

Back to the question itself

1. Why was the right index chosen when the limit value was large?
This is because, when the query cost of B is calculated, limit_rows, the number of rows the query needs to return, is also a factor in the product. The larger the limit value, the larger the calculated cost of B; once that cost exceeds the cost of A, the optimizer ends up choosing A.
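The crossover can be illustrated with a toy model in Python. Only the figure 67586 (the rows estimate for index A) comes from the article. The per-row multiplier for B is a made-up constant chosen purely so that the switch lands between limit 67 and limit 68, as observed, and treating A's cost as its rows estimate is also a simplifying assumption. The real cost formulas in the optimizer are more involved.

```python
# Rows estimate for index A, taken from the explain output quoted in the text.
ROWS_ESTIMATE_A = 67_586

# HYPOTHETICAL multiplier: the (underestimated) cost B pays per requested row.
# Chosen only so the toy crossover matches the observed limit 67 / limit 68 split.
COST_PER_LIMIT_ROW_B = 1_000


def cost_a(limit: int) -> float:
    """Toy cost of index A: treated as independent of the limit value."""
    return float(ROWS_ESTIMATE_A)


def cost_b(limit: int) -> float:
    """Toy cost of index B: the limit value takes part in the product."""
    return float(limit * COST_PER_LIMIT_ROW_B)


def chosen_index(limit: int) -> str:
    """Pick the index with the lower toy cost, as a cost-based optimizer would."""
    return "A" if cost_a(limit) <= cost_b(limit) else "B"


for limit in (10, 67, 68, 100):
    print(limit, cost_a(limit), cost_b(limit), chosen_index(limit))
```

In this toy model the limit 10 and limit 67 statements pick B, while limit 68 and limit 100 pick A, which mirrors the behavior described in the ticket.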

2. The table has only 500,000 rows, so why do different limit values make such a big difference?
This has to do with the statement itself. It contains a group by, which means that each increment of the limit value actually requires scanning N more rows, where N is "total number of rows in the table" / "number of distinct f2 values in the table".
In other words, this statement amplifies the effect of the bug.
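A small Python sketch of this amplification follows. The 500,000-row total is the figure used in the article. The number of distinct f2 values is not given, so the value 100 below is a hypothetical placeholder, and rows_scanned_for_limit is an illustrative helper rather than anything in MySQL.

```python
TOTAL_ROWS = 500_000   # table size used in the text
DISTINCT_F2 = 100      # HYPOTHETICAL: the text does not state this number


def rows_scanned_for_limit(limit: int,
                           total_rows: int = TOTAL_ROWS,
                           distinct_f2: int = DISTINCT_F2) -> int:
    """Rows scanned to produce `limit` groups, assuming evenly sized groups."""
    rows_per_group = total_rows // distinct_f2   # this is N in the text
    # The scan can never read more rows than the table contains.
    return min(total_rows, limit * rows_per_group)


for limit in (1, 10, 67, 200):
    print(limit, rows_scanned_for_limit(limit))
```

Under these assumptions each extra unit of limit costs 5,000 more scanned rows, so a cost that the optimizer treats as proportional to a handful of rows is in fact proportional to whole groups.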

Solution

Once the analysis is clear, the solution is relatively simple: modify the code logic so that test_if_cheaper_ordering uses the selectivity of field f1 in its calculation.
