• notice
  • Congratulations on the launch of the Sought Tech site

Use OR, IN with Unionall in Mysql comparison under the query command

Which is the faster query efficiency of OR, in, and union all?
Many voices on the Internet say that union all is faster than or and in because or and in will cause a full table scan.They gave many examples.
But really union all is really faster than or, in?

EXPLAIN SELECT * from employees where employees.first_NAME='Georgi' UNION ALL SELECT * from employees where employees.first_NAME='Bezalel'

481 results of the execution of this statement, the execution time was 0.35s

PRIMARY employees ALL 300141 Using where

UNION employees ALL 300141 Using where

 UNION RESULT <union1,2> ALL

explain SELECT * FROM employees WHERE employees.first_name IN ('Georgi','Bezalel')

The execution result time of this statement is 0.186s

SIMPLE employees ALL 300141 Using where

explain SELECT * FROM employees WHERE employees.first_name='Georgi' or employees.first_name='Bezalel'

The execution result of this statement is similar to the result of in

Is the online statement wrong? Is it related to the index? Created an index on firstname

Re-execute

Union's execution plan is as follows, the execution time is 0.004s

PRIMARY employees ref index_firstname index_firstname 44 const 253 Using where
UNION employees ref index_firstname index_firstname 44 const 228 Using where
UNION RESULT <union1,2> ALL

The execution plan of in is as follows, and the execution time is also 0.004s

SIMPLE employees range index_firstname index_firstname 44 481 Using where

Or's execution plan is as follows, and the execution time is also 0.004s

SIMPLE employees range index_firstname index_firstname 44 481 Using where

It feels similar in performance.But pay attention to the type in the execution plan, ref is better than range (ref is non-unique index scan, range is index range scan)
Suddenly it felt like it was almost the same as what was said on the Internet, but if the first sentence took two ref scans, would it be less efficient than a range scan?

Should I try the primary key again, this is the only one, will it be consistent with the online effect?

EXPLAIN SELECT * FROM employees WHERE employees.EMP_NO=100001 UNION ALL SELECT * FROM employees WHERE employees.EMP_NO=101100

Union's execution plan is as follows

PRIMARY employees const PRIMARY PRIMARY 4 const 1
UNION employees const PRIMARY PRIMARY 4 const 1
UNION RESULT <union1,2> ALL

EXPLAIN SELECT * FROM employees WHERE employees.EMP_NO IN (100001 ,101100)

The execution plan of in is as follows

SIMPLE employees range PRIMARY PRIMARY 4 2 Using where

EXPLAIN SELECT * FROM employees WHERE employees.EMP_NO=100001 OR emp_no=101100

or's execution plan is as follows

SIMPLE employees range PRIMARY PRIMARY 4 2 Using where

I feel that the result is still similar to the second experiment.


The following article uses examples to discuss the efficiency comparison between them under the actual query command.
1: Create a table, insert data, and the amount of data is 10 million [or the effect is not obvious].

drop table if EXISTS BT;
create table BT(
  ID int(10) NOT NUll,
  VName varchar(20) DEFAULT'' NOT NULL,
  PRIMARY key(ID)
)ENGINE=INNODB;

 The table has only two field IDs as the primary key [index page is similar], and one is a common field.(Be lazy, just use a simple table structure)
Insert 10 million pieces of data into the BT table
Here I wrote a simple stored procedure [so your mysql version is at least greater than 5.0, my version is 5.1 】,code show as below.
Note: best

 INSERT INTO BT (ID,VNAME) VALUES(i, CONCAT('M', i));---1

    amended to

 INSERT INTO BT (ID,VNAME) VALUES(i, CONCAT('M', i,'TT'));---2

   The reason for the modification is in
   Non-indexed columns and VNAME use union for full scan, please use 1.
   If a full table scan is used for non-indexed columns and VNAME, please use 2.
 

DROP PROCEDURE IF EXISTS test_proc;
CREATE PROCEDURE test_proc()
BEGIN
declare i int default 0;
set autocommit=0;
while i<10000000 do
INSERT INTO BT (ID,VNAME) VALUES(i, CONCAT('M', i));
set i=i+1;
if i%2000=0 then
commit;
end if;
end while;
END;

 No comments, it's pretty simple.
The storage process is best to set the relevant parameters of innob [mainly related to the log and write cache so that it can speed up the insertion].I did not set it to insert 10 million pieces of data and inserted it for 6 minutes.
Part of the data is as follows: 10 million data is similar

2: Actual combat
    2.1: Use or, in, union all on the index column respectively
           The table we created has only a primary key index, so we can only query by ID.We check the three data with IDs 98, 85220 and 9888589 each time-consuming as follows:

The time is 0.00.How could this happen? All queries are in milliseconds.
I use another tool--EMS SQL Manager  for mysql
The query display time is
93 ms, 94 ms, 93 ms, the difference in time is almost negligible.
Then we are looking at their respective execution plans

The field type and ref field to be noted here
We found that the type used by union all [type is to show what type of connection is used] is ref and or and in are range [ref connection type is better than range, and there is not much difference], and the number of query rows is the same [see rows field All are 3].
From the point of view of the whole process, there is not much difference between using constant or and in in the index column and union all query.
But why in some complex queries, the use of or and in for index columns is much slower than union all.This may be because your query is not written reasonably enough to let mysql give up the index and perform a full table scan.
2.2: Use or, in, and union all in non-indexed columns.
    We check that the VNAME is M98, M85220, and M9888589.The three data each takes time as follows:

We found that the query time of union all is almost three times that of or and in.
This is why, let's not talk about it, let's take a look at the three query plans.

Here we find that the plan is almost the same.
But we should pay attention to scanning at this time for or and in, only scan the table once, that is, rows are listed as 9664482.
For union all, the table is scanned three times, that is, the sum of rows is 9664482*3.
This is why we see that union all is almost tripled.
Note: If you use a stored procedure to use the second sql, all types of the execution plan are listed as all.In fact, this is what I want to demonstrate most, but now I have almost finished writing it and found that the problem would be wrong.

3: Summary
     3.1: Don’t be superstitious that union all is faster than or and in, and analyze which situation to use based on the actual situation.
     3.2: It is best to use union all for index columns, because complex queries [including operations, etc.] will make or, in abandon the index and scan the full table, unless you can be sure that or, in will use index.
    3.3: For only non-indexed fields, you can honestly use or or in, because non-indexed fields should be full table scans and union all only doubles the number of table scans.
    3.4: For and indexed fields [indexed fields are valid] and include non-indexed fields, logically you can also use or, in or union all,
       But I recommend using or, in.
      such as the following query:

select * from bt where bt.VName='M98' or bt.id='9888589'
 
select * from bt where bt.VName='M98'
UNION ALL
select * from bt where bt.id='9888589'

     The difference between the two query speeds mainly depends on the query time of the index column.If the query time of the index column is too long, then you can use or or in instead.
   3.5: The above is mainly for single table, but for multi-table joint query, there are more places to consider, such as connection method, query table data distribution, index, etc., combined with single table strategy selection Appropriate keywords. 

Tags

Technical otaku

Sought technology together

Related Topic

0 Comments

Leave a Reply

+