
Understanding MySQL: Summary of Indexing and Optimization

Preface: Indexes have a crucial impact on query speed, and understanding them is the starting point for database performance tuning. Consider the following situation. Suppose a table has 10^6 records, the DBMS page size is 4K, and each page stores 100 records. Without an index, a query has to scan the entire table. In the worst case, none of the data pages are in memory, so 10^4 pages must be read. If these 10^4 pages are scattered randomly across the disk, that means 10^4 I/Os; assuming each disk I/O takes 10 ms (ignoring data transfer time), the scan takes 100 s in total (in practice it is much better than this). If a B-Tree index is created on the table, only log_100(10^6) = 3 page reads are required, which takes 30 ms in the worst case. This is the effect of an index. In many cases, when your application's SQL queries are slow, you should ask whether an index would help. Now to the topic:
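The arithmetic in the preface can be checked with a short Python sketch. The function names are made up for illustration, and the fanout of 100 index entries per page is the same simplifying assumption the text uses.

```python
# Back-of-the-envelope I/O cost for the example in the preface.
ROWS = 10**6            # records in the table
ROWS_PER_PAGE = 100     # records stored in one 4K page
IO_MS = 10              # assumed cost of one random disk read, in milliseconds


def full_scan_ms(rows, rows_per_page, io_ms):
    """Worst case: every data page is read from disk with one random I/O."""
    pages = -(-rows // rows_per_page)   # ceiling division
    return pages * io_ms


def btree_lookup_ms(rows, fanout, io_ms):
    """One page read per B-Tree level, assuming `fanout` entries per page."""
    levels, reachable = 0, 1
    while reachable < rows:
        reachable *= fanout
        levels += 1
    return levels * io_ms


print(full_scan_ms(ROWS, ROWS_PER_PAGE, IO_MS) / 1000, "s for a full scan")
print(btree_lookup_ms(ROWS, ROWS_PER_PAGE, IO_MS), "ms for an index lookup")
```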

Chapter Two, Indexing and Optimization

1. Choosing data types for indexed columns

MySQL supports many data types. Choosing an appropriate data type to store data has a great impact on performance. Generally speaking, the following guidelines apply:

(1) Smaller data types are usually better: Smaller data types usually require less space in the disk, memory, and CPU cache, and are faster to process.

(2) Simple data types are better: Integers have less processing overhead than character data, because string comparison is more complicated. In MySQL, you should use the built-in date and time data types instead of strings to store times, and use integer data types to store IP addresses.

(3) Try to avoid NULL: Declare columns NOT NULL unless you really need to store NULL. In MySQL, columns that allow NULL are harder to optimize, because they make indexes, index statistics, and comparison operations more complicated. Consider replacing NULL with 0, a special value, or an empty string.
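To illustrate guideline (2), here is a minimal Python sketch of storing an IPv4 address as an integer. The helper names are hypothetical; in MySQL itself the built-in INET_ATON() and INET_NTOA() functions perform the same conversion. It uses only the standard ipaddress module.

```python
import ipaddress


def ip_to_int(text):
    """Dotted-quad string -> 32-bit unsigned integer (fits an INT UNSIGNED column)."""
    return int(ipaddress.IPv4Address(text))


def int_to_ip(value):
    """32-bit unsigned integer -> dotted-quad string."""
    return str(ipaddress.IPv4Address(value))


addresses = ["10.0.0.2", "192.168.0.1", "9.255.255.255"]

# Integers sort in true address order; the strings do not.
by_int = [int_to_ip(v) for v in sorted(ip_to_int(a) for a in addresses)]
by_str = sorted(addresses)
print(by_int)
print(by_str)
```

The integer form takes 4 bytes instead of up to 15 characters, and comparisons on it are plain integer comparisons.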

1.1 Choosing identifiers

It is very important to choose the right identifier. When choosing, consider not only the storage type but also how MySQL performs operations and comparisons on it. Once a data type is selected, make sure that all related tables use the same data type.

(1) Integers: Usually the best choice for an identifier, because they are processed faster and can be set to AUTO_INCREMENT.

(2) Strings: Try to avoid using strings as identifiers. They consume more space and are slower to process. Moreover, strings are generally random, so their positions in the index are also random, which causes page splits, random disk access, and clustered index fragmentation (for storage engines that use clustered indexes).

2. Getting started with indexing

For any DBMS, indexes are the most important factor in optimization. With a small amount of data, the lack of a suitable index has little impact, but as the amount of data grows, performance drops sharply.

If multiple columns are indexed (a composite index), the order of the columns is very important, because MySQL can search efficiently only on a leftmost prefix of the index. For example:
Assume there is a composite index it1c1c2(c1,c2). The query select * from t1 where c1=1 and c2=2 can use this index, and so can select * from t1 where c1=1. However, select * from t1 where c2=2 cannot use the index, because it does not reference the leading column of the composite index. In other words, to search the index by c2, c1 must also be constrained to a specific value.
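A rough way to see why the leading column matters is to model the composite index as a list of tuples sorted by (c1, c2). The Python sketch below is only an illustration: the sample rows are hypothetical, the columns are assumed to be integers, and a real B-Tree is a paged tree rather than a flat list.

```python
from bisect import bisect_left

# Hypothetical rows of t1 as (row_id, c1, c2).
rows = [(0, 2, 9), (1, 1, 2), (2, 1, 5), (3, 3, 2), (4, 1, 2)]

# Model of it1c1c2(c1, c2): entries sorted by c1, then c2.
index = sorted((c1, c2, row_id) for row_id, c1, c2 in rows)


def seek(c1, c2=None):
    """WHERE c1 = ? [AND c2 = ?]: binary search, because entries are ordered by c1 first."""
    if c2 is None:
        lo, hi = bisect_left(index, (c1,)), bisect_left(index, (c1 + 1,))
    else:
        lo, hi = bisect_left(index, (c1, c2)), bisect_left(index, (c1, c2 + 1))
    return [row_id for _, _, row_id in index[lo:hi]]


def scan_c2(c2):
    """WHERE c2 = ? alone: matches are scattered, so every entry must be examined."""
    return [row_id for _, value, row_id in index if value == c2]


print(seek(1))       # uses the leftmost prefix (c1)
print(seek(1, 2))    # uses both columns
print(scan_c2(2))    # cannot seek; falls back to a full scan
```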

2.1 Types of indexes

Indexes are implemented in the storage engine, not in the server layer. Therefore, indexes do not necessarily work the same way in every storage engine, and not all storage engines support all index types.

2.1.1 B-Tree index

Suppose there is a table as follows:

CREATE TABLE People (
  last_name varchar(50) not null,
  first_name varchar(50) not null,
  dob date not null,
  gender enum('m','f') not null,
  key(last_name, first_name, dob)
);

The index contains the last_name, first_name, and dob columns of each row in the table. Its structure is roughly as follows:

 

The values stored in the index are ordered according to the order of the columns in the index. You can use a B-Tree index to look up a full key, a key range, or a key prefix. Of course, to use the index, the query must match a leftmost prefix of the index.

(1) Match the full value: Specify specific values for all columns in the index. For example, the index in the figure above can help you find Cuba Allen who was born on 1960-01-01.

(2) Match a leftmost prefix: You can use the index to find the person whose last name is Allen, using only the first column in the index.

(3) Match a column prefix: For example, you can use the index to find people whose last name starts with J, which only uses the first column in the index.

(4) Match a range of values: You can use the index to find people whose last name is between Allen and Barrymore, using only the first column in the index.

(5) Match one part exactly and match a range on another part: You can use the index to find people whose last name is Allen and whose first name starts with the letter K.

(6) Index-only queries: If all the columns the query needs are in the index, there is no need to read the row itself.

Because the nodes of a B-Tree are stored in order, you can use the index for lookups (finding certain values) and also for ORDER BY. Of course, B-Tree indexes have the following restrictions:

(1) The query must start from the leftmost column of the index. This point has been mentioned many times. For example, you cannot use the index to find people who were born on a certain day.

(2) An index column cannot be skipped. For example, you cannot use the full index to find people whose last name is Smith and who were born on a certain day; without a value for first_name, only the first column of the index can be used.

(3) The storage engine cannot use the columns to the right of a range condition in the index. For example, if your query is WHERE last_name="Smith" AND first_name LIKE 'J%' AND dob='1976-12-23', it will use only the first two columns of the index, because LIKE is a range condition.
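The three restrictions can be summarized as a small rule. The Python sketch below is a simplified model, not how the MySQL optimizer is actually implemented; the function name and the "eq"/"range" labels are made up for illustration.

```python
def usable_index_columns(index_columns, predicates):
    """Return the leading index columns a B-Tree lookup can use.

    `predicates` maps a column name to "eq" (equality) or "range"
    (BETWEEN, <, >, or LIKE 'prefix%').
    """
    used = []
    for column in index_columns:
        kind = predicates.get(column)
        if kind is None:
            break               # restrictions (1) and (2): a missing column ends the prefix
        used.append(column)
        if kind == "range":
            break               # restriction (3): nothing right of a range is usable
    return used


people_index = ("last_name", "first_name", "dob")

print(usable_index_columns(people_index, {"dob": "eq"}))
print(usable_index_columns(people_index, {"last_name": "eq", "dob": "eq"}))
print(usable_index_columns(
    people_index, {"last_name": "eq", "first_name": "range", "dob": "eq"}))
```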

2.1.2 Hash index

In MySQL, only the Memory storage engine explicitly supports hash indexes. They are the default index type for Memory tables, although Memory tables can also use B-Tree indexes. The Memory storage engine supports non-unique hash indexes, which is unusual in the database world. If multiple values have the same hash code, the index stores their row pointers in a linked list under the same hash table entry.

Assume that a table is created as follows:

CREATE TABLE testhash (
  fname VARCHAR(50) NOT NULL,
  lname VARCHAR(50) NOT NULL,
  KEY USING HASH(fname)
) ENGINE=MEMORY;

The data included are as follows:

Assume the index uses a hash function f(), which returns the following values:

f('Arjen') = 2323
f('Baron') = 7437
f('Peter') = 8784
f('Vadim') = 2458

At this time, the structure of the index is roughly as follows:

The slots are ordered, but the records are not. When you execute

mysql> SELECT lname FROM testhash WHERE fname='Peter' ;

MySQL calculates the hash value of 'Peter' and uses it to look up the row pointer in the index. Because f('Peter') = 8784, MySQL looks up 8784 in the index and finds a pointer to record 3.

Because the index itself stores only short hash values, it is very compact. The size of the hash value does not depend on the data type of the column: the index on a TINYINT column is the same size as the index on a long string column.

Hash index has the following limitations:

(1) Because the index contains only hash codes and row pointers, MySQL cannot use it to avoid reading the rows. However, accessing in-memory rows is very fast, so this does not hurt performance much.

(2) The hash index cannot be used for sorting.

(3) Hash index does not support partial matching of keys, because the hash value is calculated through the entire index value.

(4) Hash indexes support only equality comparisons, such as =, IN(), and <=>. They cannot speed up a query such as WHERE price>100.
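These limitations follow directly from the structure. Below is a minimal Python model of a non-unique hash index on fname. It is a sketch under stated assumptions: zlib.crc32 stands in for the engine's internal hash function f(), list positions (starting from 0) play the role of row pointers, and the rows hold only the fname values named above.

```python
import zlib
from collections import defaultdict

# Hypothetical table contents; the list position acts as the row pointer.
rows = [{"fname": "Arjen"}, {"fname": "Baron"}, {"fname": "Peter"}, {"fname": "Vadim"}]


def f(value):
    """Stand-in hash function (assumption; the real one is internal to the engine)."""
    return zlib.crc32(value.encode("utf-8"))


# Each slot holds a list of row pointers, so colliding or duplicate values can share it.
hash_index = defaultdict(list)
for pointer, row in enumerate(rows):
    hash_index[f(row["fname"])].append(pointer)


def lookup_eq(fname):
    """WHERE fname = ?: one hash computation, then read the candidate rows to confirm."""
    candidates = hash_index.get(f(fname), [])
    return [p for p in candidates if rows[p]["fname"] == fname]


def lookup_prefix(prefix):
    """WHERE fname LIKE 'prefix%': the hash of a prefix is unrelated, so scan every row."""
    return [p for p, row in enumerate(rows) if row["fname"].startswith(prefix)]


print(lookup_eq("Peter"))     # served by the hash index
print(lookup_eq("Pet"))       # a partial key finds nothing
print(lookup_prefix("Pet"))   # prefix match needs a full scan
```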

2.1.3 Spatial (R-Tree) index

MyISAM supports spatial indexing, which is mainly used for geospatial data types, such as GEOMETRY.

2.1.4 Full-text index

Full-text index is a special index type of MyISAM, mainly used for full-text search.

3. High-performance indexing strategies

3.1 Clustered indexes

A clustered index ensures that tuples with similar key values are stored physically close together (so string columns, especially random strings, are poorly suited to clustered indexes, because they force the system to move a large amount of data), and a table can have only one clustered index. Because indexes are implemented by the storage engine, not all engines support clustered indexes. Currently, only solidDB and InnoDB do.

The structure of the clustered index is roughly as follows:

 

Note: Leaf pages contain complete tuples, while inner node pages contain only the indexed columns (here the indexed column is an integer). Some DBMSs allow users to choose which index is clustered, but so far MySQL's storage engines do not. InnoDB builds the clustered index on the primary key. If you do not define a primary key, InnoDB uses a unique index on non-null columns instead. If no such index exists, InnoDB defines a hidden primary key and builds the clustered index on it. Generally speaking, a DBMS stores the actual data in the clustered index, which is the basis for the secondary indexes.

3.1.1 Comparison of the data layout of InnoDB and MyISAM

To better understand clustered and non-clustered indexes, or primary and secondary indexes (MyISAM does not support clustered indexes), let's compare the data layouts of InnoDB and MyISAM. Consider the following table:

CREATE TABLE layout_test (
  col1 int NOT NULL,
  col2 int NOT NULL,
  PRIMARY KEY(col1),
  KEY(col2)
);

Assume the primary key values range from 1 to 10,000 and are inserted in random order, and that the table is then optimized with OPTIMIZE TABLE. col2 is assigned a random value between 1 and 100, so there are many duplicate values.

(1) MyISAM data layout

The layout is very simple. MyISAM stores rows on disk in insertion order, as follows:

Note: The row number, starting from 0, is on the left. Because the rows are fixed-size, MyISAM can easily find any row by seeking the required number of bytes from the beginning of the table.

The index structure of the primary key built on this data is roughly as follows:

Note: MyISAM does not support clustered indexes. Each leaf node in the index contains only a row number, and the leaf nodes are stored in col1 order.
Now look at the index structure for col2:

In fact, in MyISAM the primary key is no different from any other index. The primary key is just a unique, non-null index named PRIMARY.

(2) InnoDB data layout

InnoDB stores data in a clustered index, so its data layout is very different. The table is stored roughly as follows:

Note: Each leaf node in the clustered index contains the primary key value, the transaction ID and rollback pointer (used for transactions and MVCC), and the remaining columns (such as col2).

Unlike in MyISAM, InnoDB's secondary indexes are very different from its clustered index. The leaves of an InnoDB secondary index contain primary key values instead of row pointers. This reduces the cost of maintaining secondary indexes when rows move or data pages split, because InnoDB does not need to update row pointers in the index. Its structure is roughly as follows:

Comparison of tables with clustered and non-clustered indexes:
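As a rough model of that comparison, the Python sketch below builds both layouts for a few rows of layout_test. The sample values are hypothetical, and dictionaries and sorted lists stand in for the on-disk B-Trees; transaction IDs and rollback pointers are left out.

```python
# Hypothetical rows of layout_test as (col1, col2), in insertion order.
inserted = [(99, 8), (12, 56), (3000, 62), (18, 8)]

# MyISAM-style: rows stay in insertion order; every index leaf holds a row number.
myisam_rows = list(inserted)
myisam_pk = {col1: n for n, (col1, _) in enumerate(myisam_rows)}
myisam_col2 = sorted((col2, n) for n, (_, col2) in enumerate(myisam_rows))

# InnoDB-style: the clustered index holds the full rows in primary key order,
# and secondary index leaves hold (col2, primary key) instead of a row pointer.
innodb_clustered = dict(sorted(inserted))            # col1 -> remaining columns
innodb_col2 = sorted((col2, col1) for col1, col2 in inserted)


def myisam_find_by_col2(value):
    """One index lookup, then a direct jump to each row number."""
    return [myisam_rows[n] for col2, n in myisam_col2 if col2 == value]


def innodb_find_by_col2(value):
    """Two steps: the secondary index yields primary keys, then the clustered index is searched."""
    primary_keys = [pk for col2, pk in innodb_col2 if col2 == value]
    return [(pk, innodb_clustered[pk]) for pk in primary_keys]


print(myisam_find_by_col2(8))
print(innodb_find_by_col2(8))
print(list(innodb_clustered))   # rows are physically ordered by primary key
```

Because the InnoDB secondary index stores primary keys, moving a row to another page leaves innodb_col2 untouched, whereas the MyISAM-style row numbers would have to be updated.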

 

3.1.2 Inserting rows in primary key order (InnoDB)

If you use InnoDB and do not need any particular clustering, a good practice is to use a surrogate key, that is, a primary key whose value is independent of your application's data. The easiest way is an AUTO_INCREMENT column, which ensures that rows are inserted in order and improves the performance of joins on the primary key. Try to avoid random clustering keys: a string primary key, for example, is a bad choice because it makes inserts effectively random.
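The effect of insertion order can be seen in a toy simulation. This Python sketch is not InnoDB's actual page management: the page capacity of 4 rows and the split-in-half rule are simplifying assumptions. It counts only the inserts and page splits that land somewhere other than the end of the table.

```python
import random
from bisect import bisect_right, insort

PAGE_CAPACITY = 4   # hypothetical number of rows per leaf page


def load(keys):
    """Insert keys into a toy clustered index.

    Returns (mid_table_inserts, mid_table_splits): inserts and page splits
    that hit any page other than the last one.
    """
    pages = [[]]                        # leaf pages in key order, each a sorted list
    mid_inserts = mid_splits = 0
    for key in keys:
        # Locate the page whose key range contains `key`.
        firsts = [page[0] for page in pages[1:]]
        pos = bisect_right(firsts, key)
        is_last = pos == len(pages) - 1
        if not is_last:
            mid_inserts += 1
        page = pages[pos]
        insort(page, key)
        if len(page) > PAGE_CAPACITY:   # page overflow: split it in half
            half = len(page) // 2
            pages[pos:pos + 1] = [page[:half], page[half:]]
            if not is_last:
                mid_splits += 1
    return mid_inserts, mid_splits


sequential = list(range(1, 1001))       # AUTO_INCREMENT-style keys
shuffled = sequential[:]
random.Random(0).shuffle(shuffled)      # random keys, fixed seed for repeatability

print("sequential:", load(sequential))
print("random:    ", load(shuffled))
```

With sequential keys every insert goes to the last page, so existing pages are never disturbed; with random keys most inserts and splits touch pages in the middle of the table.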

3.2 Covering indexes

If an index contains all the data needed to satisfy a query, it is called a covering index. Covering indexes are a very powerful tool that can greatly improve query performance. Reading only the index instead of the data has the following advantages:

(1) Index entries are usually smaller than full rows, so MySQL accesses less data.

(2) Indexes are stored in value order, so reading them requires less I/O than fetching rows at random.

(3) Most storage engines cache indexes better than data. For example, MyISAM caches only indexes.

(4) Covering indexes are especially useful for InnoDB tables, because InnoDB organizes data in a clustered index. If the secondary index contains the data the query needs, there is no need for a second lookup in the clustered index.

Not every index can act as a covering index: only B-Tree indexes store the values of the indexed columns. Different storage engines also implement covering indexes differently, and not all storage engines support them (Memory and Falcon do not).

For index-covered queries, EXPLAIN shows "Using index" in the Extra column. For example, the inventory table in Sakila has a composite index (store_id, film_id). MySQL can use this index for queries that access only these two columns, as follows:

mysql> EXPLAIN SELECT store_id, film_id FROM sakila.inventory\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: inventory
         type: index
possible_keys: NULL
          key: idx_store_id_film_id
      key_len: 3
          ref: NULL
         rows: 5007
        Extra: Using index
1 row in set (0.17 sec)

In most engines, an index covers a query only when every column the query accesses is part of the index. InnoDB is not limited in this way, because its secondary indexes store the primary key values in their leaf nodes. The sakila.actor table uses InnoDB and has an index on last_name, so that index can cover queries that also access actor_id, such as:

mysql> EXPLAIN SELECT actor_id, last_name
    -> FROM sakila.actor WHERE last_name = 'HOPPER'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
         type: ref
possible_keys: idx_actor_last_name
          key: idx_actor_last_name
      key_len: 137
          ref: const
         rows: 2
        Extra: Using where; Using index
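The coverage rule for the two examples above can be written down as a small check. This Python sketch is a simplification for illustration only: the function name is made up, and the real optimizer considers more than column membership.

```python
def is_covering(query_columns, index_columns, primary_key=(), engine="InnoDB"):
    """Return True if the index alone can supply every column the query reads."""
    available = set(index_columns)
    if engine == "InnoDB":
        # InnoDB secondary index leaves also store the primary key columns.
        available |= set(primary_key)
    return set(query_columns) <= available


# sakila.inventory: index (store_id, film_id), primary key inventory_id.
print(is_covering(["store_id", "film_id"], ["store_id", "film_id"], ["inventory_id"]))

# sakila.actor: index on last_name, primary key actor_id.
print(is_covering(["actor_id", "last_name"], ["last_name"], ["actor_id"]))
print(is_covering(["actor_id", "last_name"], ["last_name"], ["actor_id"], engine="MyISAM"))
print(is_covering(["first_name", "last_name"], ["last_name"], ["actor_id"]))
```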

3.3 Using indexes for sorting

In MySQL, there are two ways to produce an ordered result set: use a filesort, or scan an index in order. Sorting with an index is very fast, and the same index can serve both the search and the sort. MySQL can use an index for sorting when the index order matches the column order in the ORDER BY clause and all the columns sort in the same direction (all ascending or all descending). If the query joins multiple tables, the index is used only when all the ORDER BY columns belong to the first table. In all other cases a filesort is used.

create table actor(
  actor_id int unsigned NOT NULL AUTO_INCREMENT,
  name varchar(16) NOT NULL DEFAULT '',
  password varchar(16) NOT NULL DEFAULT '',
  PRIMARY KEY(actor_id),
  KEY (name)
) ENGINE=InnoDB;

insert into actor(name,password) values('cat01','1234567');
insert into actor(name,password) values('cat02','1234567');
insert into actor(name,password) values('ddddd','1234567');
insert into actor(name,password) values('aaaaa','1234567');
 

mysql> explain select actor_id from actor order by actor_id \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 4
        Extra: Using index
1 row in set (0.00 sec)

 

mysql> explain select actor_id from actor order by password \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4
        Extra: Using filesort
1 row in set (0.00 sec)

 

mysql> explain select actor_id from actor order by name \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
         type: index
possible_keys: NULL
          key: name
      key_len: 18
          ref: NULL
         rows: 4
        Extra: Using index
1 row in set (0.00 sec)

When MySQL cannot use an index for sorting, it uses its own sorting algorithm (quicksort) to sort the data in memory (the sort buffer). If the data does not fit in memory, MySQL splits it into blocks on disk, sorts each block, and then merges the blocks into an ordered result set (in effect, an external sort). For filesort, MySQL has two sorting algorithms.

(1) Two-pass algorithm

This algorithm first reads the sort columns together with row pointers that locate the corresponding rows, sorts them in the sort buffer (sized by the sort_buffer_size parameter), and after sorting uses the row pointers to fetch the required columns.

Note: This is the algorithm used before MySQL 4.1. It has to access the data twice, and the second read in particular causes a lot of random I/O. On the other hand, its memory overhead is small.

(2) Single-pass algorithm

This algorithm reads all the required columns at once, sorts them in memory, and outputs the result directly.

Note: This algorithm has been available since MySQL 4.1. It reduces the number of I/Os and is more efficient, but it also uses more memory. Selecting columns that are not needed wastes much of the memory used for sorting. In MySQL 4.1 and later, you can control which algorithm MySQL chooses by setting the max_length_for_sort_data parameter. When the total size of the columns read is greater than max_length_for_sort_data, MySQL uses the first algorithm; otherwise, it uses the second. To get the best sorting performance we naturally prefer the second algorithm, so it is important to select only the columns the query actually needs.
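The difference between the two algorithms can be sketched in Python. This is a simplified in-memory model, not MySQL's implementation: the table is a dictionary from row pointer to row, row_bytes is a hypothetical estimate supplied by the caller, and the threshold of 1024 is only an illustrative value for max_length_for_sort_data. Each function also returns how many row reads it performed.

```python
# The actor rows from the example above, keyed by a hypothetical row pointer.
table = {
    0: {"actor_id": 1, "name": "cat01", "password": "1234567"},
    1: {"actor_id": 2, "name": "cat02", "password": "1234567"},
    2: {"actor_id": 3, "name": "ddddd", "password": "1234567"},
    3: {"actor_id": 4, "name": "aaaaa", "password": "1234567"},
}


def two_pass(table, order_by, columns):
    """Sort (key, row pointer) pairs, then revisit each row for the output columns."""
    reads = 0
    pairs = []
    for pointer, row in table.items():        # pass 1: sort key and pointer only
        reads += 1
        pairs.append((row[order_by], pointer))
    pairs.sort()
    result = []
    for _, pointer in pairs:                  # pass 2: rows fetched in sorted order
        reads += 1                            # (random I/O on a real disk)
        row = table[pointer]
        result.append(tuple(row[c] for c in columns))
    return result, reads


def single_pass(table, order_by, columns):
    """Carry every output column through the sort, so each row is read only once."""
    reads = 0
    tuples = []
    for row in table.values():
        reads += 1
        tuples.append((row[order_by], tuple(row[c] for c in columns)))
    tuples.sort()
    return [out for _, out in tuples], reads


def filesort(table, order_by, columns, row_bytes, max_length_for_sort_data=1024):
    """Pick the algorithm the way the text describes (simplified)."""
    if row_bytes > max_length_for_sort_data:
        return two_pass(table, order_by, columns)
    return single_pass(table, order_by, columns)


print(two_pass(table, "name", ["actor_id"]))
print(single_pass(table, "name", ["actor_id"]))
```

Both calls return the same ordering; the two-pass version reads each row twice while keeping less data in the sort buffer.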

When sorting a join, if the ORDER BY refers only to columns of the first table, MySQL performs a filesort on that table and then processes the join. In this case, EXPLAIN shows "Using filesort". Otherwise, MySQL must store the query's result set in a temporary table and perform the filesort after the join completes. In this case, EXPLAIN shows "Using temporary; Using filesort".

3.4 Indexes and locking

Indexes are very important for InnoDB because they let queries lock fewer tuples. This matters because in MySQL 5.0, InnoDB does not release locks until the transaction commits. There are two reasons. First, although InnoDB's row-level locks are efficient and use little memory, they still carry some overhead. Second, locking tuples that are not needed increases lock overhead and reduces concurrency.

InnoDB locks only the tuples it accesses, and an index can reduce the number of tuples InnoDB accesses. However, this works only if the unwanted rows are filtered out at the storage engine layer. If the index does not allow InnoDB to do that (that is, it cannot do the filtering), the MySQL server has to apply the WHERE clause to the rows InnoDB returns. By then it is too late to avoid locking those tuples: InnoDB has already locked them, and the server cannot unlock them.
Let's look at an example:

create table actor(
  actor_id int unsigned NOT NULL AUTO_INCREMENT,
  name varchar(16) NOT NULL DEFAULT '',
  password varchar(16) NOT NULL DEFAULT '',
  PRIMARY KEY(actor_id),
  KEY (name)
) ENGINE=InnoDB;

insert into actor(name,password) values('cat01','1234567');
insert into actor(name,password) values('cat02','1234567');
insert into actor(name,password) values('ddddd','1234567');
insert into actor(name,password) values('aaaaa','1234567');

SET AUTOCOMMIT=0;
BEGIN;
SELECT actor_id FROM actor WHERE actor_id < 4
AND actor_id <> 1 FOR UPDATE;

The query returns only rows 2 and 3, but it actually holds exclusive locks on rows 1 through 3. InnoDB locks tuple 1 because MySQL's query plan uses the index only for the range condition (the second condition in the WHERE clause cannot be used to filter in the index):

mysql> EXPLAIN SELECT actor_id FROM test.actor
    -> WHERE actor_id < 4 AND actor_id <> 1 FOR UPDATE \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
         type: index
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 4
        Extra: Using where; Using index
1 row in set (0.00 sec)

This indicates that the storage engine starts at the beginning of the index and fetches rows until actor_id < 4 is false; the server cannot tell InnoDB to skip tuple 1.
To prove that row 1 is locked, we open another connection and run the following:

SET AUTOCOMMIT=0;
BEGIN;
SELECT actor_id FROM actor WHERE actor_id = 1 FOR UPDATE;

The query blocks and does not complete until the transaction in the first connection commits and releases its locks (this behavior is necessary for statement-based replication).

As shown above, even when it uses an index, InnoDB can lock tuples it does not need. Worse, if the query cannot use an index, MySQL performs a full table scan and locks every tuple, whether it is needed or not.
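The split between what the engine locks and what the server returns can be modeled in a few lines of Python. This is a deliberately simplified sketch that follows the description above; it ignores gap locks, isolation levels, and version-specific behavior, and the function name is made up for illustration.

```python
# Primary key values of the four actor rows, in clustered index order.
clustered_index = [1, 2, 3, 4]


def select_for_update(range_end, excluded):
    """Model of: WHERE actor_id < range_end AND actor_id <> excluded FOR UPDATE."""
    locked, returned = [], []
    for actor_id in clustered_index:      # the engine scans from the start of the index
        if not actor_id < range_end:
            break                         # range condition is evaluated by the engine
        locked.append(actor_id)           # every row handed to the server is locked
        if actor_id != excluded:          # the server applies the rest of the WHERE clause
            returned.append(actor_id)
    return returned, locked


returned, locked = select_for_update(range_end=4, excluded=1)
print("returned:", returned)
print("locked:  ", locked)
```

Row 1 appears in the locked list but not in the returned list, which is why the second connection has to wait.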
