• notice
  • Congratulations on the launch of the Sought Tech site

MySQL advanced article SQL optimization (optimize database objects)

1 Overview

In the database design process, users may often encounter this kind of problem: Should all tables be designed in accordance with the third normal form? What is the appropriate length for the fields in the table? Although these problems are small, improper design may bring many performance problems to future applications. This chapter will introduce some optimization methods of database objects in MySQL.Some of these methods are not only applicable to MySQL, but also applicable to other types of database management systems.

2.Optimize the data type of the table

The table needs to use any data type, it needs to be judged according to the application. Although the application design needs to consider the length of the field to leave a certain degree of redundancy, it is not recommended to leave many fields with a lot of redundancy, which will waste disk storage space and waste physical memory during application operations. In MySQL, you can use the function PROCEDURE ANALYSE() to analyze the currently applied table.This function can make optimization suggestions for the data types listed in the data table.The user can consider whether to implement optimization according to the actual situation of the application. The following is how to use the function PROCEDURE ANALYSE():


But when I execute this statement on MySQL version 8.0 or above, an error is reported.Whether the 8.0 version does not support it or the syntax is changed, I won't go into it for the time being, and I will check the official website information when I have time.

3.Improve table access efficiency by splitting

The "split" mentioned here refers to the splitting of the data table. If it is performed for a MyISAM type table, there are two split methods.
● The first method is vertical split, that is, put the primary key and some columns in one table, and then put the primary key and other columns in another table.
Advantages: If some columns in a table are commonly used, but some other columns are not commonly used, you can use vertical splitting.In addition, vertical splitting can make data rows smaller, and one data page can store more data.When querying It will reduce the number of I/Os.
Disadvantages: Need to manage redundant columns, and query all data requires a joint (JOIN) operation.
● The second method is horizontal splitting, which puts the data rows into two independent tables based on the value of one or more columns of data. Horizontal splitting is usually used in the following situations:
◎The table is very large.After splitting, the number of data and index pages that need to be read during query can be reduced, and the number of index layers can also be reduced to improve query speed.
◎ The data in the table is inherently independent.For example, the table records the data of each region or the data of different periods, especially some data is commonly used, while other data is not commonly used.
◎Data needs to be stored on multiple media. For example, the billing table of a mobile phone can be divided into two or more tables. The billing data of the last 3 months is stored in one table, and the historical bills of 3 months ago are stored in another table.The historical bills of more than 1 year can be stored on a separate storage medium.This split is the most commonly used Horizontal split method.
Note: Horizontal splitting will add complexity to the application system.It usually requires multiple table names when querying, and querying all data requires UNION operation. In many database applications, this complexity will exceed the advantages it brings, because as long as the index key is not large, when the index is used for query, the amount of data in the table will increase by 2 to 3 times, and the query will also increase the number of reads.The number of disks in an index layer, so the horizontal split should consider the growth rate of the data volume, and decide whether the table needs to be split horizontally according to the actual situation.


●It is very clear to everyone that the database design should meet the standardization, but is the higher the degree of standardization of the data, the better? This is still determined by actual needs. Because the higher the normalization, the more relationships are generated.The direct result of too many relationships is that the connection operations between the tables are more frequent, and the connection operations between the tables are low-performance operations, which directly affect the speed of the query Therefore, for applications with more queries, it is necessary to use denormalization to design the data according to the actual situation, and improve the performance of the query through denormalization.
For example, the inventory list in the backend of a general e-commerce platform will contain information such as the name of the merchant, which is convenient for management personnel to view.ID and other attributes are stored in an inventory information table B, then when we want to display information such as the merchant name in the inventory list, we must connect the tables in the database, because the inventory information table B does not contain the merchant information table A's business name and other information, so the data must be retrieved through the associated table A.If this is taken into account in the database design, a redundant field can be added to the B table to store the business name data, so that when querying the inventory list There is no need to do table association anymore, which can also make the query have better performance.
● The advantage of anti-normative is to reduce the demand for connection operations, reduce the number of foreign keys and indexes, and possibly reduce the number of tables.The corresponding problem is that there may be data integrity problems. Speed up the query, but will reduce the speed of modification. Therefore, when deciding to de-norm, we must weigh the pros and cons, and carefully analyze the data access requirements and actual performance characteristics of the application.Good indexes and other methods can often solve performance problems, instead of using de-norm.
● Before performing de-normative operations, fully consider the data access requirements, the size of commonly used tables, some special calculations (such as totals), and the physical storage location of the data. Commonly used anti-normative techniques include adding redundant columns, adding derived columns, regrouping tables, and splitting tables.
◎Add redundant columns: refers to the same columns in multiple tables, it is often used to avoid join operations when querying.
◎Add derived columns: refers to the added columns come from data in other tables, and are generated by calculations from data in other tables. The effect of the added derived column is to reduce the join operation during query and avoid the use of set functions.
◎Regroup tables: If many users need to view the result data of the connection of two tables, they will regroup the two tables into one table to reduce the connection and improve performance.
◎Split table: You can refer to the content in Section 3 "Improving Table Access Efficiency by Splitting".
● In addition, the reverse specification technology needs to maintain the integrity of the data. No matter what kind of anti-normative technology is used, a certain amount of management is required to maintain the integrity of the data.Commonly used methods are batch maintenance, application logic, and triggers.
◎Batch maintenance means that after a certain amount of time is accumulated for the modification of the copied column or the derived column, a batch of processing jobs or stored procedures are run to modify the copied or derived column.This can only be used when the real-time requirements are not high..
◎Data integrity can also be achieved by application logic, which requires that all tables involved must be added, deleted, and modified in the same transaction. The use of application logic to achieve data integrity is risky, because the same logic must be used and maintained in all applications, and it is easy to miss, especially when requirements change, and it is not easy to maintain.
◎Another way is to use triggers.Any modification to the data immediately triggers the corresponding modification to the replicated column or derived column. The trigger is real-time, and the corresponding processing logic only appears in one place, which is easy to maintain. Generally speaking, it is a better way to solve such problems.

5.Use intermediate tables to improve statistical query speed

For tables with a large amount of data, performing statistical queries on them is usually very inefficient, and it is also necessary to consider whether statistical queries will have a negative impact on online applications. Usually in this case, using the intermediate table can improve the efficiency of statistical query.The following introduces the use of the intermediate table through the statistics of the goods_stock inventory table.The goods_stock table records the daily inventory import records.The table structure is as follows:

DESC goods_stock;

Because a large number of inventory import records are generated every day, the amount of data in the goods_stock table is very large, resulting in slow paging query of inventory import data:

now the business department has a specific demand: I want to know the total inventory in the last 30 days. In response to this demand, we use two methods to get the results that the business department wants.
Method 1: Perform statistics directly on the goods_stock table to get the desired result:

SELECT SUM(StockQty) AS TotalStockQty FROM goods_stock WHERE CreateTime>ADDDATE(NOW(),- 30 );

Method 2: Create an intermediate table tmp_goods_stock, the table structure is exactly the same as the source table structure:

Copy code

CREATE TABLE `tmp_goods_stock` (
 ID` the NOT NULL AUTO_INCREMENT bigint `the COMMENT ' supplier inventory ID ' ,
 StockGUID` VARCHAR `( 32 ) the CHARACTER the SET utf8mb4 the COLLATE utf8mb4_0900_ai_ci the NOT NULL the DEFAULT '' the COMMENT ' for the corresponding sub-table ' ,
 Model` VARCHAR `( 50 ) the CHARACTER the SET utf8mb4 the COLLATE utf8mb4_0900_ai_ci the NOT NULL the DEFAULT '' the COMMENT ' model ' ,
 Brand` VARCHAR `( 50 ) CHARACTER utf8mb4 the SET COLLATE utf8mb4_0900_ai_ci the DEFAULT NULL '' the COMMENT ' brand, the standard name ' ,
 `LotNO` varchar( 20 ) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NULL DEFAULT '' COMMENT ' batch, usually four-digit year + two-digit week, such as: 202024 ' ,
 MinPackageQty` ` int the NOT NULL the DEFAULT 0 the COMMENT ' Minimum Pack ' ,
 PackageUnit` VARCHAR `( 20 is ) the CHARACTER the SET utf8mb4 the COLLATE utf8mb4_0900_ai_ci the NOT NULL the DEFAULT '' the COMMENT ' packaging unit, the packaging type ' ,
 StockQty` ` int the NOT NULL the DEFAULT 0 the COMMENT ' stocks, number of units: the PCS ' ,
 MinOrderQty` ` int the NOT NULL the DEFAULT 0 the COMMENT ' MOQ ' ,
 IncQty` ` int the NOT NULL the DEFAULT 0 the COMMENT ' incremental ' ,
 The NOT NULL StockType` the COMMENT smallint The ` ' Material Type 10: Supplier Supplier Stock 20 30 Cloud cartridge single row spot ' ,
 `CreateTime` datetime( 0 ) NOT NULL DEFAULT CURRENT_TIMESTAMP( 0 ),
 `UpdateTime` datetime( 0 ) NOT NULL DEFAULT CURRENT_TIMESTAMP( 0 ),
 PRIMARY KEY (`ID`) USING BTREE) ENGINE = InnoDB AUTO_INCREMENT = 424449 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_0900_ai_ci COMMENT = ' Commodity inventory information, supplier material inventory ' ROW_FORMAT = Dynamic;

Copy code

Transfer the data to be counted to the intermediate table, and then perform statistics on the intermediate table to get the desired result:

INSERT INTO tmp_goods_stock SELECT * FROM goods_stock WHERE CreateTime>ADDDATE(NOW(),- 30 );

SELECT SUM(StockQty) AS TotalStockQty FROM tmp_goods_stock;

From the above two implementation methods, it takes very little time to do statistics in the intermediate table (the time to transfer data is not calculated here).In addition, for the business department, I want to know "I want to know the total inventory in the last 30 days."For this requirement, it is more appropriate to give statistical results on the intermediate table.The reason is that the CreateTime field of the source data table (goods_stock table) is not indexed and the amount of data in the source table is relatively large.Therefore, it is efficient when performing time-based statistics.Very low, then you can create a separate index on the CreateTime field on the intermediate table to improve the speed of statistical queries. Intermediate tables are often used in statistical queries, and their advantages are as follows:
● Intermediate tables copy part of the source table data and are "isolated" from the source table.Statistical queries on the intermediate tables will not have a negative impact on online applications.
●The intermediate table can flexibly add indexes or add new fields for temporary use, so as to improve the efficiency of statistical query and the role of auxiliary statistical query.


This chapter introduces the optimization of database objects.The quality of database object design is the basis of database design, and once the database object is designed and put into use, it will be more troublesome to modify it in the future.Therefore, you must Be as thoughtful as possible.

References: In-
depth understanding of MySQL


Technical otaku

Sought technology together

Related Topic


Leave a Reply