• notice
  • Congratulations on the launch of the Sought Tech site

Tutorial for the storage efficiency of the INNODB table BLOB column in mysql

First of all, I will introduce some key points about the storage format of the InnoDB engine:
1.InnoDB can choose to use a shared tablespace or an independent tablespace.It is recommended to use an independent tablespace for easy management and maintenance.Enable the innodb_file_per_table option, online dynamic modification can take effect after 5.5, and execute ALTER TABLE xx ENGINE=InnoDB to convert the existing table into an independent tablespace, versions earlier than 5.5, after modifying this option, need to restart to take effect;
2.The default data page of InnoDB is 16KB.After version 5.6, the new option innodb_page_size can be modified.Before version 5.6, you can only modify the source code and recompile, but it is not recommended to modify this configuration unless you are very clear about its advantages and disadvantages.;
3.When new data is written to the data page of InnoDB, 1/16 of the space will be reserved, and the reserved space can be used for subsequent new record writing, reducing the overhead of frequently adding new data pages;
4.Each data page needs to store at least 2 rows of records.So theoretically the maximum length of a row record is 8KB, but in fact it should be smaller because there are still some InnoDB internal data structures to store;
5.Limited by the InnoDB storage method, if the data is written sequentially, ideally, the filling rate of the data page is 15/16, but generally there is no way to ensure complete sequential writing.Therefore, the filling of the data page The rate is generally 1/2 to 15/16.Therefore, each InnoDB table should preferably have an auto-increment column as the primary key, so that the writing of new records is as sequential as possible;
6.When the data page fill rate is less than 1/2, InnoDB will shrink to release free space;
7.The 5.6 version of the InnoDB engine currently supports four formats: COMPACT, REDUNDANT, DYNAMIC, and COMPRESSED.The default is COMPACT format.COMPRESSED is rarely used and is not recommended (see the next article).TokuDB engine;
8.Compared with REDUNDANT, COMPACT row format can save about 20% of storage space.Compared with COMPACT, COMPRESSED can save about 50% of storage space, but it will cause TPS to drop by 90%.The COMPRESSED line format is therefore strongly deprecated;
9.When the row format is DYNAMIC or COMPRESSED, long columns such as TEXT/BLOB (long column, may also be other long columns, not necessarily only TEXT/BLOB type, depending on the specific situation) will be completely stored in an independent In the data page, only 20-byte pointers are used in the clustered index page to point to the new page.This is the so-called off-page.Similar to ORACLE row migration, the disk space is wasted seriously, and the I/O performance is also poor.Therefore, the use of BLOB, TEXT, and VARCHAR column types with a length of more than 255 is strongly discouraged;
10.When InnoDB's file format (innodb_file_format) is set to Antelope and the row format is COMPACT or REDUNDANT, BLOB, TEXT or long VARCHAR columns will only store the first 768 bytes in the clustered index page (maximum 768 bytes).The role is to facilitate the creation of prefix index/prefix index), and the rest of the content is stored in an additional page, even if it is only one more byte.Therefore, all columns should be as short as possible;
11.The pages of BLOB, TEXT or long VARCHAR columns stored in off-page are exclusive and cannot be shared.Therefore using multiple long columns in a table is strongly discouraged.

In summary, if you really need to store BLOB, TEXT, and long VARCHAR columns in InnoDB tables in actual business, there are the following suggestions:
1.After serializing and compressing all data as much as possible, store it in the same column to avoid multiple off-pages;
2.Columns whose actual maximum storage length is less than 255 are converted to VARCHAR or CHAR type (if it is variable-length data, there is no difference between the two, if it is fixed-length data, use CHAR type);
3.If it is not possible to integrate all the columns into one column, you can take the next step.Arrange and combine them according to the maximum length of each column and then split them into multiple sub-tables.As far as possible, the total row length of each sub-table is less than 8KB to reduce the occurrence of off-page frequency;
4.The above suggestions are based on the premise that the data page is the default 16KB.If it is modified to 8KB or other size, please test according to the above theory and find the most suitable value;
5.When the length of the character column is less than 255, no matter whether it is stored in CHAR or VARCHAR, or the length of the VARCHAR column is defined as 255, the actual table space will not increase;
6.Generally, more BLOB column types are used in the game field, and peers in the game industry can pay attention.

The following is the test verification process, and students who are patient can watch it slowly:

After each of the above test tables has written 5000 rows of records, then compare the file size of the table space and the size of the reorganized table space to observe the fragmentation rate.For a detailed comparison, see below:

201558110346220.png (1191×133)

In the last table splitting method, the sum of the tablespace file sizes of the five subtables is 40960 + 40960 + 98304 + 286720 + 40960=507904 bytes.
It can be seen that the difference between the total size of this method and the original table size is the smallest, and the other storage methods are larger than this.


Technical otaku

Sought technology together

Related Topic


Leave a Reply