Fully understand the difference between utf8 and utf8mb4 in MySQL


MySQL added the utf8mb4 character set in version 5.5.3. The "mb4" stands for "most bytes 4": a character takes at most four bytes, which makes the character set compatible with four-byte Unicode characters. Fortunately, utf8mb4 is a superset of utf8, so no conversion is needed beyond changing the character set to utf8mb4. Of course, if saving space matters, utf8 is usually enough.

Second, content description

As noted above, utf8 can already store most Chinese characters, so why use utf8mb4? Originally, the utf8 encoding supported by MySQL allowed at most 3 bytes per character. If a 4-byte character is encountered, the insert fails with an error. The largest Unicode code point that three-byte UTF-8 can encode is 0xFFFF, which is the end of the Basic Multilingual Plane (BMP). In other words, any Unicode character outside the BMP cannot be stored in MySQL's utf8 character set. This includes emoji (a special range of Unicode characters, common on iOS and Android phones), many rarely used Chinese characters, and other newly added Unicode characters.
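To make the three-byte limit concrete, here is a minimal Python sketch. Python is used only for illustration, and the helper names `utf8_len` and `fits_mysql_utf8` are hypothetical, not part of MySQL. It reports how many bytes a character takes in UTF-8 and whether a string stays inside the BMP:

```python
def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_mysql_utf8(text: str) -> bool:
    """True if every code point is in the BMP (<= U+FFFF),
    i.e. each character needs at most 3 bytes in UTF-8."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# One sample character for each UTF-8 length from 1 to 4 bytes.
for ch in ["A", "é", "中", "😀"]:
    print(f"U+{ord(ch):04X}: {utf8_len(ch)} byte(s), fits utf8: {fits_mysql_utf8(ch)}")
```

Only the emoji needs four bytes, so it is the only sample that would require utf8mb4.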

Third, the source of the problem

The original UTF-8 format used one to six bytes and could encode up to 31 bits. The current UTF-8 specification uses only one to four bytes and can encode up to 21 bits, which is just enough to represent all 17 Unicode planes.
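The bit counts above follow from the UTF-8 byte layout: the lead byte carries a few payload bits and each continuation byte carries six. A short Python sketch of that arithmetic (the constant names are chosen here for illustration):

```python
# Payload bits in a UTF-8 sequence of each length:
# 1 byte:  0xxxxxxx                    -> 7 bits
# 2 bytes: 110xxxxx 10xxxxxx           -> 5 + 6 bits
# 3 bytes: 1110xxxx + 2 continuation   -> 4 + 2*6 bits
# 4 bytes: 11110xxx + 3 continuation   -> 3 + 3*6 bits
# 5 and 6 bytes existed only in the original format.
PAYLOAD_BITS = {1: 7, 2: 5 + 6, 3: 4 + 2 * 6, 4: 3 + 3 * 6, 5: 2 + 4 * 6, 6: 1 + 5 * 6}

PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000  # 65,536 code points per plane
TOTAL_CODE_POINTS = PLANES * CODE_POINTS_PER_PLANE  # 0x110000

for n, bits in PAYLOAD_BITS.items():
    print(f"{n} byte(s): {bits} payload bits, {2 ** bits:,} values")

print(f"All 17 planes: {TOTAL_CODE_POINTS:,} code points")
print("Fits in 3 bytes:", TOTAL_CODE_POINTS <= 2 ** PAYLOAD_BITS[3])
print("Fits in 4 bytes:", TOTAL_CODE_POINTS <= 2 ** PAYLOAD_BITS[4])
```

Three bytes give 16 payload bits, which covers exactly one plane (the BMP); four bytes give 21 bits, which covers all 17 planes.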

utf8 is a character set in MySQL that supports only UTF-8 characters of up to three bytes, that is, the Basic Multilingual Plane of Unicode.

Why does utf8 in MySQL only support UTF-8 characters of at most three bytes? My guess is that when MySQL was first being developed, Unicode had no supplementary planes yet. At that time, the Unicode Committee was still dreaming that "65,536 characters are enough for the whole world". String length in MySQL is counted in characters rather than bytes, and for the CHAR data type enough space has to be reserved for the whole string. With the utf8 character set, the space to reserve is the maximum byte length of a utf8 character multiplied by the string length, so the maximum character length of utf8 was naturally capped at 3 bytes. For example, for CHAR(100) MySQL reserves 300 bytes. As for why later versions did not extend utf8 to support 4-byte UTF-8 characters, I think one reason is backward compatibility, and the other is that characters outside the Basic Multilingual Plane really are rarely used.
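The CHAR reservation rule described above is simple multiplication. The Python sketch below only reproduces that worst-case arithmetic. The function name `char_reserved_bytes` is hypothetical, and the actual on-disk layout depends on the storage engine and row format:

```python
# Maximum bytes per character for a few MySQL character sets.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def char_reserved_bytes(length: int, charset: str) -> int:
    """Worst-case bytes needed for a CHAR(length) column in the given charset."""
    return length * MAX_BYTES_PER_CHAR[charset]


for charset in MAX_BYTES_PER_CHAR:
    print(f"CHAR(100) in {charset}: {char_reserved_bytes(100, charset)} bytes")
```

Under this rule, moving a CHAR(100) column from utf8 to utf8mb4 raises the worst case from 300 to 400 bytes.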

To store 4-byte UTF-8 characters in MySQL, you need the utf8mb4 character set, which is only supported in version 5.5.3 and later (check your version with: select version();). I think that for better compatibility you should always use utf8mb4 instead of utf8. For CHAR columns, utf8mb4 consumes more space; the official MySQL recommendation is to use VARCHAR instead of CHAR.


