The query is increased by 200 times, ClickHouse is worth having!
1. What is ClickHouse?
ClickHouse: is a columnar database management system (DBMS) for online analytics (OLAP)
Let's first clarify some basic concepts
OLTP: It is a traditional relational database. It mainly operates additions, deletions, changes, and searches, emphasizing transaction consistency, such as banking systems and e-commerce systems.
OLAP: It is a warehouse-type database, mainly for reading data, doing complex data analysis, focusing on technical decision support, and providing intuitive and simple results
Next, let's use a diagram to understand the difference between a column database and a row database
In traditional row-based database systems (MySQL, Postgres, and MS SQL Server), data is stored in the following order:
In a columnar database system (ClickHouse), data is stored in the following order:
The comparison between the two in terms of storage methods:
The above is the basic introduction of ClickHouse, more can be found
Official Manual: https://clickhouse.tech/docs/en/
2. Business problems
The business side is currently stored in Mysql, a large table with a data volume of 50 million and two auxiliary tables. The query cost of a single join table is 3min+, and the execution efficiency is extremely low. After index optimization, horizontal table division, and logic optimization, the results were low, so I decided to use ClickHouse to solve this problem
Finally, through optimization, the query time is reduced to within 1s, and the query efficiency is increased by 200 times!
I hope that through this article, you can quickly master this weapon and avoid detours in practice.
3. ClickHouse Practice
1. Clickhouse installation under Mac
I installed via docker:
Check out the tutorial: https://blog.csdn.net/qq_24993831/article/details/103715194
You can also download CK to compile and install, which is relatively troublesome.
2. Data Migration: From Mysql to ClickHouse
ClickHouse supports most syntaxes of Mysql, and the migration cost is low. There are currently five migration schemes:
create table engine mysql, the mapping scheme data is still in Mysql
insert into select from, build a table first, then import
create table as select from, create table and import at the same time
csv offline import
streamsets
Choose the third option for data migration:
CREATE TABLE [IF NOT EXISTS] [db.]table_name ENGINE = Mergetree AS SELECT * FROM mysql('host:port', 'db', 'database', 'user', 'password')
3. Performance test comparison
Types of | The amount of data | table size | query speed |
---|---|---|---|
Mysql | 50000000 | 10G | 205s |
ClickHouse | 50000000 | 600MB | within 1s |
4. Data synchronization scheme
Temporary tables
Create a new temp intermediate table, synchronize the full amount of Mysql data to the temp table in ClickHouse, and then replace the table in the original ClickHouse, which is suitable for scenarios with moderate data volume, frequent increments and variables
synch
Recommended open source synchronization software:
synch: https://github.com/long2ice/synch/blob/dev/README-en.md
The principle is to obtain the sql statement through the binlog log of Mysql, and then consume the task through the message queue
5. Why is ClickHouse fast?
Only the column data to be calculated needs to be read, instead of the entire row data read in row format, reducing IO cost
The same type of the same column, with ten times compression improvement, further reducing IO
Clickhouse makes personalized search algorithms according to different storage scenarios
Fourth, the pit encountered
1. Differences between ClickHouse and mysql data types
Query with Mysql statement and find an error:
Solution : LEFT JOIN B b ON toUInt32(h.id) =
toUInt32(ec.post_id), transfer it, and unify the unsigned type association
2. Delete or update is performed asynchronously, only final consistency is guaranteed
Querying the CK manual found that even Mergetree, which supports the best data consistency, only guarantees eventual consistency:
If the requirements for data consistency are high, it is recommended that you do full synchronization to solve the problem.
V. Summary
Through the practice of ClickHouse, the bottleneck of Mysql query has been solved perfectly. 90% of the data-level queries below 2 billion rows can get the results within 1s. As the amount of data increases, ClickHouse also supports clusters. If you are interested, you can actively try
References:
ClickHouse Official Manual
https://clickhouse.tech/docs/en/)
ClickHouse in Ctrip Hotel Application
https://cloud.tencent.com/developer/article/1462633
How to choose ClickHouse engine
https://developer.aliyun.com/article/762461
0 Comments