• notice
  • Congratulations on the launch of the Sought Tech site

The query is increased by 200 times, ClickHouse is worth having!

1. What is ClickHouse?

ClickHouse: is a columnar database management system (DBMS) for online analytics (OLAP)

Let's first clarify some basic concepts

  • OLTP: It is a traditional relational database. It mainly operates additions, deletions, changes, and searches, emphasizing transaction consistency, such as banking systems and e-commerce systems.

  • OLAP: It is a warehouse-type database, mainly for reading data, doing complex data analysis, focusing on technical decision support, and providing intuitive and simple results

Next, let's use a diagram to understand the difference between a column database and a row database

In traditional row-based database systems (MySQL, Postgres, and MS SQL Server), data is stored in the following order:

12a58913dfc75f72c8e286d805441f42.png

In a columnar database system (ClickHouse), data is stored in the following order:
fb0957aceb1d09b9768817003e1d83c9.png

The comparison between the two in terms of storage methods:

44373bcebb1303210df67e7d719b23c2.png

The above is the basic introduction of ClickHouse, more can be found

Official Manual: https://clickhouse.tech/docs/en/

2. Business problems

The business side is currently stored in Mysql, a large table with a data volume of 50 million and two auxiliary tables. The query cost of a single join table is 3min+, and the execution efficiency is extremely low. After index optimization, horizontal table division, and logic optimization, the results were low, so I decided to use ClickHouse to solve this problem

Finally, through optimization, the query time is reduced to within 1s, and the query efficiency is increased by 200 times!

I hope that through this article, you can quickly master this weapon and avoid detours in practice.

3. ClickHouse Practice

1. Clickhouse installation under Mac

I installed via docker:

Check out the tutorial: https://blog.csdn.net/qq_24993831/article/details/103715194

You can also download CK to compile and install, which is relatively troublesome.

2. Data Migration: From Mysql to ClickHouse

ClickHouse supports most syntaxes of Mysql, and the migration cost is low. There are currently five migration schemes:

  • create table engine mysql, the mapping scheme data is still in Mysql

  • insert into select from, build a table first, then import

  • create table as select from, create table and import at the same time

  • csv offline import

  • streamsets

Choose the third option for data migration:

CREATE TABLE [IF NOT EXISTS] [db.]table_name ENGINE = Mergetree AS SELECT *
FROM mysql('host:port', 'db', 'database', 'user', 'password')
3. Performance test comparison
Types ofThe amount of datatable sizequery speed
Mysql5000000010G205s
ClickHouse50000000600MBwithin 1s
4. Data synchronization scheme

Temporary tables

a02d463f91bea1dbe1e0ce4c535fcb43.png

Create a new temp intermediate table, synchronize the full amount of Mysql data to the temp table in ClickHouse, and then replace the table in the original ClickHouse, which is suitable for scenarios with moderate data volume, frequent increments and variables

synch

d9ee1aa551c8da77722ffc998587566d.png

Recommended open source synchronization software:

synch: https://github.com/long2ice/synch/blob/dev/README-en.md

The principle is to obtain the sql statement through the binlog log of Mysql, and then consume the task through the message queue

5. Why is ClickHouse fast?
  • Only the column data to be calculated needs to be read, instead of the entire row data read in row format, reducing IO cost

  • The same type of the same column, with ten times compression improvement, further reducing IO

  • Clickhouse makes personalized search algorithms according to different storage scenarios

Fourth, the pit encountered

1. Differences between ClickHouse and mysql data types

Query with Mysql statement and find an error:

bcfc619549f618b176e3758c6d421516.png

Solution : LEFT JOIN B b ON toUInt32(h.id) =
toUInt32(ec.post_id), transfer it, and unify the unsigned type association

2. Delete or update is performed asynchronously, only final consistency is guaranteed

Querying the CK manual found that even Mergetree, which supports the best data consistency, only guarantees eventual consistency:

6fed46dfe0ae1fb78079d79c267822c4.png

If the requirements for data consistency are high, it is recommended that you do full synchronization to solve the problem.


V. Summary

Through the practice of ClickHouse, the bottleneck of Mysql query has been solved perfectly. 90% of the data-level queries below 2 billion rows can get the results within 1s. As the amount of data increases, ClickHouse also supports clusters. If you are interested, you can actively try

References:

  • ClickHouse Official Manual

    • https://clickhouse.tech/docs/en/)

  • ClickHouse in Ctrip Hotel Application

    • https://cloud.tencent.com/developer/article/1462633

  • How to choose ClickHouse engine

    • https://developer.aliyun.com/article/762461


Tags

Technical otaku

Sought technology together

Related Topic

1 Comments

author

purchase atorvastatin pills & lt;a href="https://lipiws.top/"& gt;lipitor online order& lt;/a& gt; buy lipitor 20mg generic

Ezjpkr

2024-03-07

Leave a Reply

+