How should you design a layered data warehouse architecture?
When I first encountered data warehouses as a complete beginner, I had a question: why is a data warehouse layered at all? Couldn't I just write all the logic in one go? And what do the experienced engineers mean when they talk about ODS and DW? After working in the field for a while, I have formed my own understanding, so let's explore, from a beginner's point of view, what data layering really means.
First of all, why layer a data warehouse?
With complex processing logic, layering makes the data structure clear: each layer has its own scope, so developers can easily locate and understand a table when they use it;
Layering helps developers quickly locate a problem, narrow down its scope, and trace the data's lineage;
Standardized layering, together with general-purpose middle-layer data models, avoids a great deal of repeated development work and makes data models reusable;
Layering trades space for time: a complex problem is broken down hierarchically, and each layer of the warehouse solves a different part of it.
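As a rough illustration of how layering makes lineage easy to trace, the sketch below models lineage as a mapping from each table to the tables it is built from, then walks that mapping upstream. The table names and the `upstream_sources` helper are hypothetical, chosen only to follow the ODS/DWD/DWS/DM naming used in this article; in practice lineage usually comes from a metadata or scheduling system.

```python
from collections import deque

# Hypothetical lineage: each table maps to the tables it is built from.
LINEAGE: dict[str, list[str]] = {
    "ods_orders": [],
    "ods_users": [],
    "dwd_orders": ["ods_orders"],
    "dwd_users": ["ods_users"],
    "dws_user_order_daily": ["dwd_orders", "dwd_users"],
    "dm_sales_report": ["dws_user_order_daily"],
}


def upstream_sources(table: str, lineage: dict[str, list[str]]) -> list[str]:
    """Return every table that `table` depends on, nearest layers first (BFS)."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(lineage.get(table, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        order.append(current)
        queue.extend(lineage.get(current, []))
    return order


if __name__ == "__main__":
    # A problem in the report can be traced back one layer at a time.
    print(upstream_sources("dm_sales_report", LINEAGE))
    # → ['dws_user_order_daily', 'dwd_orders', 'dwd_users', 'ods_orders', 'ods_users']
```

Because each layer only reads from the layer below it, the search visits DWS, then DWD, then ODS tables in that order, which mirrors how a developer would narrow down where a bad value was introduced.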
What is the core idea of layering?
The core idea of layering is to decouple complex problems and thereby simplify them, and this directly determines how many layers the data architecture should have.
Can the data architecture have just one layer?
Yes. We can process and merge multiple tables in a single script, which is a common practice in many "small workshops" today. But with this approach, touching one part affects the whole: a single change may force the entire design to be reworked. The knowledge also tends to leave with the people who wrote it, and without comments, developers who come back to the script later will struggle to understand its complex logic.
Can we layer according to the processing logic, with one layer per processing step?
That is also possible. However, the chain can become too long, making the process harder to control. If you look at the data lineage, you will find that the dependency chain is very long, which makes it harder to verify and troubleshoot problems.
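To make the "chain too long" point concrete, the sketch below measures the longest dependency path from a table back to its source tables. Both lineage graphs are hypothetical: one creates a layer for every processing step, the other groups the same work into a few ODS, DWD, DWS, and DM tables. The `chain_depth` helper is an illustrative assumption, not a standard tool.

```python
# Hypothetical lineage where every processing step becomes its own layer.
PER_STEP: dict[str, list[str]] = {
    "ods_orders": [],
    "ods_users": [],
    "step1_dedup_orders": ["ods_orders"],
    "step2_filter_orders": ["step1_dedup_orders"],
    "step3_convert_currency": ["step2_filter_orders"],
    "step4_join_users": ["step3_convert_currency", "ods_users"],
    "step5_daily_agg": ["step4_join_users"],
    "step6_sales_report": ["step5_daily_agg"],
}

# The same work grouped into a few fixed layers.
LAYERED: dict[str, list[str]] = {
    "ods_orders": [],
    "ods_users": [],
    "dwd_orders": ["ods_orders"],  # dedup, filter, and currency conversion together
    "dwd_users": ["ods_users"],
    "dws_user_order_daily": ["dwd_orders", "dwd_users"],  # join + daily aggregation
    "dm_sales_report": ["dws_user_order_daily"],
}


def chain_depth(table: str, lineage: dict[str, list[str]]) -> int:
    """Hops on the longest path from `table` back to a source table.

    Assumes the lineage is acyclic; fine for small illustrative graphs.
    """
    parents = lineage.get(table, [])
    if not parents:
        return 0
    return 1 + max(chain_depth(p, lineage) for p in parents)


if __name__ == "__main__":
    print(chain_depth("step6_sales_report", PER_STEP))  # → 6
    print(chain_depth("dm_sales_report", LAYERED))  # → 3
```

Every extra hop is another table to inspect when a number looks wrong, which is why a handful of well-scoped layers is usually easier to verify than one layer per step.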
So how many layers should the data architecture have?
Any number of layers is fine, as long as it fits the actual business. In short, remember that our core idea is decoupling.
How are the layers usually divided?
There is no fixed standard for how many layers to use, but the classic layerings in the industry are ODS->DW->DM or ODS->DWD->DWS->APP, where:
ODS (Operational Data Store) is the operational data layer, i.e. the raw data layer, also called the source layer. Its structure is basically the same as that of the business system (possibly with some added management fields). Its purpose is to preserve history and to decouple the warehouse from the business databases, so that the entire data platform only needs to read from each business database once. The significance of the ODS layer is therefore to minimize the access load on the business databases. The ODS layer is sometimes subdivided into two layers: an STG (staging) buffer layer that stores the raw data, and an ODS layer that stores lightly cleaned data.
DWD (Data Warehouse Detail) is the detail data layer. It cleans the data, unifies codes, field names, and formats, and performs simple aggregation. The significance of the DWD layer is to standardize data and provide clean, consistent, standard data for downstream processing.
DWS (Data Warehouse Service) is the service data layer. Guided by business goals, it aggregates the processed data both horizontally and vertically. Following the wide-table model, it introduces data redundancy and pre-computation, trading space for time.
The DM (data mart) layer, also called the subject layer (not the same thing as a subject area), is a small data mart built on top of the enterprise-level data warehouse specifically for a single business line or department. The DM layer can be further split according to business needs.
In addition, there may be other layers, such as a DIM (dimension) layer and a tag layer.
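The flow from ODS through DWD and DWS to DM can be sketched with a few plain Python functions. This is a minimal, hypothetical example: the order records, field names (`amt`, `dt`, `status`), the status-code mapping, the `etl_load_time` management field, and the single-region mart are all invented for illustration. In a real warehouse each layer would be a set of tables maintained in SQL or a big-data engine rather than in-memory lists.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical raw rows as they arrive from the business database.
SOURCE_ORDERS = [
    {"order_id": "1", "user_id": "u1", "amt": "10.50", "status": "PAID", "region": "north", "dt": "2024/01/01"},
    {"order_id": "2", "user_id": "u2", "amt": "5", "status": "paid", "region": "south", "dt": "2024-01-01"},
    {"order_id": "2", "user_id": "u2", "amt": "5", "status": "paid", "region": "south", "dt": "2024-01-01"},  # duplicate
    {"order_id": "3", "user_id": "u1", "amt": "7.25", "status": "C", "region": "north", "dt": "2024-01-02"},
]

# Hypothetical mapping used to unify status codes.
STATUS_CODES = {"paid": "PAID", "c": "CANCELLED", "cancelled": "CANCELLED"}


def to_ods(rows):
    """ODS: keep the source structure, only add a management field."""
    load_time = datetime.now(timezone.utc).isoformat()
    return [{**row, "etl_load_time": load_time} for row in rows]


def to_dwd(ods_rows):
    """DWD: deduplicate, rename fields, and unify codes, types, and date format."""
    seen, clean = set(), []
    for row in ods_rows:
        if row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        clean.append({
            "order_id": row["order_id"],
            "user_id": row["user_id"],
            "amount": float(row["amt"]),
            "status": STATUS_CODES.get(row["status"].lower(), "UNKNOWN"),
            "region": row["region"],
            "order_date": row["dt"].replace("/", "-"),
        })
    return clean


def to_dws(dwd_rows):
    """DWS: pre-aggregate paid orders into a wide table per user, region, and day."""
    agg = defaultdict(lambda: {"order_cnt": 0, "paid_amount": 0.0})
    for row in dwd_rows:
        if row["status"] != "PAID":
            continue
        key = (row["user_id"], row["region"], row["order_date"])
        agg[key]["order_cnt"] += 1
        agg[key]["paid_amount"] += row["amount"]
    return [
        {"user_id": u, "region": r, "order_date": d, **metrics}
        for (u, r, d), metrics in sorted(agg.items())
    ]


def to_dm(dws_rows, region):
    """DM: a small mart for one department, here a single sales region."""
    return [row for row in dws_rows if row["region"] == region]


if __name__ == "__main__":
    dws = to_dws(to_dwd(to_ods(SOURCE_ORDERS)))
    for row in to_dm(dws, "north"):
        print(row)
```

Each function reads only from the layer directly below it, so if the source system changes how it encodes status or dates, only `to_dwd` needs editing while `to_dws` and `to_dm` stay unchanged; that is the decoupling the layering is meant to provide.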
Data architecture example
To sum up
The core logic of data warehouse layering is decoupling. If the company's business is relatively simple and stable, a three-layer structure is sufficient. If the company's data processing logic is complex and likely to change, we can decouple further on top of the three layers. If the business undergoes major adjustments, another layer can be added. The main task is to weigh limited time, resources, and other constraints against rapid business change, and to strike a balance between the two.
As for how many layers a data warehouse should have, I recommend designing a coherent, layered architecture based on the current business and the current state of construction. When you are just starting out, 3 or 4 layers are recommended. Plan the architecture about 1 to 1.5 years ahead, then keep building and optimizing it, steadily moving toward meeting all requirements.