From Data Lake to Data Warehouse

With the advent of newer terms such as Delta Lake and Lakehouse, I thought I would put together a brief summary for those who are pondering what these modelling and storage approaches are.

Please note that some examples are less contemporary than others; for example, the Corporate Information Factory has now effectively been replaced by the Data Vault.

Summary

| Name | Description |
| --- | --- |
| Corporate Information Factory | Combines sources and transforms them into a repository in the integration layer. It is highly normalised. |
| Operational Data Store | Intended to integrate real-time updates with master and transactional data for use by operational reports. It is normalised. |
| Data Lake | Enables storage of structured and unstructured data at scale. Data is stored without transformation. Despite this lack of modelling, users can still access Data Lakes to deliver analytics. The data is raw. |
| Data Lakehouse | Combines the capabilities of data lakes and data warehouses, enabling both BI and ML. |
| Delta Lake | A layer that adds reliability to data lakes. It provides ACID transactions and scalable metadata handling, and unifies streaming and batch data processing. |
| Data Mart | Designed for reporting current and historical data at a line-of-business level. It is often a subset of the data warehouse and is de-normalised. |
| Data Vault | Set of normalised tables that support one or more functional areas of business. It is usually built in two layers, referred to as the Raw Vault and the Business Vault. It is highly normalised. |
| Data Warehouse | Designed for reporting current and historical data at an enterprise level. It is de-normalised. |
Modelling and storage options
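
To make the "normalised" versus "de-normalised" labels in the table a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (customer, orders, mart_sales) and the sample rows are hypothetical and only there for illustration; a real warehouse or mart would be far larger and would normally live in a dedicated platform.

```python
import sqlite3


def build_demo():
    """Create a tiny in-memory database with both modelling styles."""
    conn = sqlite3.connect(":memory:")

    # Normalised: each fact is stored once and tables are linked by keys.
    conn.executescript(
        """
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT,
            region      TEXT
        );
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(customer_id),
            amount      REAL
        );
        INSERT INTO customer VALUES (1, 'Acme', 'North'), (2, 'Globex', 'South');
        INSERT INTO orders VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0);
        """
    )

    # De-normalised: the join is done once and the result is stored as a
    # wide reporting table, in the spirit of a data mart.
    conn.execute(
        """
        CREATE TABLE mart_sales AS
        SELECT o.order_id, c.name AS customer_name, c.region, o.amount
        FROM orders AS o
        JOIN customer AS c ON c.customer_id = o.customer_id
        """
    )
    conn.commit()
    return conn


def sales_by_region_normalised(conn):
    """Report from the normalised tables: needs a join at query time."""
    rows = conn.execute(
        """
        SELECT c.region, SUM(o.amount)
        FROM orders AS o
        JOIN customer AS c ON c.customer_id = o.customer_id
        GROUP BY c.region
        ORDER BY c.region
        """
    ).fetchall()
    return dict(rows)


def sales_by_region_mart(conn):
    """Report from the de-normalised table: no join needed."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM mart_sales GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)
```

Both functions return the same totals. The trade-off is that the de-normalised table is simpler to query but repeats the customer details on every row, so it has to be rebuilt or updated when the source data changes.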

Use

You can see that there are many options, and those above are only a subset. In many cases they also refer to each other (e.g. a Data Mart is a subset of a Data Warehouse), so let's look at how they are often combined:

Conclusion

If you are asked “which one is best”, please don’t get stuck in a fundamentalist position. They all have their merits, depending on the architecture (cloud or on-premises), data, budget and requirements you are presented with.

Just as importantly though, consider that the model/storage approach needs to be complemented by the tools and the team you have. For example, I promise you that hand coding a Data Vault (i.e. rather than using metadata-driven ETL) with a team that has never built one will not be an easy experience!
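
As a rough illustration of what "metadata-driven" means here, the sketch below generates hub and satellite table definitions from a small metadata list instead of writing each statement by hand. Everything in it is an assumption made for the example: the Entity class, the hub_/sat_ prefixes, the _hk hash key suffix, and the load_dts and record_source columns follow naming that is common in Data Vault projects, but they are not taken from any particular tool or standard.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical metadata record describing one business entity."""
    name: str                  # e.g. "customer"
    business_key: str          # column holding the natural key
    attributes: dict = field(default_factory=dict)  # column name -> SQL type


def hub_ddl(entity):
    """Build the CREATE TABLE statement for the entity's hub."""
    return (
        f"CREATE TABLE hub_{entity.name} (\n"
        f"    {entity.name}_hk CHAR(32) PRIMARY KEY,\n"
        f"    {entity.business_key} VARCHAR(100) NOT NULL,\n"
        f"    load_dts TIMESTAMP NOT NULL,\n"
        f"    record_source VARCHAR(50) NOT NULL\n"
        f");"
    )


def sat_ddl(entity):
    """Build the CREATE TABLE statement for the entity's satellite."""
    attribute_lines = "".join(
        f"    {column} {sql_type},\n"
        for column, sql_type in entity.attributes.items()
    )
    return (
        f"CREATE TABLE sat_{entity.name} (\n"
        f"    {entity.name}_hk CHAR(32) NOT NULL REFERENCES hub_{entity.name},\n"
        f"    load_dts TIMESTAMP NOT NULL,\n"
        f"{attribute_lines}"
        f"    record_source VARCHAR(50) NOT NULL,\n"
        f"    PRIMARY KEY ({entity.name}_hk, load_dts)\n"
        f");"
    )


def generate(entities):
    """Return one hub and one satellite statement per entity, in order."""
    return [ddl for e in entities for ddl in (hub_ddl(e), sat_ddl(e))]


# Hypothetical model: adding an entity means adding one line of metadata.
MODEL = [
    Entity("customer", "customer_number",
           {"name": "VARCHAR(100)", "region": "VARCHAR(50)"}),
    Entity("product", "sku", {"description": "VARCHAR(200)"}),
]

if __name__ == "__main__":
    for statement in generate(MODEL):
        print(statement, end="\n\n")
```

The point is not the specific columns but the shape of the work: the repetitive structure lives in two small functions, so a team new to Data Vault maintains a list of entities rather than dozens of near-identical hand-written scripts.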
