What is a data lakehouse?

The Grail data lakehouse at the heart of the Dynatrace platform enables contextual analytics across unified observability, security, and business data. As a data lakehouse, Grail combines the cost efficiency of data lakes with the analytics capabilities of data warehouses, and adds high performance through massively parallel processing.

Data lake

Enterprises collect exponentially growing volumes of data with the aim of extracting value and having all relevant data available when needed. Data lakes are a cost-efficient solution in this regard, as they consolidate raw data in a flat data format on cheap storage. Keeping data in a central place enables data analytics teams to access it when required, for use cases such as machine learning or historical analysis.

Grail incorporates the following benefits of data lakes:

All (raw) data in one place
All data available if needed
Minimized costs
Elasticity
Governance embedded
Fully managed cloud infrastructure

Grail avoids the following limitations of data lakes:

High effort to extract information: it can take days, even months, to extract information for complex questions.
Slow access: in a process known as hydration, raw data needs to be loaded first into other tools before it can be analyzed, and potentially also moved from cold to hot storage.
High complexity: raw data is stored in heterogeneous formats, so data analysts have to preprocess it before they can extract value from it.
Regulatory and privacy difficulties: coarse data protection and the inability to delete individual data records make it very difficult to meet the demands of regulations and privacy certifications.

Data warehouse

Data warehouses, unlike data lakes, are structured databases that store data according to a predefined schema and data model, making data access and analytics easier.
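
To make the idea of a predefined schema concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a warehouse table. The table name, columns, and the `load` helper are hypothetical and chosen only for illustration; real data warehouses differ in scale and tooling. The point is that the schema is enforced at ingest, so a record that does not fit is rejected before it is stored.

```python
import sqlite3

# Hypothetical table and columns, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer TEXT NOT NULL,"
    " amount REAL NOT NULL CHECK (amount >= 0))"
)


def load(record):
    """Try to insert a record; return True only if it fits the schema."""
    try:
        conn.execute(
            "INSERT INTO orders (order_id, customer, amount) "
            "VALUES (:order_id, :customer, :amount)",
            record,
        )
        return True
    except (sqlite3.IntegrityError, sqlite3.ProgrammingError):
        # Constraint violation or missing field: the record is not stored.
        return False


accepted = load({"order_id": 1, "customer": "acme", "amount": 19.5})
rejected = load({"order_id": 2, "customer": None, "amount": 5.0})

# Queries are simple because every stored row is known to match the schema.
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(accepted, rejected, total)
```

The second record is dropped because it violates the NOT NULL constraint. This is the trade-off of schema-on-ingest: easy analytics on what was stored, but nothing outside the model is kept.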

Data lakehouse

A data lakehouse merges the benefits of data lakes and data warehouses. It allows storing large volumes of structured and unstructured data while providing fast and scalable analytical and processing capabilities, supporting the full flexibility of schema-on-read. This approach makes it possible to manage data more flexibly and efficiently, facilitating advanced analytics and machine learning on a scalable, cost-effective platform.

Data lakehouses have the following attributes:

Cost-effective storage with unlimited scalability, leveraging cloud object storage. Organizations don't have to decide which data to store, retaining the full advantage of raw data.
Unified and scalable data storage for all types of data, thus breaking down silos.
The flexibility of data lakes, such as storing ingested data in an unstructured format and applying schema-on-read.
Separation of storage and compute, which makes it possible to independently scale resources, thus optimizing costs and resource allocation.
Flexible handling of data formats: ingest data without overhead or schema requirements and gain agility and adaptability. Unlike data warehouses, which rely on predefined schemas, a data lakehouse supports schema-on-read.
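
Schema-on-read can be illustrated with a small, generic Python sketch. It is not Grail's implementation and does not use Dynatrace Query Language. The record fields and the `read_with_schema` helper are assumptions made for this example. Raw records are kept exactly as they were ingested, and a schema (here, a mapping of field names to converter functions) is applied only when the data is queried.

```python
import json

# Raw records stored as-is; nothing is validated or rejected at ingest.
# The field names are hypothetical.
raw_lines = [
    '{"ts": "2024-01-01T00:00:00Z", "status": 200, "duration_ms": 12}',
    '{"ts": "2024-01-01T00:00:01Z", "status": "500", "path": "/cart"}',
    "plain text log line without structure",
]


def read_with_schema(lines, schema):
    """Apply a schema (field name -> converter) at query time."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            record = {"content": line}  # unstructured data stays readable
        row = {}
        for field, convert in schema.items():
            value = record.get(field)
            row[field] = convert(value) if value is not None else None
        yield row


# Two different questions, two different schemas, same raw data.
status_view = list(read_with_schema(raw_lines, {"status": int}))
latency_view = list(read_with_schema(raw_lines, {"duration_ms": float}))

error_count = sum(
    1 for row in status_view
    if row["status"] is not None and row["status"] >= 500
)
print(status_view, latency_view, error_count)
```

Because the schema lives in the query rather than in the storage layer, a new question only needs a new schema mapping, and records that lack a field yield None instead of being dropped at ingest.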

Data warehouse, data lake, and Grail data lakehouse compared

Grail is a data lakehouse that specializes in observability use cases. It leverages the scalability and flexibility of data lakes and adds the transactional layer of data lakehouses to give meaning to raw data signals, creating a semantic model that supports Smartscape and provides the contextual layer that empowers Dynatrace Intelligence.

	Grail data lakehouse	Data warehouse	Data lake
Data type support	Structured and unstructured data in its raw format	Structured data	Structured and unstructured data in its raw format
Scalability	Scales to any size at low cost	Scale-up is exponentially complex and expensive	Scales to any size, requiring extra tooling
Flexibility	Full flexibility with additional schema-on-read	Limited to schema-on-ingest	Full flexibility without any schema, but requires rehydration and separate processing
Performance	High: Structured data enables quick data access and insights.	High: For indexed data access patterns	Poor
Ease of use	Simple: Structured data enables quick data access and insights.	Simple: Structured data enables quick data access and insights.	Difficult: Missing structure and large volume make it difficult to find data.
Security	Grail adds a sophisticated permission model and additional concepts such as hard data deletion and data encryption at rest and in transit.	Good security concepts, including encryption, auditing, and authorization controls.	Data is stored in its raw format, with the risk of becoming a data swamp and with less effort invested in governance and access control.
Cost	$	$$$$	$
