What is a data lakehouse?

  • Concept
  • 4-min read
  • Published Jun 17, 2025

The Grail data lakehouse at the heart of the Dynatrace platform enables contextual analytics across unified observability, security, and business data. As a data lakehouse, Grail combines the cost-efficiency advantages of data lakes with the analytics capabilities of data warehouses, and adds extreme performance through massively parallel processing.

Data lake

Enterprises collect exponentially growing volumes of data with the aim of extracting value and having all relevant data available when needed. Data lakes are a cost-efficient solution in this regard, as they consolidate raw data in a flat data format on cheap storage. Keeping data in a central place enables data analytics teams to access it when required, for use cases such as machine learning or historical analysis.

Grail incorporates the following benefits of data lakes:

  • All (raw) data in one place
  • All data available if needed
  • Minimized costs
  • Elasticity
  • Governance embedded
  • Fully managed cloud infrastructure

Grail avoids the following limitations of data lakes:

  • High effort to extract information: it can take days, or even months, to extract information for complex questions.
  • Slow access: in a process known as hydration, raw data needs to be loaded first into other tools before it can be analyzed, and potentially also moved from cold to hot storage.
  • High complexity: data analysts have to do the work of extracting value from data, as raw data is stored in heterogeneous formats and requires preprocessing.
  • Regulatory and privacy difficulties: coarse data protection and the inability to delete individual data records make it very difficult to meet the demands of regulations and privacy certifications.

Data warehouse

Data warehouses, unlike data lakes, are structured databases that store data according to a predefined schema and data model, making data access and analytics easier.
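
To make the idea of a predefined schema concrete, the following is a minimal Python sketch of schema enforcement at ingest time. It is not how any particular data warehouse is implemented; the `OrderRow` schema, the `ingest` function, and the sample records are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


# Hypothetical predefined schema; the field names are assumptions for illustration.
@dataclass(frozen=True)
class OrderRow:
    order_id: int
    amount: float
    currency: str


def ingest(record: dict) -> OrderRow:
    """Validate a record against the predefined schema before it is stored."""
    expected = {"order_id", "amount", "currency"}
    if set(record) != expected:
        raise ValueError(f"record does not match schema: {sorted(record)}")
    return OrderRow(
        int(record["order_id"]),
        float(record["amount"]),
        str(record["currency"]),
    )


# A conforming record is accepted and stored in structured form.
table = [ingest({"order_id": 1, "amount": 9.5, "currency": "EUR"})]

# A record with a different shape is rejected at write time.
try:
    ingest({"order_id": 2, "note": "free-form text"})
    rejected = False
except ValueError:
    rejected = True
```

Because every stored row has a known shape, queries are simple and fast, but data that does not fit the model has to be transformed or dropped before it can be stored.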

Data lakehouse

A data lakehouse merges the benefits of data lakes and data warehouses. It allows storing large volumes of structured and unstructured data while providing fast and scalable analytical and processing capabilities, supporting the full flexibility of schema-on-read. This approach makes it possible to manage data more flexibly and efficiently, facilitating advanced analytics and machine learning on a scalable, cost-effective platform.

Data lakehouses have the following attributes:

  • Cost-effective storage with unlimited scalability, leveraging cloud object storage. Organizations don't have to decide which data to store, retaining the full advantage of raw data.
  • Unified and scalable data storage for all types of data, thus breaking down silos.
  • The flexibility of data lakes, such as storing ingested data in an unstructured format and applying a schema on read.
  • Separation of storage and compute, which makes it possible to independently scale resources, thus optimizing costs and resource allocation.
  • Flexible handling of data formats: ingest data without overhead or schema requirement and gain agility and adaptability. Unlike data warehouses, which rely on predefined schemas, a data lakehouse supports schema-on-read.
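
The schema-on-read idea in the list above can be sketched in a few lines of Python. This is an illustrative sketch only, not Grail's implementation: `raw_store`, `read_with_schema`, and the sample records are hypothetical. Records are kept exactly as ingested, and each query supplies its own schema when it reads them.

```python
import json

# Raw records stored as ingested, with no schema enforced (hypothetical sample data).
raw_store = [
    '{"timestamp": "2025-06-17T10:00:00Z", "status": 200, "path": "/cart"}',
    '{"timestamp": "2025-06-17T10:00:01Z", "status": "503", "path": "/pay", "region": "eu"}',
    "plain text line that is not JSON",
]


def read_with_schema(lines, schema):
    """Apply a schema at query time: parse each line, pick fields, coerce types."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable lines stay in storage; this query skips them
        if not isinstance(record, dict):
            continue
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        yield row


# Two different queries read the same raw data through two different schemas.
status_view = list(read_with_schema(raw_store, {"status": int, "path": str}))
region_view = list(read_with_schema(raw_store, {"region": str}))
```

Nothing is rejected or reshaped at ingest, so a field that only some records carry (here `region`) remains available to any later query that asks for it.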

Data warehouse, data lake, and Grail data lakehouse compared

Grail is a data lakehouse that specializes in observability use cases. It leverages the scalability and flexibility of data lakes and adds the transactional layer of data lakehouses to give meaning to raw data signals, creating a semantic model that supports Smartscape and provides the contextual layer that enables Davis AI.

Data type support:

  • Grail data lakehouse: Structured and unstructured data in its raw format.
  • Data warehouse: Structured data.
  • Data lake: Structured and unstructured data in its raw format.

Scalability:

  • Grail data lakehouse: Scales to any size at low cost.
  • Data warehouse: Scale-up is exponentially complex and expensive.
  • Data lake: Scales to any size, requiring extra tooling.

Flexibility:

  • Grail data lakehouse: Full flexibility with additional schema on read.
  • Data warehouse: Limited to schema on ingest.
  • Data lake: Full flexibility without any schema, but requires rehydration and separate processing.

Performance:

  • Grail data lakehouse: High. Structured data allows quick access to data and insights.
  • Data warehouse: High for indexed data access patterns.
  • Data lake: Poor.

Ease of use:

  • Grail data lakehouse: Simple. Structured data allows quick access to data and insights.
  • Data warehouse: Simple. Structured data allows quick access to data and insights.
  • Data lake: Difficult. Missing structure and large volume make it difficult to find data.

Security:

  • Grail data lakehouse: Grail adds a sophisticated permission model and further concepts such as hard data deletion and data encryption at rest and in transit, among others.
  • Data warehouse: Good security concepts, including encryption, auditing, and authorization controls.
  • Data lake: Data is stored in its raw format, which risks turning the lake into a data swamp; governance and access control receive less attention.

Cost:

  • Grail data lakehouse: $
  • Data warehouse: $$$$
  • Data lake: $
