Dynatrace Grail architecture

Published Jun 17, 2025

This article provides a high-level overview of the architecture of Grail.

Overview

Grail is purpose-built for storing and querying large volumes of immutable observability, business, and security data.

To ensure scalability and performance, we've separated storage from compute, and implemented three different layers:

Ingest and processing
Store
Query processing

Ingest and processing

OpenPipeline is the central input interface for data that is stored in Grail. It provides full control over all raw data—whether it's observability, security, or business data. OpenPipeline supports various ingestion channels, including multiple API endpoints, OneAgent, OpenTelemetry, and extensions.

The ingested data is processed in real time through processing pipelines, addressing privacy and security needs with masking and filtering while at the same time performing contextual enrichment, data normalization, and metrics extraction. Data is transformed and mapped to topology information using Smartscape and Semantic Dictionary before being routed into one of the available target Grail buckets.
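The processing steps described above (filtering, masking, enrichment, and routing to a bucket) can be pictured with a small sketch. This is not OpenPipeline's API or configuration format; the function names, record fields, bucket names, and the email-masking rule are all hypothetical and only illustrate the order of the stages.

```python
import re
from typing import Optional

# Hypothetical routing table: record type -> target bucket name.
BUCKET_BY_TYPE = {
    "log": "logs_bucket",
    "event": "events_bucket",
    "bizevent": "bizevents_bucket",
}

# Simplified pattern for email addresses (illustrative, not exhaustive).
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def keep(record: dict) -> bool:
    """Filter step: drop debug-level records."""
    return record.get("loglevel", "").upper() != "DEBUG"


def mask(record: dict) -> dict:
    """Masking step: replace email addresses in the content with a placeholder."""
    masked = dict(record)
    masked["content"] = EMAIL_PATTERN.sub("<masked>", record.get("content", ""))
    return masked


def enrich(record: dict, host_to_entity: dict) -> dict:
    """Enrichment step: attach an entity ID looked up from the host name."""
    enriched = dict(record)
    enriched["entity_id"] = host_to_entity.get(record.get("host"), "unknown")
    return enriched


def route(record: dict) -> str:
    """Routing step: pick the target bucket from the record type."""
    return BUCKET_BY_TYPE.get(record.get("type"), "default_bucket")


def process(record: dict, host_to_entity: dict) -> Optional[tuple]:
    """Run one record through all stages; return (bucket, record) or None if dropped."""
    if not keep(record):
        return None
    processed = enrich(mask(record), host_to_entity)
    return route(processed), processed
```

For example, `process({"type": "log", "host": "web-1", "loglevel": "INFO", "content": "login by alice@example.com"}, {"web-1": "HOST-1"})` yields a record whose content is masked, whose `entity_id` is `HOST-1`, and which is routed to the hypothetical `logs_bucket`.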

Store

Grail is the central data lakehouse for all observability, security, and business data. It manages data storage, organization, structure, compression, privacy, and compliance automatically once data is ingested.

Grail is built to handle large volumes of raw data on cloud storage. It removes the traditional work of managing hot/cold storage tiers, rehydrating archived data, and exporting data to external storage solutions, which reduces costs. Hot and cold data are managed automatically, so all data remains fully accessible without rehydration delays; in effect, the data is always hydrated.

Data is always stored in context, without the need for tagging or defining schemas on ingest or at storage. You can't access the stored data directly; all access goes through queries in the query processing layer.

Query processing

Grail provides a single interface to query all kinds of data using the Dynatrace Query Language (DQL), which offers a rich set of commands for fetching, filtering, extracting, joining, and aggregating data, and is optimized for handling heterogeneous data.

When working with data warehouses or using query languages such as SQL, you need to define schemas to work with data. These schemas are often outdated and inconsistent, and they limit how you can explore data.

With Grail, you benefit from a schema-on-read approach, which means you define the schema when you need it, during the read process. This allows you to stay fully flexible, handle any data type and format, and query any data.
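To make the schema-on-read idea concrete, here is a minimal sketch in Python rather than DQL. It assumes raw records are stored as plain `key=value` text lines; the record layout, field names, and the `read_with_schema` helper are hypothetical and not part of Grail. The point is only that the same stored data can be read with different schemas, each chosen at query time.

```python
import re

# Raw records are kept as-is; no schema was declared when they were stored.
RAW_RECORDS = [
    "2025-06-17T10:00:00Z status=200 duration_ms=12 path=/home",
    "2025-06-17T10:00:01Z status=500 duration_ms=340 path=/checkout",
    "2025-06-17T10:00:02Z status=200 duration_ms=25 path=/home",
]


def read_with_schema(raw: str, schema: dict) -> dict:
    """Extract only the fields named in `schema`, converting each one at read time."""
    pairs = dict(re.findall(r"(\w+)=(\S+)", raw))
    return {
        field: convert(pairs[field])
        for field, convert in schema.items()
        if field in pairs
    }


# Two different questions, two different schemas over the same stored data.
latency_schema = {"duration_ms": int}
error_schema = {"status": int, "path": str}

durations = [read_with_schema(r, latency_schema)["duration_ms"] for r in RAW_RECORDS]

error_rows = (read_with_schema(r, error_schema) for r in RAW_RECORDS)
failed_paths = [row["path"] for row in error_rows if row["status"] >= 500]
```

Neither schema had to exist when the records were written, and adding a third question later would not require rewriting the stored data.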

Query processing in Grail uses Massively Parallel Processing (MPP) and datawarping to optimize and parallelize execution, eliminating the need for indexes or manual hot storage management.

Massively Parallel Processing

Massively Parallel Processing (MPP) is an advanced computing architecture designed for high-performance, large-scale data processing. MPP describes a paradigm where thousands of nodes work in parallel, each processing parts of a computational task.

Even though the data is distributed across multiple nodes for parallel processing, it never leaves the secure scope of its environment. This ensures that data is never mixed between environments and prevents any impact on other environments.

MPP consists of the following key components:

Node architecture: MPP systems are composed of numerous independent nodes, each a self-sufficient unit that performs operations on its own.
Parallel processing: The key feature of MPP is its ability to process data in parallel. Each node works on a separate part of the task simultaneously, dramatically increasing processing speed and efficiency.
Data distribution: Data is distributed across the nodes, ensuring that each node can work on its data segment without interference.
Inter-node communication: Nodes communicate with each other through a high-speed network to coordinate tasks and exchange data.
Scalability: MPP systems are highly scalable. You can add more nodes to the system to increase computing power and storage capacity as needed.
Fault tolerance: Due to the independent nature of the nodes, MPP systems offer excellent fault tolerance. If one node fails, the others can continue processing without disruption.

Grail implements this concept. In practice, when a user sends a DQL query, Grail splits the work into chunks and processes them in parallel, using all available nodes for a single query. This ensures high performance even for the most demanding queries. Combined with always-hydrated, automatically managed hot/cold storage, MPP lets you get answers to any question at any time.
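The split, process-in-parallel, and merge pattern behind MPP can be sketched on a single machine. This is a simplified illustration, not Grail's query engine: threads stand in for nodes, and the function names and the `status` field are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records: list, chunk_count: int) -> list:
    """Distribute records round-robin so each worker gets its own segment."""
    return [records[i::chunk_count] for i in range(chunk_count)]


def count_by_status(chunk: list) -> Counter:
    """Partial aggregation computed independently on one chunk."""
    return Counter(record["status"] for record in chunk)


def parallel_count(records: list, workers: int = 4) -> Counter:
    """Count records per status by aggregating chunks in parallel and merging."""
    chunks = split_into_chunks(records, workers)
    total = Counter()
    # Threads stand in for nodes here; an MPP system spreads chunks across machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Merge step: combine each worker's partial result into the final answer.
        for partial in pool.map(count_by_status, chunks):
            total.update(partial)
    return total
```

Because each partial count depends only on its own chunk, the workers need no coordination until the merge step, which is what lets this pattern scale by adding more workers.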

Datawarping

Datawarping is a patented technology for fast, efficient data storage and retrieval that is applied to all data types in Grail. It eliminates manual schema management and makes indexes obsolete, along with 90–99% of the overhead they cause. Datawarping makes it possible to run any query at any time, as cost-effectively as a data lake and as performantly as an index-everything approach.

It suits heterogeneous immutable data, both structured and unstructured, including metrics, traces, logs, sessions, and events, at exabyte scale. It works by inverting the problem: instead of identifying where the data is located, it identifies where the data is not stored, an approach that is 250 times faster than typical technologies.
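This article does not describe how datawarping works internally, so the following is not its implementation. It is only a generic sketch of a related, widely used idea: ruling out places where matching data cannot be stored, here with per-chunk minimum and maximum timestamps (often called data skipping). The `Chunk` class and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    min_ts: int
    max_ts: int
    records: list  # list of (timestamp, message) tuples


def build_chunk(records: list) -> Chunk:
    """Store records together with a small summary of their timestamp range."""
    timestamps = [ts for ts, _ in records]
    return Chunk(min(timestamps), max(timestamps), records)


def search(chunks: list, start: int, end: int) -> tuple:
    """Return records with start <= timestamp <= end and the number of chunks scanned."""
    hits, scanned = [], 0
    for chunk in chunks:
        # Rule out chunks that cannot contain matches without reading their records.
        if chunk.max_ts < start or chunk.min_ts > end:
            continue
        scanned += 1
        hits.extend(r for r in chunk.records if start <= r[0] <= end)
    return hits, scanned
```

With three chunks covering timestamps 1 to 5, 10 to 15, and 20 to 25, a search for the range 12 to 16 reads only the middle chunk; the other two are excluded from their summaries alone, without any per-record index.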

With datawarping, Grail can search through hundreds of TB of log data per second.