AI Observability app

  • Latest Dynatrace
  • App
  • 4-min read

AI Observability provides end‑to‑end visibility for AI workloads across services, LLMs, agents, and protocols.

  • Out‑of‑the‑box analytics
  • Auto‑instrumentation
  • Targeted metrics
  • Debugging flow
  • Supports 20+ technologies: OpenAI, Amazon Bedrock, Google Gemini, Google Vertex, Anthropic, LangChain, and more.
  • Ready‑made dashboards

Prerequisites

To use AI Observability, you need:

Query and sampling cost for AI Observability dashboards

Some out-of-the-box AI Observability dashboards use span queries, which consume Traces powered by Grail - Query. This is true even if AI Observability isn’t fully configured yet, or the dashboards show no data.

To control your trace consumption, you can:

  • Use the sampling variable on these dashboards (where available) to reduce the number of spans queried.
  • Restrict access to exploratory dashboards only for relevant users.
  • Prefer metrics-based tiles and views when possible.

Note that we're currently working on reducing costs for both AI Observability and Dashboards by moving away from span queries.

Get started

AI Observability has an integrated onboarding flow that guides you through all the steps required to start ingesting data.

You can get data from:

  • OpenTelemetry.
  • Open source auto-instrumentation libraries like OpenLLMetry.
  • Dynatrace OneAgent.
  • Cloud providers, by pulling data directly through cloud monitoring.

Additionally, you can instrument your AI applications and services directly using OpenTelemetry with GenAI semantic conventions for full control and standardized observability across your entire stack.
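As an illustrative sketch of what such instrumentation carries, the gen_ai.* keys below come from the OpenTelemetry GenAI semantic conventions; the helper function and example values are hypothetical, and the snippet deliberately avoids the SDK itself so the attribute shape stands on its own:

```python
# Hypothetical sketch of the span attributes a manually instrumented LLM call
# would carry. The gen_ai.* keys are defined by the OpenTelemetry GenAI
# semantic conventions; the helper and the values are illustrative only.

def genai_span_attributes(provider: str, model: str,
                          input_tokens: int, output_tokens: int) -> dict:
    """Build the gen_ai.* attribute set for a chat-completion span."""
    return {
        "gen_ai.system": provider,                 # e.g. "openai", "anthropic"
        "gen_ai.operation.name": "chat",           # kind of GenAI operation
        "gen_ai.request.model": model,             # model the app requested
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }

attrs = genai_span_attributes("openai", "gpt-4o", 42, 128)
# With the OpenTelemetry SDK installed, you would attach these when the span
# is started, for example:
#   with tracer.start_as_current_span("chat gpt-4o", attributes=attrs): ...
```

Spans carrying these attributes are what the out-of-the-box views and dashboards slice by provider, model, and operation.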

Concepts

Here's how the different tabs in AI Observability work, and what you'll use them for. The tabs are: Overview, Service Health, and Explorer.

For information about GenAI concepts in Dynatrace, see Terms and concepts about AI Observability and GenAI in Dynatrace.

Overview tab

The Overview tab is your starting point to:

  • Discover AI workloads.
  • Quickly validate data ingestion.
  • See a high‑level summary of health, performance, and costs across your AI services.

In this tab, you can:

  • Use the tiles to view your AI landscape at a glance. See model providers, agents, model versions, and services, plus activity such as LLM requests, token usage, and cost trends.

  • Select any tile to open the Service Health tab and drill down with deeper analysis. You can validate errors, review traffic and latency, monitor token and cost behavior, and observe guardrail outcomes.

  • Open ready‑made dashboards for popular AI services or select Browse all dashboards to find dashboards tagged with [AI Observability]. Dashboards include navigation that redirects back into the app for contextual analysis.

Service Health tab

Service Health lets you get a unified view of the operational state of your AI services. It is organized into focused tabs, so you can move from a high-level pulse to root cause in a couple of clicks.

In this tab, you can:

  • Analyze all services, or quickly filter by service category or other predefined attributes.

  • See counts for services, models, and agents.

  • See model requests, token usage, average request duration, and overall cost.

  • Track errors with information such as success/failure rate, number of problems, and error counts and rates over time.

  • Monitor traffic and latency, and create alerts for latency regressions.

  • Analyze costs related to token usage, identify cost hot spots, and set proactive cost alerts.

  • Observe provider-reported guardrail outcomes.

    Dynatrace does not enforce runtime guardrails. Providers expose these signals, which we capture and visualize.

    Configure guardrails at the provider level for lowest latency and complexity.

Explorer tab

The Explorer tab is the shared Dynatrace interface for monitoring and analyzing different technology domains. It defines a common layout with consistent filtering, perspectives, drill‑down navigation, and unified analysis.

  • Get insights into your AI workloads, sliced by provider, model, service name, or agent.
  • Inspect detailed insights into specific AI workload services, such as prompts, logs, or problems.

Use cases

  • Understand AI architectures and dependencies across services, agents, and models with contextual health, performance, and cost views.
  • Detect and troubleshoot problems (latency, errors, bottlenecks) in logs and traces, with deep drill‑downs on prompts/traces via the Distributed Tracing Explorer view.
  • Monitor token consumption, caching efficiency, and guardrail outcomes to balance quality, cost, and speed.
  • Set proactive alerts for spikes in performance, cost, and quality, explain data with Dynatrace Intelligence, and drive workflows and notifications.
  • Support A/B testing and model versioning.
  • Maintain data governance and audit trails.
  • Get visibility into your Kubernetes workloads where your AI service is running.

For more AI Observability use cases, see Sample use cases for AI Observability and Dynatrace.

Create and manage alerts

To create a new alert, select New alert on metrics-based tiles. (These tiles include, for example, Invocation error count, Invocation latency, Token count, Token usage forecast, and Overall guardrail activation.) The alert wizard opens, pre‑filled with the current scope so you can fine‑tune thresholds and notifications.

To manage alerts, use the Manage all alerts action from any tab.

  • You can review, edit, and mute custom alerts created from Service Health cards and charts.

  • You can also create a new alert directly from most tiles.

For information about all custom alerts, capabilities, and limits, see Anomaly Detection.

Debug prompts and traces

AI Observability integrates with Distributed Tracing, and traces are enriched with GenAI fields. The trace list is pre-scoped and laid out so that only the relevant requests appear and GenAI context is front and center for faster investigation.

To view traces related to many of the AI Observability tiles and interactions:

  1. Go to the Service Health tab.
  2. Select View traces and prompts.
  3. Distributed Tracing opens with your current filters and timeframe, showing a pre-scoped trace list that includes only the relevant requests, with GenAI context, such as the provider, model, service/endpoint, and agent, front and center.

Monitor agent health and performance

  • Detect bottlenecks by tracking real-time metrics, including request counts, durations, and error rates.
  • Manage service costs with automated cost calculations for each request.
  • Stay on track with SLOs and proactive alerting.

End-to-end prompt tracing and debugging

  • Achieve complete visibility of prompt flows, from initial request to final response, for faster root cause analysis.
  • Capture detailed debug data to troubleshoot issues in complex pipelines.
  • Streamline your workflows with granular tracing of LLM prompts, including response latency and model-level metrics.
  • Resolve issues quicker by pinpointing exact problem areas in prompts, tokens, or system integrations.

Build trust while reducing compliance and audit risks

  • Track every input and output for an audit trail.
  • Query all data in real time and store it for future reference.
  • Maintain full data lineage from initial prompt to response output.
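As a minimal sketch of what one such audit entry could look like (the field names and helper here are illustrative, not a Dynatrace schema), each LLM interaction can be captured as a structured record that links the prompt to its response and token usage:

```python
import json
import time
import uuid

def audit_record(prompt: str, response: str, model: str,
                 input_tokens: int, output_tokens: int) -> dict:
    """Minimal audit-trail entry linking a prompt to its response.
    Field names are illustrative, not a Dynatrace schema."""
    return {
        "record_id": uuid.uuid4().hex,   # unique ID for this interaction
        "timestamp": time.time(),        # when the request completed
        "model": model,
        "prompt": prompt,                # the exact input that was sent
        "response": response,            # the exact output that came back
        "usage": {
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
        },
    }

# Ship each record as one structured log line so it stays queryable later.
entry = audit_record("What is observability?", "Observability is ...",
                     "gpt-4o", 5, 23)
line = json.dumps(entry)
```

Emitting one record per interaction as a structured log line keeps the full prompt-to-response lineage queryable alongside the corresponding traces.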

What's coming next?

  • AI model and AI services explorer. Richer details and list views with integrated logs, vulnerabilities, and a new prompt view for detailed root-cause analysis.
  • Prompt management. A dedicated prompt overview to inspect prompts/completions, compare versions, and analyze token/cost usage for faster troubleshooting and optimization.