AI Observability is the ability to gain insight into the behavior, performance, and cost of artificial intelligence (AI) models and services during their operation. It involves monitoring, analyzing, and visualizing the important internal states, inputs, and outputs of AI models to ensure their correctness, reliability, and effectiveness.
According to various national AI regulation drafts, such as the European Union Artificial Intelligence Act, observability and transparency in AI are crucial because AI systems are often complex and operate in dynamic and unpredictable environments, or are themselves dynamic, unpredictable, and probabilistic in nature.
It is important to have visibility into how AI models make decisions, detect biases, understand their limitations, and identify potential issues or anomalies. By observing the AI system's behavior, data scientists, engineers, and operators can gain valuable insights and make informed decisions to improve and optimize the system's performance.
Essential metric categories for AI observability
Best practice for AI observability is to collect the following metric categories for all running AI models:
- Stability: Measures how reliably a given model operates. For example, the ratio of requests for which the model successfully returned a result to those where it failed to deliver one.
- Latency: Measures how long a model takes to fulfill a request. For example, how long a generative AI service takes to return a result, or how long a route-optimization model takes to return the requested route.
- Load: Measures how much load the model handles. Check for abnormal load spikes or drops to detect outages or unusual usage patterns.
- Model drift: Refers to the degradation of model performance due to changes in the input data and its relationship to the output. It can be measured in different ways. For example, a classical measure for a prediction model is the delta between the predicted value and the actual observed value.
- Data drift: For models where the true output value becomes available only after a long delay, it may be necessary to measure the stationarity of the input data as a surrogate measure for potential model drift.
- Cost: Measures the cost of operating the model. For example, count the tokens in the prompt and response of a GPT large language model and multiply them by the token prices of the external SaaS service used.
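As a minimal sketch of the cost category, token counts can be converted into spend with a simple per-request calculation. The function name and the token prices below are illustrative placeholders, not actual vendor rates:

```python
def llm_request_cost(prompt_tokens: int, completion_tokens: int,
                     prompt_price_per_1k: float,
                     completion_price_per_1k: float) -> float:
    """Estimate the cost of a single LLM request from its token counts."""
    return (prompt_tokens / 1000) * prompt_price_per_1k \
         + (completion_tokens / 1000) * completion_price_per_1k

# Hypothetical prices: $0.03 per 1k prompt tokens, $0.06 per 1k completion tokens
cost = llm_request_cost(1200, 400, 0.03, 0.06)
print(round(cost, 4))  # → 0.06
```

Summing such per-request costs over a time window yields a cost metric that can be charted and alerted on like any other.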
Observing AI models and services is inherently domain-driven, as only the creator of a model can expose critical states, logs, and measurements for effective monitoring.
There are several key components of AI Observability:
Monitoring: Continuous monitoring of AI models and services is essential to collect and analyze relevant data during their operation. This includes monitoring the input data, internal states, and output predictions or decisions made by the model. By tracking these aspects, any issues, errors, or unexpected behaviors can be identified in real-time.
While the general performance and stability of the service is covered automatically by Dynatrace, the owner of the AI model is responsible for identifying key operational indicators and exposing them to the monitoring platform as custom metrics, events, and logs.
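As a sketch of how a model owner could expose such an indicator, the snippet below builds a metric line in the Dynatrace metrics ingest line protocol and pushes it over HTTP. The environment URL, API token, metric key, and dimension values are placeholders for illustration:

```python
import urllib.request

def format_metric_line(key: str, dims: dict, value: float) -> str:
    """Build one line in the Dynatrace metrics ingest line protocol."""
    dim_part = ",".join(f"{k}={v}" for k, v in dims.items())
    return f"{key},{dim_part} {value}" if dim_part else f"{key} {value}"

def push_custom_metric(env_url: str, api_token: str, line: str) -> int:
    """POST one metric line to the metrics ingest endpoint; returns HTTP status."""
    req = urllib.request.Request(
        f"{env_url}/api/v2/metrics/ingest",
        data=line.encode("utf-8"),
        headers={
            "Authorization": f"Api-Token {api_token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Hypothetical metric key and dimension for a model's success ratio
line = format_metric_line("custom.model.success_ratio",
                          {"model": "churn-predictor"}, 0.98)
# push_custom_metric("https://<your-environment>.live.dynatrace.com", "<token>", line)
```

In practice, such a call would typically run on a schedule or at the end of each inference batch, so the metric arrives continuously rather than ad hoc.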
Logging: Logging involves capturing and recording relevant events, errors, and activities of the AI system. It helps in understanding the sequence of actions and provides a detailed record of what occurred during the system's operation. This information can be useful for debugging, performance analysis, and post-mortem analysis.
Metrics and Performance Analysis: Defining and tracking metrics related to the AI system's performance is crucial for observability. These can include accuracy, precision, recall, latency, throughput, or any other essential metric type. Analyzing these metrics with Dynatrace dashboards and notebooks over time can help identify patterns, trends, and performance degradation.
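As an illustrative sketch, classification metrics such as precision and recall can be derived from a single pass over predictions and ground truth; the resulting values could then be exposed as custom metrics as described above:

```python
def precision_recall(y_true: list, y_pred: list) -> tuple:
    """Compute precision and recall for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# 2 true positives, 1 false positive, 1 false negative
p, r = precision_recall([1, 0, 1, 1, 0], [1, 1, 1, 0, 0])
print(p, r)  # → 0.6666666666666666 0.6666666666666666
```

Tracking these values per model version over time makes gradual performance degradation visible long before it becomes a user-facing problem.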
Visualization: Build domain-specific dashboards to visualize the behavior and performance of AI systems to better understand their operation. Visualizations can include charts, graphs, dashboards, or other visual representations of the system's inputs, outputs, and internal states. These visualizations enable stakeholders to quickly identify patterns, anomalies, or issues.
Anomaly Detection: Setting up alerting mechanisms and anomaly detection systems is important to proactively identify and respond to potential issues in AI systems. Alerts can be triggered based on predefined thresholds, unexpected behaviors, or deviations from expected patterns. This enables timely intervention and troubleshooting.
Explainability and Interpretability: AI models often operate as black boxes, making it difficult to understand their decision-making process. Observability aims to enhance explainability and interpretability by providing insights into the factors influencing the model's outputs. Techniques like model interpretability, feature importance analysis, or visualization of intermediate representations can help in understanding the reasoning behind AI model decisions.
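One of the techniques mentioned above, feature importance analysis, can be sketched with a basic permutation approach: shuffle one feature's values and measure how much a chosen metric drops. The toy model, data, and function names below are hypothetical:

```python
import random

def permutation_importance(model, X, y, metric, feature_idx,
                           n_repeats=5, seed=0):
    """Estimate a feature's importance as the average drop in metric
    after shuffling that feature's column."""
    rng = random.Random(seed)
    baseline = metric(y, [model(row) for row in X])
    drops = []
    for _ in range(n_repeats):
        col = [row[feature_idx] for row in X]
        rng.shuffle(col)
        X_perm = [row[:feature_idx] + [v] + row[feature_idx + 1:]
                  for row, v in zip(X, col)]
        drops.append(baseline - metric(y, [model(row) for row in X_perm]))
    return sum(drops) / n_repeats

# Toy model that only uses feature 0, so shuffling feature 1 changes nothing
model = lambda row: 1 if row[0] > 0.5 else 0
accuracy = lambda y, p: sum(a == b for a, b in zip(y, p)) / len(y)
X = [[0.9, 5], [0.1, 7], [0.8, 2], [0.2, 9]]
y = [1, 0, 1, 0]
print(permutation_importance(model, X, y, accuracy, feature_idx=1))  # → 0.0
```

A near-zero importance for a feature the model was expected to rely on is itself a useful observability signal, as it may indicate a broken data pipeline.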
By implementing AI observability practices, organizations can improve the reliability, trustworthiness, and performance of their AI systems. It enables proactive monitoring, debugging, and optimization, leading to more robust and responsible AI deployments.
Monitor in-house or external models?
Dynatrace distinguishes between two main categories of AI observability, depending on the deployment and ownership of a given AI model service:
Observability of outsourced AI services running in SaaS
This category refers to AI services that are provided by external vendors or SaaS providers, such as OpenAI. These services are developed and maintained by third-party companies and are utilized by integrating their APIs or accessing their platforms.
Examples of outsourced AI services could include pre-trained models for natural language processing, generative services, computer vision, recommendation systems, or speech recognition.
See the examples below of how Dynatrace supports observability for outsourced AI SaaS services:
Observability of in-house built AI services
This category encompasses AI services that are developed, deployed, and maintained by your own organization. These services are built from scratch or using frameworks and libraries, and you have full control over their implementation, deployment, and observability.
In-house AI services could include custom machine learning models, proprietary algorithms, or specialized AI solutions developed specifically for your organization's needs.
See the examples below of how Dynatrace supports observability for AI/ML models:
Dynatrace leverages multiple platform capabilities to monitor Davis AI
As an AI-native enterprise, Dynatrace walks the talk and observes its own Davis AI using Dynatrace.
We identify and observe our own critical signals, such as the training latency and evaluation time of all Davis anomaly detection models, so that we can react quickly to increased latency or reduced stability.
The screenshot below shows one of many self-monitoring dashboards that Dynatrace data scientists use to verify the flawless operation of Davis AI across all monitoring environments: