AI Observability is the ability to gain insight into the behavior, performance, and cost of artificial intelligence (AI) models and services during their operation. It involves monitoring, analyzing, and visualizing the important internal states, inputs, and outputs of AI models to ensure their correctness, reliability, and effectiveness.
According to various national AI regulation drafts, such as the European Union Artificial Intelligence Act, observability and transparency in AI are crucial because AI systems are often complex and operate in dynamic and unpredictable environments, or are themselves dynamic, unpredictable, and probabilistic in nature.
It is important to have visibility into how AI models make decisions, detect biases, understand their limitations, and identify potential issues or anomalies. By observing the AI system's behavior, data scientists, engineers, and operators can gain valuable insights and make informed decisions to improve and optimize the system's performance.
Essential metric categories for AI observability
Best practice for AI observability is to collect the following metric categories for all running AI models:
- Stability: Measures how reliably a given model operates. For example, the ratio of requests for which the model successfully returned a result to those where it failed to deliver one.
- Latency: Measures how long a model takes to fulfill a request. For example, how long a generative AI service takes to return a result, or how long a route-optimization model takes to return the requested route.
- Load: Measures how much load the model handles. Check for abnormal load spikes or drops to detect outages or unusual usage patterns.
- Model drift: Refers to the degradation of model performance due to changes in the input data and its relationship to the output. It can be measured in different ways. For example, a classical measure for a prediction model is the delta between the predicted value and the actual observed value.
- Data drift: For models where the true output value becomes available only after a long delay, it may be necessary to measure the stationarity of the input data as a surrogate measure for potential model drift.
- Cost: Measures the cost of operating the model. For example, count the tokens in the prompt and response of a GPT large language model and multiply them by the token prices of the external SaaS service used.
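As a minimal sketch of the cost category, token counts can be converted into spend with a simple per-request calculation. The function name and the token prices below are illustrative placeholders, not actual vendor rates:

```python
def llm_request_cost(prompt_tokens: int, completion_tokens: int,
                     prompt_price_per_1k: float,
                     completion_price_per_1k: float) -> float:
    """Estimate the cost of a single LLM request from its token counts."""
    return (prompt_tokens / 1000) * prompt_price_per_1k \
         + (completion_tokens / 1000) * completion_price_per_1k

# Hypothetical prices: $0.03 per 1k prompt tokens, $0.06 per 1k completion tokens
cost = llm_request_cost(1200, 400, 0.03, 0.06)
print(round(cost, 4))  # → 0.06
```

Summing such per-request costs over a time window yields a cost metric that can be charted and alerted on like any other.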
Observing AI models and services is inherently domain-driven, as only the creator of a model can expose critical states, logs, and measurements for effective monitoring.
There are several key components of AI Observability:
Monitoring: Continuous monitoring of AI models and services is essential to collect and analyze relevant data during their operation. This includes monitoring the input data, internal states, and output predictions or decisions made by the model. By tracking these aspects, any issues, errors, or unexpected behaviors can be identified in real-time.
While the general performance and stability of the service is covered automatically by Dynatrace, the owner of the AI model is responsible for identifying key operational indicators and exposing them to the monitoring platform as custom metrics, events, and logs.
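As a sketch of how a model owner could expose such an indicator, the snippet below builds a metric line in the Dynatrace metrics ingest line protocol and pushes it over HTTP. The environment URL, API token, metric key, and dimension values are placeholders for illustration:

```python
import urllib.request

def format_metric_line(key: str, dims: dict, value: float) -> str:
    """Build one line in the Dynatrace metrics ingest line protocol."""
    dim_part = ",".join(f"{k}={v}" for k, v in dims.items())
    return f"{key},{dim_part} {value}" if dim_part else f"{key} {value}"

def push_custom_metric(env_url: str, api_token: str, line: str) -> int:
    """POST one metric line to the metrics ingest endpoint; returns HTTP status."""
    req = urllib.request.Request(
        f"{env_url}/api/v2/metrics/ingest",
        data=line.encode("utf-8"),
        headers={
            "Authorization": f"Api-Token {api_token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Hypothetical metric key and dimension for a model's success ratio
line = format_metric_line("custom.model.success_ratio",
                          {"model": "churn-predictor"}, 0.98)
# push_custom_metric("https://<your-environment>.live.dynatrace.com", "<token>", line)
```

In practice, such a call would typically run on a schedule or at the end of each inference batch, so the metric arrives continuously rather than ad hoc.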
Logging: Logging involves capturing and recording relevant events, errors, and activities of the AI system. It helps in understanding the sequence of actions and provides a detailed record of what occurred during the system's operation. This information can be useful for debugging, performance analysis, and post-mortem analysis.
Metrics and Performance Analysis: Defining and tracking metrics related to the AI system's performance is crucial for observability. These can include accuracy, precision, recall, latency, throughput, or any other essential metric type. Analyzing these metrics with Dynatrace dashboards and notebooks over time can help identify patterns, trends, and performance degradation.
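As an illustrative sketch, classification metrics such as precision and recall can be derived from a single pass over predictions and ground truth; the resulting values could then be exposed as custom metrics as described above:

```python
def precision_recall(y_true: list, y_pred: list) -> tuple:
    """Compute precision and recall for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# 2 true positives, 1 false positive, 1 false negative
p, r = precision_recall([1, 0, 1, 1, 0], [1, 1, 1, 0, 0])
print(p, r)  # → 0.6666666666666666 0.6666666666666666
```

Tracking these values per model version over time makes gradual performance degradation visible long before it becomes a user-facing problem.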
Visualization: Build domain-specific dashboards to visualize the behavior and performance of AI systems to better understand their operation. Visualizations can include charts, graphs, dashboards, or other visual representations of the system's inputs, outputs, and internal states. These visualizations enable stakeholders to quickly identify patterns, anomalies, or issues.
Anomaly Detection: Setting up alerting mechanisms and anomaly detection systems is important to proactively identify and respond to potential issues in AI systems. Alerts can be triggered based on predefined thresholds, unexpected behaviors, or deviations from expected patterns. This enables timely intervention and troubleshooting.
Explainability and Interpretability: AI models often operate as black boxes, making it difficult to understand their decision-making process. Observability aims to enhance explainability and interpretability by providing insights into the factors influencing the model's outputs. Techniques like model interpretability, feature importance analysis, or visualization of intermediate representations can help in understanding the reasoning behind AI model decisions.
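One of the techniques mentioned above, feature importance analysis, can be sketched with a basic permutation approach: shuffle one feature's values and measure how much a chosen metric drops. The toy model, data, and function names below are hypothetical:

```python
import random

def permutation_importance(model, X, y, metric, feature_idx,
                           n_repeats=5, seed=0):
    """Estimate a feature's importance as the average drop in metric
    after shuffling that feature's column."""
    rng = random.Random(seed)
    baseline = metric(y, [model(row) for row in X])
    drops = []
    for _ in range(n_repeats):
        col = [row[feature_idx] for row in X]
        rng.shuffle(col)
        X_perm = [row[:feature_idx] + [v] + row[feature_idx + 1:]
                  for row, v in zip(X, col)]
        drops.append(baseline - metric(y, [model(row) for row in X_perm]))
    return sum(drops) / n_repeats

# Toy model that only uses feature 0, so shuffling feature 1 changes nothing
model = lambda row: 1 if row[0] > 0.5 else 0
accuracy = lambda y, p: sum(a == b for a, b in zip(y, p)) / len(y)
X = [[0.9, 5], [0.1, 7], [0.8, 2], [0.2, 9]]
y = [1, 0, 1, 0]
print(permutation_importance(model, X, y, accuracy, feature_idx=1))  # → 0.0
```

A near-zero importance for a feature the model was expected to rely on is itself a useful observability signal, as it may indicate a broken data pipeline.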
By implementing AI observability practices, organizations can improve the reliability, trustworthiness, and performance of their AI systems. It enables proactive monitoring, debugging, and optimization, leading to more robust and responsible AI deployments.
Monitor in-house or external models?
Dynatrace distinguishes between two main categories of AI observability, depending on the deployment and ownership of a given AI model service:
Observability of outsourced AI services running in SaaS
This category refers to AI services that are provided by external vendors or SaaS providers, such as OpenAI. These services are developed and maintained by third-party companies and are utilized by integrating their APIs or accessing their platforms.
Examples of outsourced AI services could include pre-trained models for natural language processing, generative services, computer vision, recommendation systems, or speech recognition.
See the examples below of how Dynatrace supports observability for outsourced AI SaaS services:
Observability of in-house built AI services
This category encompasses AI services that are developed, deployed, and maintained by your own organization. These services are built from scratch or using frameworks and libraries, and you have full control over their implementation, deployment, and observability.
In-house AI services could include custom machine learning models, proprietary algorithms, or specialized AI solutions developed specifically for your organization's needs.
See the examples below of how Dynatrace supports observability for AI/ML models:
Dynatrace leverages multiple platform capabilities to monitor Davis AI
As an AI-native enterprise, Dynatrace walks the talk and observes its own Davis AI using Dynatrace.
We identify and observe our own critical signals, such as the training latency and evaluation time of all Davis anomaly detection models, so that we can react quickly to increased latency or reduced stability.
The screenshot below shows one of many self-monitoring dashboards that Dynatrace data scientists use to verify the flawless operation of Davis AI across all monitoring environments: