Integrate the Kong AI Gateway into AI Observability data imports to Dynatrace

The Kong AI Gateway is a set of features built on top of Kong Gateway, designed to help developers and organizations adopt AI capabilities quickly and securely. It provides a normalized API layer that allows clients to consume multiple AI services from the same client code base.

Kong-dashboard

Explore the sample dashboard on the Dynatrace Playground.

Enable monitoring

Ensure that the Kong Prometheus plugin is enabled and exposes AI LLM metrics.
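
As a rough illustration of the step above, the following sketch enables the Prometheus plugin globally with AI metrics turned on. It assumes Kong is managed through the Kong Ingress Controller and a Kong Gateway version whose Prometheus plugin supports the ai_metrics option; the resource name is hypothetical, and other deployment modes (for example decK or the Admin API) need the equivalent plugin configuration instead.

```yaml
apiVersion: configuration.konghq.com/v1
kind: KongClusterPlugin
metadata:
  name: prometheus-ai-metrics          # hypothetical name
  annotations:
    kubernetes.io/ingress.class: kong  # assumption: default ingress class
  labels:
    global: "true"                     # apply the plugin to all services
plugin: prometheus
config:
  ai_metrics: true                     # export the ai_llm_* metrics
```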

Follow the Set up Dynatrace on Kubernetes guide to monitor your cluster.

Afterwards, add the following annotations to your Kong Deployments:

  • metrics.dynatrace.com/scrape: "true"
  • metrics.dynatrace.com/port: "8100"
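
A minimal sketch of where these annotations could go, assuming Dynatrace reads them from the pods created by the Deployment, so they are set on the pod template rather than on the Deployment's own metadata. Only the annotation-related fields are shown; this is a fragment to merge into an existing manifest, not a complete resource.

```yaml
# Fragment of a Kong Deployment manifest (assumption: annotations are
# read from the pod template, not from the Deployment metadata).
spec:
  template:
    metadata:
      annotations:
        metrics.dynatrace.com/scrape: "true"
        metrics.dynatrace.com/port: "8100"
```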

Follow the OpenTelemetry Collector installation guide to deploy a collector. With the following config, the collector will scrape AI LLM metrics every 10 seconds from the kong-metrics.kong:8100 endpoint.

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: kong
          scrape_interval: 10s
          honor_labels: false
          static_configs:
            - targets:
                - kong-metrics.kong:8100
processors:
  cumulativetodelta:
    max_staleness: 25h
extensions:
  health_check:
exporters:
  otlp_http:
    endpoint: ${env:DT_ENDPOINT}
    headers:
      Authorization: "Api-Token ${env:DT_API_TOKEN}"
service:
  extensions: [health_check]
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [cumulativetodelta]
      exporters: [otlp_http]

Cumulativetodelta processor recommendation

Set the max_staleness parameter of the cumulativetodelta processor to a value higher than the interval at which the Collector receives metrics (for example, the OTLP export interval or the Prometheus scrape interval). With max_staleness set, the processor discards state for abandoned metric streams instead of letting references to them accumulate in memory, while a value above the receive interval keeps active streams from being discarded between data points.
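
The collector configuration reads DT_ENDPOINT and DT_API_TOKEN from the environment. One possible way to supply them on Kubernetes is a Secret loaded into the collector container, sketched below. The Secret name, namespace, container name, and placeholder values are all hypothetical; the endpoint format assumes a Dynatrace SaaS environment using the /api/v2/otlp path, and the token is assumed to carry the metrics.ingest scope.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: dynatrace-otelcol-credentials   # hypothetical name
  namespace: dynatrace                  # assumption: collector namespace
type: Opaque
stringData:
  # Assumption: SaaS environment; replace the placeholder with your environment ID.
  DT_ENDPOINT: "https://<your-environment-id>.live.dynatrace.com/api/v2/otlp"
  # Assumption: an access token with the metrics.ingest scope.
  DT_API_TOKEN: "<your-api-token>"
---
# Fragment of the collector pod spec (container name is hypothetical).
spec:
  containers:
    - name: otel-collector
      envFrom:
        - secretRef:
            name: dynatrace-otelcol-credentials
```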

Kong does not provide the kong-metrics service out of the box, so you need to create it before the collector can scrape the metrics. Use the following Service definition:

apiVersion: v1
kind: Service
metadata:
  name: kong-metrics
  namespace: kong
spec:
  type: ClusterIP
  ports:
    - name: metrics
      port: 8100
      targetPort: 8100
      protocol: TCP
  selector:
    app.kubernetes.io/name: kong
    app.kubernetes.io/instance: kong

Spans

The following attributes are available for GenAI Spans.

| Attribute | Type | Description |
|---|---|---|
| gen_ai.completion.0.content | string | The full response received from the GenAI model. |
| gen_ai.completion.0.content_filter_results | string | The filter results of the response received from the GenAI model. |
| gen_ai.completion.0.finish_reason | string | The reason the GenAI model stopped producing tokens. |
| gen_ai.completion.0.role | string | The role used by the GenAI model. |
| gen_ai.openai.api_base | string | GenAI server address. |
| gen_ai.openai.api_version | string | GenAI API version. |
| gen_ai.openai.system_fingerprint | string | The fingerprint of the response generated by the GenAI model. |
| gen_ai.prompt.0.content | string | The full prompt sent to the GenAI model. |
| gen_ai.prompt.0.role | string | The role setting for the GenAI request. |
| gen_ai.prompt.prompt_filter_results | string | The filter results of the prompt sent to the GenAI model. |
| gen_ai.request.max_tokens | integer | The maximum number of tokens the model generates for a request. |
| gen_ai.request.model | string | The name of the GenAI model a request is being made to. |
| gen_ai.request.temperature | double | The temperature setting for the GenAI request. |
| gen_ai.request.top_p | double | The top_p sampling setting for the GenAI request. |
| gen_ai.response.model | string | The name of the model that generated the response. |
| gen_ai.system | string | The GenAI product as identified by the client or server instrumentation. |
| gen_ai.usage.completion_tokens | integer | The number of tokens used in the GenAI response (completion). |
| gen_ai.usage.prompt_tokens | integer | The number of tokens used in the GenAI input (prompt). |
| llm.request.type | string | The type of the operation being performed. |

Metrics

After you complete the steps above, the following metrics are available:

| Metric | Type | Unit | Description |
|---|---|---|---|
| ai_llm_requests_total | counter | integer | AI requests total per ai_provider in Kong |
| ai_llm_cost_total | counter | integer | AI requests cost per ai_provider/cache in Kong |
| ai_llm_provider_latency_ms_bucket | histogram | ms | AI latencies per ai_provider in Kong |
| ai_llm_tokens_total | counter | integer | AI tokens total per ai_provider/cache in Kong |
| ai_cache_fetch_latency | histogram | ms | AI cache latencies per ai_provider/database in Kong |
| ai_cache_embeddings_latency | histogram | ms | AI cache embedding latencies per ai_provider/database in Kong |
| ai_llm_provider_latency | histogram | ms | AI provider latencies per ai_provider/database in Kong |

Additionally, the following metrics are reported.

| Metric | Type | Unit | Description |
|---|---|---|---|
| gen_ai.client.generation.choices | counter | none | The number of choices returned by chat completions call. |
| gen_ai.client.operation.duration | histogram | s | The GenAI operation duration. |
| gen_ai.client.token.usage | histogram | none | The number of input and output tokens used. |
| llm.openai.embeddings.vector_size | counter | none | The size of returned vector. |