Service-level objective examples

Service-Level Objectives offers a set of service-level objective (SLO) examples that you can use to create your own SLOs using DQL.

We also offer a set of pre-configured SLO templates. For more information on the SLO templates, see Service-level objective templates.

See the SLO configuration examples to understand some of the possibilities for service-level indicators (SLIs).

Log-pattern based SLO

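This SLI is based on the proportion of log lines with log level INFO or WARNING against all log lines. The query counts the matching lines as failed and reports the remaining share of log lines as the SLI.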

Details of the example

  • Data source: logs

  • Entity scope: app-id

  • SLI DQL query:

    fetch logs, scanLimitGBytes: -1
    | fieldsAdd failed = coalesce(if(loglevel == "INFO" OR loglevel =="WARNING", 1), 0)
    | makeTimeseries {failed = sum(failed), total = count()}, by: {dt.app.id}
    | fieldsAdd sli = 100 - ((toDouble(failed[]) / toDouble(total[])) * 100)
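
The arithmetic behind the final SLI step can be sketched outside DQL. The Python snippet below is only an illustration for a single time bucket: `log_sli` is a hypothetical helper and the log levels are made-up sample data, not output from a real environment.

```python
def log_sli(loglevels, failed_levels=("INFO", "WARNING")):
    """Return 100 minus the percentage of lines whose level is in failed_levels."""
    total = len(loglevels)
    if total == 0:
        return None  # no log lines in this bucket, so there is no SLI value
    failed = sum(1 for level in loglevels if level in failed_levels)
    return 100 - (failed / total) * 100


# Hypothetical bucket: 1 of 4 lines matches the "failed" condition.
bucket = ["DEBUG", "WARNING", "DEBUG", "DEBUG"]
print(log_sli(bucket))  # 75.0
```

The number of failed lines has to be a count rather than an average, because it is divided by the total number of lines in the same bucket.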

Performance by service

This SLI measures the performance of a service as the proportion of its requests that complete within 150 ms, based on spans.

Details of the example

  • Data source: spans/traces, responsetimes/duration

  • Entity scope: services

  • SLI DQL query:

    fetch spans
    | filter dt.entity.service == "SERVICE-53B3E0D705DB0194"
    | makeTimeseries {total = count(), good = countIf(duration <= 150ms)}, by: {name = entityName(dt.entity.service)}
    | fieldsAdd sli = 100 * (good[]/total[])
    | fieldsRemove total, good
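
To make the good/total ratio per time bucket more concrete, here is a minimal Python sketch. It assumes spans arrive as `(start_seconds, duration_ms)` pairs and uses a fixed 60-second interval; both the input shape and the helper name `latency_sli_by_bucket` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def latency_sli_by_bucket(spans, threshold_ms=150, interval_s=60):
    """Group spans into fixed time buckets and return the good/total percentage per bucket."""
    counts = defaultdict(lambda: [0, 0])  # bucket start -> [good, total]
    for start_s, duration_ms in spans:
        bucket = int(start_s // interval_s) * interval_s
        counts[bucket][1] += 1
        if duration_ms <= threshold_ms:  # a span at exactly the threshold counts as good
            counts[bucket][0] += 1
    return {b: 100 * good / total for b, (good, total) in sorted(counts.items())}


# Hypothetical spans: four in the first minute, one in the second.
spans = [(0, 120), (10, 150), (30, 400), (45, 151), (65, 90)]
print(latency_sli_by_bucket(spans))  # {0: 50.0, 60: 100.0}
```

Buckets without any spans do not appear in the result, so they contribute no SLI value.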

Performance by service endpoint

This SLI measures a selected endpoint's latency (performance) as the proportion of service requests that are served faster than a defined response time in milliseconds, based on spans.

Details of the example

  • Data source: spans/traces, responsetimes/duration

  • Entity scope: services, endpoint

  • SLI DQL query:

    fetch spans
    | filter endpoint.name == "/Booking"
    | makeTimeseries {total = count(), good = countIf(duration < 150ms)}, by:{endpoint.name}
    | fieldsAdd sli = 100 * (good[]/total[])
    | fieldsRemove total, good

SLO for release validations: checkoutservice

This SLI measures the proportion of successful guardian release validations.

Details of the example

  • Data source: bizevents (guardian validations)

  • Entity scope: guardians

  • SLI DQL query:

    fetch bizevents
    | filter event.type == "guardian.validation.finished"
    | parse `validation.summary`, """JSON{ INT: "pass",INT: "warning", INT: "fail", INT: "error", INT: "info" }:result"""
    | fieldsAdd all = result[pass] +result[warning]+result[fail] + result[error] + result[info]
    | fieldsAdd nok = (result[fail] + result[error] + result[info])
    | makeTimeseries {all = sum(all), nok = sum(nok)}, by: {guardian.name}, interval: 10min
    | filter in(guardian.name,"Three golden signals (checkoutservice)")
    | fieldsAdd sli = 100 * ((all[]-nok[])/all[])
    | fieldsRemove all , nok
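
The aggregation over validation summaries can be sketched in Python as follows. This assumes each `validation.summary` is a JSON object with integer counters for `pass`, `warning`, `fail`, `error`, and `info`, as the parse pattern in the query suggests. The sample summaries and the helper `validation_sli` are hypothetical.

```python
import json

ALL_KEYS = ("pass", "warning", "fail", "error", "info")
NOK_KEYS = ("fail", "error", "info")  # same grouping as the nok field in the query


def validation_sli(summaries):
    """Sum the counters over all summaries and return the percentage that is not 'nok'."""
    all_count = 0
    nok_count = 0
    for raw in summaries:
        result = json.loads(raw)
        all_count += sum(result[key] for key in ALL_KEYS)
        nok_count += sum(result[key] for key in NOK_KEYS)
    if all_count == 0:
        return None  # no validation results in this interval
    return 100 * (all_count - nok_count) / all_count


# Two hypothetical validation summaries from the same interval.
summaries = [
    '{"pass": 3, "warning": 1, "fail": 1, "error": 0, "info": 0}',
    '{"pass": 4, "warning": 0, "fail": 0, "error": 1, "info": 0}',
]
print(validation_sli(summaries))  # 80.0
```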

SLO for synthetic browser availability considering business hours

This SLI measures the proportion of successful browser monitor tests over time, only considering business hours (Monday–Friday, 9 AM–5 PM UTC+2).

Details of the example

  • Data source: metrics (timeseries); user input: timezone, business hours, work days

  • Entity scope: Synthetic browser test, Synthetic location

  • SLI DQL query:

    timeseries {sli = avg(dt.synthetic.browser.availability), timestamp=start()}, by:{dt.entity.synthetic_test,dt.entity.synthetic_location}, interval:1min
    | fieldsAdd entityName = entityName(dt.entity.synthetic_test)
    | fieldsAdd locationName = entityName(dt.entity.synthetic_location)
    | filter in(entityName, "Dynatrace website")
    | fieldsAdd sli=if(getDayOfWeek(timestamp[])<6, sli[])
    | fieldsAdd sli=if(getHour(timestamp[],timezone:"Europe/Bucharest")>=9, sli[])
    | fieldsAdd sli=if(getHour(timestamp[],timezone:"Europe/Bucharest")<17, sli[])
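
The business-hours window described above (Monday to Friday, 9 AM to 5 PM) can be sketched in Python as a mask over the data points. This is an illustration under stated assumptions: it uses a fixed UTC+2 offset, whereas a named zone such as Europe/Bucharest also observes daylight saving time, and it treats the end hour as exclusive. The helpers `in_business_hours` and `mask_sli` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

BUSINESS_TZ = timezone(timedelta(hours=2))  # assumption: fixed UTC+2, no daylight saving


def in_business_hours(ts, start_hour=9, end_hour=17):
    """True for Monday to Friday between start_hour (inclusive) and end_hour (exclusive)."""
    local = ts.astimezone(BUSINESS_TZ)
    return local.isoweekday() < 6 and start_hour <= local.hour < end_hour


def mask_sli(points):
    """Keep availability values inside business hours and replace the others with None."""
    return [value if in_business_hours(ts) else None for ts, value in points]


utc = timezone.utc
points = [
    (datetime(2024, 1, 8, 7, 0, tzinfo=utc), 100.0),   # Monday 09:00 local
    (datetime(2024, 1, 8, 15, 0, tzinfo=utc), 0.0),    # Monday 17:00 local
    (datetime(2024, 1, 8, 6, 59, tzinfo=utc), 100.0),  # Monday 08:59 local
    (datetime(2024, 1, 6, 10, 0, tzinfo=utc), 100.0),  # Saturday 12:00 local
]
print(mask_sli(points))  # [100.0, None, None, None]
```

Masked points carry no value, so they neither improve nor degrade the SLO status.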

Service performance for services with a certain tag

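This SLI measures service performance over time for services with a particular tag: each data point counts as good when the average response time is at or below the threshold of 1000 * 500 used in the query, and as bad otherwise.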

Details of the example

  • Data source: metrics (timeseries); tags

  • Entity scope: services

  • SLI DQL query:

    timeseries total=avg(dt.service.request.response_time), default:0, by: { dt.entity.service }
    | fieldsAdd tags=entityAttr(dt.entity.service, "tags")
    | filter in(tags, "[Environment]DT_RELEASE_PRODUCT:easytravel")
    | fieldsAdd high=iCollectArray(if(total[]> (1000 * 500), total[]))
    | fieldsAdd low=iCollectArray(if(total[]<= (1000 * 500), total[]))
    | fieldsAdd highRespTimes=iCollectArray(if(isNull(high[]),0,else:1))
    | fieldsAdd lowRespTimes=iCollectArray(if(isNull(low[]),0,else:1))
    | fieldsAdd entityName = entityName(dt.entity.service)
    | fieldsAdd sli=100*(lowRespTimes[]/(lowRespTimes[]+highRespTimes[]))
    | fieldsRemove total, high, low, highRespTimes, lowRespTimes, tags
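
The per-data-point classification in this query can be sketched in Python. The sketch assumes that the response time metric is reported in microseconds, which the `1000 * 500` threshold suggests (500 ms), and that gaps are filled with 0 as in the query's `default:0`. The helper `response_time_sli` and the sample values are hypothetical.

```python
THRESHOLD_US = 1000 * 500  # 500 ms, assuming the metric unit is microseconds


def response_time_sli(avg_response_times_us):
    """Return 100 for each point at or below the threshold and 0 for each point above it."""
    sli = []
    for value in avg_response_times_us:
        value = 0 if value is None else value  # gaps are treated as 0, like default:0
        low = 1 if value <= THRESHOLD_US else 0
        high = 1 - low
        sli.append(100 * (low / (low + high)))
    return sli


# Hypothetical average response times for four intervals; None marks a gap.
print(response_time_sli([120_000, 500_000, 750_000, None]))  # [100.0, 100.0, 0.0, 100.0]
```

Because each point is either 100 or 0, the SLO status over a period is the share of intervals that stayed within the threshold. Intervals without data count as good under the `default:0` setting.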

Service availability for critical services (tagged Gold)

This SLI measures the proportion of successful service requests over time, considering only gold-tier tagged services.

Details of the example

  • Data source: metrics (timeseries); tag

  • Entity scope: services

  • SLI DQL query:

    timeseries { total=sum(dt.service.request.count) ,failures=sum(dt.service.request.failure_count) }, by: { dt.entity.service }
    | fieldsAdd tags=entityAttr(dt.entity.service, "tags")
    | filter in(tags, "criticality:Gold")
    | fieldsAdd entityName = entityName(dt.entity.service)
    | fieldsAdd sli=(((total[]-failures[])/total[])*(100))
    | fieldsRemove total, failures, tags