Automated bug triaging and ticketing

Latest Dynatrace

In today's continuous delivery approach, even with extensive QA processes, some errors are only discovered through real-life infrastructure usage as close to production as possible. Dynatrace allows R&D teams to easily react to bugs or problems by integrating a Jira ticket creation workflow to resolve team dependencies.

Introduction

This guide presents a straightforward method for utilizing Dynatrace to clarify ambiguous responsibilities and interdependencies. In this use case, you create an internal tool that takes care of identifying bugs from logs, as well as grouping, triaging, and distributing them to a bug tracker that clarifies ambiguous responsibilities and interdependencies. Such an internal tool will allow you to:

  • Detect bugs as early as possible before becoming visible to customers.
  • Review and prioritize newly discovered problems and ensure ownership of each item is clearly defined.
  • Increase the quality of released product.

Target audience

This article is intended for development team leaders, development teams and anyone who is responsible to triaging and assigning bugs to development teams.

Scenario

Your company operates considerable infrastructure. You use Dynatrace to monitor both the production and pre-production segments of this infrastructure. Dynatrace monitors thousands of hosts and ingests multiple TB of logs per day.

These logs contain valuable information about errors that occurred on the systems that might need to be addressed by the various development teams.

With many teams, distributed systems, interconnected components, and large volumes of log data, it is challenging to clarify responsibilities and interdependencies.

Teams meet for a daily bug meeting and review newly discovered problems to ensure that the ownership of each item is clearly defined.

  • Teams want to quickly discover and assess new errors.
  • Each development team wants to know how their code behaved in the production and pre-production stages and prioritize, investigate and fix any issues.

The volume makes it impossible to analyze manually, and the large number of teams adds the complexity of bringing the right log lines to the right teams.

Prerequisites

Make sure all of these are true before you start:

Access and permissions

  • You should have the necessary permissions to configure and access monitoring tools within Dynatrace.
  • You should have permissions to ingest log data into Dynatrace.
  • Make sure the following permissions are enabled.
    • Grail: storage:logs:read. For instructions, see Assign permissions in Grail.
    • Workflows: Permissions to access, view, write, and execute workflows. For details, see Authorization.

      To access permissions, go to the Settings menu in the upper-right corner of the Workflows app and select Authorization settings.

      You should set permissions required by Workflows to run actions on your behalf. Grant permissions to Workflows (allows the workflow to convert resulting findings into Jira tickets).
  • You can integrate Dynatrace with various bug tracking tools and frameworks used in your pipeline.

Knowledge

Backend service

  • You have a backend service that:
    • can group log data
    • can process log data based on their message content
    • can store grouped log data
    • can serve grouped log data through API
    • is located in your infrastructure and is reachable through EdgeConnect by any Dynatrace application in your tenants

How it works

Dynatrace Automated bug triaging and ticketing consists of three stages:

  1. Ingesting log data
  2. Grouping and processing log data
  3. Using workflows for automated Jira ticketing

Data flow of the bug meeting distribution application

Steps

Complete the following steps to create your internal tool for bug triaging and distribution.

Step 1 Ingest log data

Ingest log and event data finto Dynatrace using OneAgent or Dynatrace API:

  • Log ingestion via OneAgent
    Automatically discover and ingest log and event data from a vast array of technologies. It supports multiple configurations of timestamps, masking sensitive data, detecting log rotation patterns, and much more.
  • Generic log ingestion
    With generic log ingestion, you can stream log records to a system and have Dynatrace transform the stream into meaningful log messages.

Step 2 Group log data

Using your backend service, aggregate and group log data and combine that into business events.

  • Filter log events by severity (critical and high) and timestamp (newest events).
    Create specific DQL queries to fetch the logs and have a customised overview. The following query examples allow you to filter log data with specific loglevel value in OneAgent logs (based on specific log.source) and log data with specific loglevel value for specific process groups (based on dt.entity.process_group)

    fetch logs, timeframe:"2023-10-04T00:00:00Z/2023-10-04T04:00:00Z"
    | filter startsWith(log.source, "/var/log/dynatrace/oneagent/java/")
    | filter loglevel == "WARN" OR loglevel == "SEVERE"
    fetch logs, timeframe:"2023-10-01T09:03:18Z/2023-10-02T09:03:18Z", scanLimitGBytes: -1
    | filter dt.entity.process_group == "PROCESS_GROUP-9C77F43CE98EAA17" OR
    dt.entity.process_group == "PROCESS_GROUP-a3444fa7ad1c3d41" OR
    dt.entity.process_group == "PROCESS_GROUP-e50850ceefbd9c7f" OR
    dt.entity.process_group == "PROCESS_GROUP-689574f4e137425c" OR
    dt.entity.process_group == "PROCESS_GROUP-37925f9c37724091"
    | filter ((loglevel == "WARN" OR
    loglevel == "WARNING" OR
    loglevel == "SEVERE" OR
    loglevel == "ERROR" OR
    loglevel == "CRITICAL" OR
    loglevel == "ALERT"))
  • Group log messages.
    Group similar log messages together to avoid duplicates and, based on tags or group metadata, identify teams that are likely the correct owners of the bug. Save the metadata, such as timestamp, thread name, class name, and log level.

  • Process the log message.
    For the log message processing, split the message into tokens and categorize them as mandatory word (conventionally prefixed with ##), word, separator, and non-relevant word (dynamic log parts as IDs or timestamps). The grouping message of a log is then built as classname (if present), plus the first four more relevant words, plus all remaining mandatory words.

    To summarize, the log records are grouped by:

    • Tags
    • Group metadata
    • Related process group events

Step 3 Create Dynatrace Workflow WIP

Create workflow based on business event to automatically create and assign Jira tickets.

Workflow where event triggers a Jira ticket creation.

Step 4 Create Dynatrace Application

Create Dynatrace Application to consume and present grouped log data.

Using Dynatrace AppEngine create a dashboard application that consumes the grouped log data and based on this preliminary triaging and vetting, creates Jira issues in the correct projects assigned to the right teams.

Connect the dashboard application to the backend (via the EdgeConnect) and execute API calls. Dynatrace applications can also interact with your on-premises systems which you can run in your corporate network.

Application for distributing log records based on severity.

The dashboard application will analyze error or warning logs that indicate bugs in the code or system configuration that needs to be enhanced (for example, low CPU or low disk space). It can show in the overview the grouped logs for which tickets and comments can be automatically or manually created.

Application for distributing log records based on severity.

Conclusion

By implementing dashboard application with Dynatrace and including it in a daily meeting, teams can effectively detect, triage, and prioritize bugs, and automatically assign tickets to the right teams eliminating dependencies between teams. Teams have an overview for the component and services they are interested in and this allows them to focus and to respond quicker to problems before they reach the production and customers.

All teams have the same view of newly discovered bugs and can claim ownership in a self-service fashion. For all bugs, responsibilities and ownership can be clarified in a daily sync of less than half an hour. Without this common view, discussions would be dispersed and could easily take hours.

This results in higher quality of the product, elimination of manual correlation and ultimately also impacts the required effort for customer support.