Instant Intrusion Response

Latest Dynatrace Early Adopter

Time is crucial when dealing with security incidents. This page shows how you can use Dynatrace to speed up your incident response in two phases:

  • When an attack is detected, Dynatrace can automate the notification and analysis for you, so that the right teams are informed and receive actionable information to start working on the issue.
  • When investigating an issue, each new discovery guides the next steps in the incident response. Through Grail’s schemaless queries, Dynatrace allows you to remain flexible in this time-critical process.

Target audience

This page is intended for Security teams analyzing security incidents, such as the Incident Response team.

Scenario

In the following, we address a scenario in which identifying an attack, researching the scope, determining the responsible entity owners, and remediating the attack takes hours, sometimes even days.

  • Tools like AWS GuardDuty trigger events on suspicious activity independently.
  • Handling the incoming load while increasing security strength is impossible.
  • All correlation and collaboration are manual, resulting in frustration about the silos.
  • Several internal integration tools, hard to manage, have been built; they are helpful, but aren't providing solid and safe automation.
  • A typical suspicious incident takes days to escalate and qualify.

Request

The team wants to quickly

  • Identify when a possible attack is happening.
  • Research the scope and determine what is affected: for example, whether the attack affects only a single isolated, non-production system or threatens a critical part of the environment.
  • Determine the responsible entity owners.
  • Remediate the attack.

Goals

  • Efficiency: The team should be able to respond much faster to attacks.

  • Flexibility: The team should have more flexibility in their response to security incidents.

Result

Combining the Dynatrace automation capabilities with insights into security-related data, our solution helps security teams react and respond faster to attacks. The team automatically scans all ingested logs for patterns that might indicate possible attacks. Because each attack is different, they make use of schemaless queries with instant responses to quickly identify the scope of an attack, thus reducing the required time from days to minutes.

How it works

Context

Logs from your Dynatrace-monitored environment are ingested into Grail via log ingestion. When an attack is detected, a Dynatrace problem is created.

1. Intrusion notification automation

A workflow is automatically triggered by this type of problem. The workflow collects, processes, and enriches the data with context, and converts the resulting information into notifications on your desired channels.

For an example of how you can set up an attack notification automation, see Intrusion notification automation.

2. Instant queries

Based on the information received, you can immediately respond to discoveries and perform further investigations by running a sequence of DQL queries in Notebooks tailored to the attack type.

For details, see Instant queries.

Prerequisites

1. Intrusion notification automation

The following example illustrates how you can implement an attack notification automation using Workflows. You can customize the workflow according to your needs.

Step 1 Set the trigger

The automation needs to be triggered whenever an attack occurs.

In the Select trigger section, select and configure Davis Problem trigger. For details, see Create workflows in Dynatrace Workflows: Trigger.

Set the trigger

Step 2 Determine ownership

Route notifications to the team responsible for the affected entities.

Select the Get owners action to create and configure this task. For details, see Ownership app: get_owners.

Set ownership

Step 3 Set up notification variables

Configure variables such as default values for Slack and Jira fields, that will be used in later steps in the notification process.

Select the Run JavaScript action to create and configure these tasks. For details, see Introduction to workflows: Action.

Define notification channels

Step 4 Collect data and enrich with context

  • Query third-party services. For example:

    • Perform a WHOIS lookup to find details about the attacker's IP address.
    • Verify the IP reputation using a third-party service such as AbuseIPDB.

    Select the HTTP Request action to create and configure these tasks. For details, see Introduction to workflows: Action.

  • Query Dynatrace. For example:

    • Determine whether there were any successful logins from the attacker's IP address.

    • Find out additional traffic information from the attacker's IP address.

    Select the Execute DQL Query action to create and configure these tasks. For details, see Introduction to workflows: Action.

Collect and enrich data

Step 5 Extract successful requests

Extract the successful requests from the total requests collected.

Select the Run JavaScript action to create and configure this task. For details, see Introduction to workflows: Action.

Extract successful requests

Step 6 Replay against target

Replay the successful requests against the target entity to look for indicators of compromise. Custom code steps allow you to automate complex logic that you want to run for each detected attack. Depending on the detected attack and the affected systems, you might want to replay the attacks for more detailed analysis.

Select the Run JavaScript action to create and configure this task. For details, see Introduction to workflows: Action.

Replay against target

Step 7 Send notifications

  • Notify the responsible team on Slack.

    Select the Send message action to create and configure this task. For details, see Use Workflows with Slack.

  • Create a Jira ticket for the entity owner containing the collected information.

    Select the Create issue action to create and configure this task. For details, see Create Jira issues with workflows.

Send notifications

2. Instant queries

After receiving the notification, the security team can immediately respond to discoveries and instantly run additional DQL queries in Notebooks without knowing beforehand where the information they're looking for is. In an emergency situation, this is crucial, as a speedy response can ensure that the attack can be contained.

The following are some examples of how you can query Grail in case of a web attack.

Logs from the attacker IP address

Query example:

fetch logs, scanLimitGBytes:-1
| filter log.source == "/var/log/nginx/access.log"
| filter net.peer.ip == "<<IP ADDRESS>>"
| sort timestamp desc

Successful web requests from the attacker IP address

Query example:

fetch logs, scanLimitGBytes:-1
| filter log.source == "/var/log/nginx/access.log"
| filter net.peer.ip == "<<IP Address>>"
| filter http.status_code == "200"
| sort timestamp desc

Status codes and response codes overview

Query example:

fetch logs, scanLimitGBytes:-1
| filter log.source == "/var/log/nginx/access.log"
| filter net.peer.ip == "<<IP Address>>"
| summarize requests=count(), by:{http.status_code, http.user_agent}
| sort http.status_code

Payloads used by the attacker IP address

Query example:

fetch logs, scanLimitGBytes:-1
| filter log.source == "/var/log/nginx/access.log"
// Successful requests only
| filter http.status_code == "200"
| filter net.peer.ip == "<<IP Address>>"
| summarize requests=count(), by:{http.target}
| sort requests DESC

Successful requests from a given IP address sorted by response size

Query example:

fetch logs, scanLimitGBytes:-1
| filter log.source == "/var/log/nginx/access.log"
| fields timestamp, net.peer.ip, http.target, http.status_code, http.response_content_length, http.user_agent
| filter http.status_code == "200"
// Filter for a specific IP address
| filter net.peer.ip == "<<IP Address>>"
| sort toLong(http.response_content_length) DESC

Payloads in access logs (both successful and not successful)

Query example:

fetch logs, scanLimitGBytes:-1
| filter log.source == "/var/log/nginx/access.log"
| fields payload = "<<PAYLOAD>>", timestamp, net.peer.ip, http.method, http.target, http.status_code, http.request.header.referrer, http.response_content_length, http.user_agent, content
| filter contains(content, payload)

Successful SSO logins from the attacker IP address

Query example:

fetch logs, scanLimitGBytes: -1
// Search for logins
| filter log.source == "/var/log/sso.log"
// Search for successful logins from a given IP address
| filter contains(content, "user login successful") and contains(content, "<<IP address>>")
| sort timestamp desc

Which environments the attacker has logged into

Query example:

fetch logs, scanLimitGBytes: -1
// Search for logins
| filter log.source == "/var/log/sso.log"
// Search for logins from a given account id
| filter contains(content, "<<Account ID>>") and contains(content, "tenant: ")
| sort timestamp desc

Stay ahead of threats

You can use the above instructions as building blocks to automate common steps in your incidents process. These can help you respond faster to security incidents, thus reducing their impact.

Further reading

Log on Grail examples

Davis DQL examples

DQL examples for security-related data