Mask sensitive data

Telemetry data can often include sensitive data (such as PII), which may need to be redacted for security and regulatory reasons. While this can be implemented on the application side, it is typically best to handle it centrally using gateways such as the Collector. This enables single-point management of redaction rules across all your applications and services, without the need to update your code each time a new redaction rule is required.

This page shows sample Collector configurations for the redaction of specific sensitive data (for example, credit card numbers or email addresses) which may appear in telemetry data and which should be masked/redacted before leaving your network.

Prerequisites

Redaction processor versus transform processor

The following examples make use of two Collector processors: the redaction processor and the transform processor.

While the following examples use both processors to mask data, each processor has its own distinct purpose. The redaction processor is straightforward: it takes a list of values and completely redacts any matching data. The transform processor, on the other hand, is more versatile and goes beyond mere data redaction.

For data redaction, typically either processor can be used, and you may want to choose the one that best fits your use case. For example, for full data redaction, the redaction processor may be easier to use, whereas partial data redaction can only be achieved with the transform processor. In addition, the transform processor can also filter by data in the body of log records, whereas the redaction processor only has access to attributes.
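To illustrate the difference, here is a hypothetical Python sketch (the attribute value and masking strings are made up for illustration): full redaction replaces the entire value, while partial redaction preserves part of it.

```python
import re

email = "jane.doe@example.com"  # hypothetical attribute value

# Full redaction: the entire value is replaced, comparable to what
# the redaction processor does for a blocked value.
fully_masked = "****"

# Partial redaction: mask only the local part and keep the domain,
# comparable to a replace-pattern statement in the transform processor.
partially_masked = re.sub(r"^[^@]+", "<masked>", email)

print(fully_masked)      # ****
print(partially_masked)  # <masked>@example.com
```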

Demo configuration

This YAML document is a basic Collector configuration skeleton, containing the general components (that is, receivers, exporters, and the pipeline definitions).

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  PLACEHOLDER-FOR-PROCESSOR-CONFIGURATIONS

exporters:
  otlphttp:
    endpoint: ${env:DT_ENDPOINT}
    headers:
      "Authorization": "Api-Token ${env:DT_API_TOKEN}"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [PLACEHOLDER-FOR-PROCESSOR-REFERENCES]
      exporters: [otlphttp]
    metrics:
      receivers: [otlp]
      processors: [PLACEHOLDER-FOR-PROCESSOR-REFERENCES]
      exporters: [otlphttp]
    logs:
      receivers: [otlp]
      processors: [PLACEHOLDER-FOR-PROCESSOR-REFERENCES]
      exporters: [otlphttp]

Make sure to replace the placeholder values in the document with the respective configurations:

  • PLACEHOLDER-FOR-PROCESSOR-CONFIGURATIONS: the relevant processor configuration
  • PLACEHOLDER-FOR-PROCESSOR-REFERENCES: referencing the applicable processor objects for the individual signal types

Using the transform processor, we mask the attribute client.address with the set statement.

transform:
  error_mode: ignore
  trace_statements:
    - context: span
      statements: &filter-statements
        # this masks not only end-user client IP addresses, but also the
        # address of a server acting as a client when establishing a
        # connection to another server
        - set(attributes["client.address"], "<masked-ac-ot-clientip>")
  metric_statements:
    - context: datapoint
      statements: *filter-statements
  log_statements:
    - context: log
      statements: *filter-statements

Using the transform processor, we mask the attribute user.email with the set statement.

transform:
  error_mode: ignore
  trace_statements:
    - context: span
      statements: &filter-statements
        - set(attributes["user.email"], "<masked-ac-ot-email>")
  metric_statements:
    - context: datapoint
      statements: *filter-statements
  log_statements:
    - context: log
      statements: *filter-statements

Using the redaction processor, we use the regular expression dt0[a-z]0[1-9]\.[A-Za-z0-9]{24}\.([A-Za-z0-9]{64}) to mask all occurrences of Dynatrace API tokens in our telemetry data.

redaction:
  allow_all_keys: true
  blocked_values:
    - dt0[a-z]0[1-9]\.[A-Za-z0-9]{24}\.([A-Za-z0-9]{64})
  summary: info
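To sanity-check the blocked-value pattern outside the Collector, here is a small Python sketch; the token below is a synthetic, invalid example with the right shape, not a real credential:

```python
import re

# Same pattern as in the blocked_values entry above.
pattern = r"dt0[a-z]0[1-9]\.[A-Za-z0-9]{24}\.([A-Za-z0-9]{64})"

# Synthetic token: dt0<letter>0<digit>.<24 chars>.<64 chars>
token = "dt0c01." + "A" * 24 + "." + "B" * 64

attribute_value = f"request failed, Api-Token {token}"
masked = re.sub(pattern, "<masked>", attribute_value)

print(masked)  # request failed, Api-Token <masked>
```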

Using the transform processor, we mask the attributes user.id, user.name, and user.full_name with the set statement.

transform:
  error_mode: ignore
  trace_statements:
    - context: span
      statements: &filter-statements
        - set(attributes["user.id"], "<masked-ac-ot-userid>")
        - set(attributes["user.name"], "<masked-ac-ot-username>")
        - set(attributes["user.full_name"], "<masked-ac-ot-userfullname>")
  metric_statements:
    - context: datapoint
      statements: *filter-statements
  log_statements:
    - context: log
      statements: *filter-statements

Using the transform processor, we configure three replace_all_patterns statements to mask any occurrence of a credit card number, keeping only the last four digits visible.

transform:
  error_mode: ignore
  trace_statements:
    - context: span
      statements: &filter-statements
        - replace_all_patterns(attributes, "value", "^3\\s*[47](\\s*[0-9]){9}((\\s*[0-9]){4})$", "<masked-ac-ot-pcard$$2>")
        - replace_all_patterns(attributes, "value", "^(5[1-5]([0-9]){2}|222[1-9]|22[3-9]\\d|2[3-6]\\d{2}|27[0-1]\\d|2720)(\\s*[0-9]){8}\\s*([0-9]{4})$", "<masked-ac-ot-pcard$$4>")
        - replace_all_patterns(attributes, "value", "^4(\\s*[0-9]){8,14}\\s*(([0-9]\\s*){4})$", "<masked-ac-ot-pcard$$2>")
  metric_statements:
    - context: datapoint
      statements: *filter-statements
  log_statements:
    - context: log
      statements: *filter-statements
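The $$2 and $$4 in the replacement strings are the capture-group references $2 and $4, with $ escaped for the Collector configuration. The effect of the third (Visa) pattern can be reproduced in Python, where the equivalent group reference is \2; the sample number below is made up:

```python
import re

# The Visa pattern from the third replace_all_patterns statement above.
pattern = r"^4(\s*[0-9]){8,14}\s*(([0-9]\s*){4})$"

card = "4012 8888 8888 1881"  # made-up sample number

# Group 2 captures the last four digits, which stay visible.
masked = re.sub(pattern, r"<masked-ac-ot-pcard\2>", card)

print(masked)  # <masked-ac-ot-pcard1881>
```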

Using the redaction processor, we use the regular expression ^[A-Z]{2}[0-9]{2}(\s*[A-Z0-9]){8,30}$ to mask all IBAN occurrences in our telemetry data.

redaction:
  allow_all_keys: true
  blocked_values:
    - "^[A-Z]{2}[0-9]{2}(\\s*[A-Z0-9]){8,30}$"
  summary: info

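A quick Python sketch of how the blocked-value pattern matches; the IBAN below is a commonly published example value, and the "****" stands in for the full masking the redaction processor applies to a blocked value:

```python
import re

# Same pattern as in the blocked_values entry above.
pattern = r"^[A-Z]{2}[0-9]{2}(\s*[A-Z0-9]){8,30}$"

iban = "GB82 WEST 1234 5698 7654 32"  # well-known example IBAN

# The redaction processor fully masks values matching a blocked value.
masked = "****" if re.fullmatch(pattern, iban) else iban

print(masked)  # ****
```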
Configuration validation

Validate your settings to avoid any configuration issues.

Components

For our configuration, we use the following components.

Receivers

Under receivers, we specify the standard otlp receiver as the active receiver component for our Collector instance.

Processors

Under processors, we place the configuration for the relevant processor instances.

Exporters

Under exporters, we specify the default otlphttp exporter and configure it with our Dynatrace API URL and the required authentication token.

For this purpose, we set the two environment variables DT_ENDPOINT and DT_API_TOKEN and reference them in the configuration values for endpoint and Authorization.

Service pipelines

Under service, we assemble all the configured objects into pipelines for the individual telemetry signals (traces, metrics, and logs) and have the Collector instance run the configured tasks.