With log processing rules, you can customize incoming log data according to your needs. Read further for example data processing scenarios.
This article is intended for Dynatrace administrators setting up log processing rules.
In this tutorial, you'll use log processing rules to resolve unrecognized timestamp and loglevel attributes and to create a metric based on your log data.
You can fix unrecognized timestamp and loglevel attributes visible in the log viewer based on the matched log source.
For this example, let's assume that you see a stored event in the log viewer where log.source is set to /var/log/myapp/application.log.#. You notice a couple of things you want to fix: the timestamp is not recognized correctly, and no loglevel is detected. So you want to transform your log data to contain the proper values in the timestamp and loglevel fields, and you want to add a new thread.name attribute containing a properly extracted value.
To resolve unrecognized timestamp and log level
Go to Settings Classic (Latest Dynatrace) or Settings (Dynatrace Classic) > Log Monitoring > Processing.
Select Add rule, and provide the following processing rule properties.
Rule name: Fix timestamp and loglevel for MyApp
Matcher: matchesValue(log.source, "/var/log/myapp/application.log.#").
Processor definition: PARSE(content, "TIMESTAMP('MMMM d, yyyy HH:mm:ss'):timestamp ' [' LD:thread.name '] ' UPPER:loglevel")
This processor definition parses out the timestamp, thread name, and log level.
The TIMESTAMP matcher looks for the specific date and time format, and the matched value is set as the existing timestamp log attribute. The LD (Line Data) matcher matches any characters between the literals ' [' and '] '. The UPPER matcher matches uppercase letters. You can test your DQL matcher before using it in a log processing rule.
Enter the following log data fragment in Log sample, and select Test the rule.
{"event.type":"LOG","content":"April 24, 2022 09:59:52 [myPool-thread-1] INFO Lorem ipsum dolor sit amet","status":"NONE","timestamp":"1650889391528","log.source":"/var/log/myapp/application.log.#","loglevel":"NONE"}
The processed log data is displayed in Test result. The timestamp and the loglevel fields have proper values. The additional thread.name attribute is also correctly extracted.
{"content":"April 24, 2022 09:59:52 [myPool-thread-1] INFO Lorem ipsum dolor sit amet","timestamp":"1650794392000","event.type":"LOG","status":"NONE","log.source":"/var/log/myapp/application.log.#","loglevel":"INFO","thread.name":"myPool-thread-1"}
Select Save changes to save your log processing rule.
As the new log data is ingested, you'll see processed log data in the log viewer.
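Outside Dynatrace, the same extraction can be illustrated with a regular expression. The following is a minimal Python sketch, not Dynatrace code: the regex only approximates what the PARSE pattern above matches, and the UTC assumption for the timestamp is ours.

```python
import re
from datetime import datetime, timezone

line = "April 24, 2022 09:59:52 [myPool-thread-1] INFO Lorem ipsum dolor sit amet"

# Approximates the PARSE pattern: timestamp ' [' LD:thread.name '] ' UPPER:loglevel
m = re.match(r"(\w+ \d{1,2}, \d{4} \d{2}:\d{2}:\d{2}) \[([^\]]+)\] ([A-Z]+)", line)
timestamp_str, thread_name, loglevel = m.groups()

# Convert the matched timestamp to epoch milliseconds (UTC assumed for this sketch)
ts = datetime.strptime(timestamp_str, "%B %d, %Y %H:%M:%S").replace(tzinfo=timezone.utc)
epoch_ms = int(ts.timestamp() * 1000)
```

Note that the resulting epoch value matches the timestamp shown in the test result above.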
In this example, you want to monitor the actual billed duration from your AWS services. You want to use the cloud.provider attribute with the aws value in your log data. In the log viewer, you see a log record containing the following line:
REPORT RequestId: 000d000-0e00-0d0b-a00e-aec0aa0000bc Duration: 5033.50 ms Billed Duration: 5034 ms Memory Size: 1024 MB Max Memory Used: 80 MB Init Duration: 488.08 ms
Additionally, that log record contains the cloud.provider attribute with the aws value.
To create a metric for the AWS service using your log data
Go to Settings Classic (Latest Dynatrace) or Settings (Dynatrace Classic) > Log Monitoring > Processing.
Select Add rule, and provide the following processing rule properties.
Matcher: matchesPhrase(content, "Billed Duration") and matchesValue(cloud.provider, "aws")
Processor definition: PARSE(content, "LD 'Billed Duration:' SPACE? INT:aws.billed.duration"). This definition parses out the billed duration value.
You can test your DQL matcher before using it in a log processing rule.
Enter the following log data fragment in Log sample, and select Test the rule.
{"event.type": "LOG","content": "REPORT RequestId: 000d000-0e00-0d0b-a00e-aec0aa0000bc\tDuration: 5033.50 ms\tBilled Duration: 5034 ms\tMemory Size: 1024 MB\tMax Memory Used: 80 MB\t\n","status": "INFO","timestamp": "1651062483672","cloud.provider": "aws","cloud.account.id": "999999999999","cloud.region": "eu-central-1","aws.log_group": "/aws/lambda/aws-dev","aws.log_stream": "2022/04/27/[$LATEST]0d00000daa0c0c0a0a0e0ea0eccc000f","aws.region": "central-1","aws.account.id": "999999999999","aws.service": "lambda","aws.resource.id": "aws-dev","aws.arn": "arn:aws:lambda:central-1:999999999999:function:aws-dev","cloud.log_forwarder": "999999999999:central-1:dynatrace-aws-logs","loglevel": "INFO"}
The processed log data is displayed in Test result. It's enriched with the new aws.billed.duration attribute.
{"event.type": "LOG","content": "REPORT RequestId: 000d000-0e00-0d0b-a00e-aec0aa0000bc\tDuration: 5033.50 ms\tBilled Duration: 5034 ms\tMemory Size: 1024 MB\tMax Memory Used: 80 MB\t\n","status": "INFO","timestamp": "1651062483672","cloud.provider": "aws","cloud.account.id": "999999999999","cloud.region": "eu-central-1","aws.log_group": "/aws/lambda/aws-dev","aws.log_stream": "2022/04/27/[$LATEST]0d00000daa0c0c0a0a0e0ea0eccc000f","aws.region": "central-1","aws.account.id": "999999999999","aws.service": "lambda","aws.resource.id": "aws-dev","aws.arn": "arn:aws:lambda:central-1:999999999999:function:aws-dev","cloud.log_forwarder": "999999999999:central-1:dynatrace-aws-logs","loglevel": "INFO","aws.billed.duration": "5034"}
Assuming that you've observed the above-mentioned log record in the log viewer, you can also select Download sample log to automatically populate the Log sample text field with your log data. The fetched log record might look as follows:
REPORT RequestId: 000d000-0e00-0d0b-a00e-aec0aa0000bc Duration: 5033.50 ms Billed Duration: 5034 ms Memory Size: 1024 MB Max Memory Used: 80 MB Init Duration: 488.08 ms
Select Save changes to save your log processing rule.
Go to Settings Classic (Latest Dynatrace) or Settings (Dynatrace Classic) > Log Monitoring > Metric extraction.
Select Add log metric, and provide the following log metric properties to create a log metric based on the parsed-out billed duration (aws.billed.duration).
Key: log.aws.billed.duration
Matcher: matchesPhrase(content, "Billed Duration") and matchesValue(cloud.provider, "aws")
Attribute: aws.billed.duration
For details, see Log metrics (Logs Classic).
Select Save changes to save your log metric.
The log.aws.billed.duration metric is visible in Data Explorer, and you can use it throughout Dynatrace like any other metric. You can add it to your dashboard, include it in analysis, and even use it to create alerts.
A created log metric is available only when new log data is ingested and it matches the log query (Matcher) that you defined when creating the metric. Ensure that new log data has been ingested before using the log metric in other areas of Dynatrace. For details, see Create log metric.
You can test your DQL Matcher before using it in a log processing rule to make sure that the matcher is correct.
To test a DQL matcher
Navigate to the log filter advanced mode.
In the latest Dynatrace, go to Logs. Next to the Type to filter text field, select the Actions menu > Edit DQL query.
In Dynatrace Classic, go to Logs & Events Classic and turn on Advanced mode under the query field.
Enter the DQL query to find the required log data, and select Run query.
Modify the DQL query until you get the expected result.
Once satisfied with the filtering result, copy the matchesValue function of the DQL query to the clipboard. This is your DQL Matcher that you can use in log processing rules.
You can tailor log processing rules according to your needs.
Below are some examples that might match some of your use cases. Follow all the steps in Fix unrecognized timestamp and log level, but set the log processing rule properties to the values described below. Specifically, replace Processor definition with the provided value. You can also use the provided Log sample to test your log processing rule.
In this example, you see a log line that has the following JSON structure:
{"intField": 13, "stringField": "someValue", "nested": {"nestedStringField1": "someNestedValue1", "nestedStringField2": "someNestedValue2"} }
Log sample:
{"content": "{"intField": 13, "stringField": "someValue", "nested": {"nestedStringField1": "someNestedValue1", "nestedStringField2": "someNestedValue2"} }"}
You can use a JSON matcher and configure it to extract desired fields as top-level log attributes. The matcher in flat mode creates attributes automatically and names them exactly the same as the corresponding JSON field names.
You can then use the FIELDS_RENAME command to set the names that fit you.
Processor definition:
PARSE(content, "JSON{STRING:stringField}(flat=true)")| FIELDS_RENAME(better.name: stringField)
Test result:
{"content": "{"intField": 13, "stringField": "someValue", "nested": {"nestedStringField1": "someNestedValue1", "nestedStringField2": "someNestedValue2"} }","better.name": "someValue"}
You can also parse more fields (including nested ones) using a JSON matcher without flat mode. As a result, you get a VARIANT_OBJECT that you can process further. For example, you can create a top-level attribute from its inner fields.
Processor definition:
PARSE(content, "JSON{STRING:stringField,JSON {STRING:nestedStringField1}:nested}:parsedJson")| FIELDS_ADD(top_level.attribute1: parsedJson["stringField"], top_level.attribute2: parsedJson["nested"]["nestedStringField1"])| FIELDS_REMOVE(parsedJson)
Test result:
{"content": "{"intField": 13, "stringField": "someValue", "nested": {"nestedStringField1": "someNestedValue1", "nestedStringField2": "someNestedValue2"} }","top_level.attribute1": "someValue","top_level.attribute2": "someNestedValue1"}
Sometimes you're interested in all of the JSON fields. You don't have to list all of the attributes. Instead, a JSON matcher can be used in auto-discovery mode. As a result, you get a VARIANT_OBJECT that you can process further. For example, you can create a top-level attribute from its inner fields.
Processor definition:
PARSE(content,"JSON:parsedJson")| FIELDS_ADD(f1: parsedJson["intField"],f2:parsedJson["stringField"],f3:parsedJson["nested"]["nestedStringField1"],f4:parsedJson["nested"]["nestedStringField2"])| FIELDS_REMOVE(parsedJson)
Test result:
{"content": "{"intField": 13, "stringField": "someValue", "nested": {"nestedStringField1": "someNestedValue1", "nestedStringField2": "someNestedValue2"} }","f1": "13","f2": "someValue","f3": "someNestedValue1","f4": "someNestedValue2"}
With this approach, you can name the attribute as you like, but the processing rule is more complex.
Processor definition:
PARSE(content, "LD '"stringField"' SPACE? ':' SPACE? DQS:newAttribute ")
Test result:
{"content": "{"intField": 13, "stringField": "someValue", "nested": {"nestedStringField1": "someNestedValue1", "nestedStringField2": "someNestedValue2"} }","newAttribute": "someValue"}
In this example, we're flattening nested JSON structures with the fieldsFlatten command. Go to DQL structuring commands to learn about the parameters used with fieldsFlatten.
Processor definition:
| parse content, "JSON:parsedContent"| fieldsFlatten parsedContent, prefix:"pref.", depth: 2
Test result:
{"content": "{"intField": 13, "stringField": "someValue", "nested": {"nestedStringField1": "someNestedValue1", "nestedStringField2": "someNestedValue2"} }","pref.intField": "13","pref.stringField": "someValue","pref.nested.nestedStringField1": "someNestedValue1","pref.nested.nestedStringField2": "someNestedValue2"}
You can parse out attributes from different formats within a single pattern expression.
In this example, one or more applications are logging a user identifier that you want to extract as a standalone log attribute. The log format is not consistent because it includes various schemes to log the user ID: user ID=, userId=, userId: , or user ID =.
03/22 08:52:51 INFO user ID=1234567 Call = 0319 Result = 0
03/22 08:52:51 INFO UserId = 1234567 Call = 0319 Result = 0
03/22 08:52:51 INFO user id=1234567 Call = 0319 Result = 0
03/22 08:52:51 INFO user ID:1234567 Call = 0319 Result = 0
03/22 08:52:51 INFO User ID: 1234567 Call = 0319 Result = 0
03/22 08:52:51 INFO userid: 1234567 Call = 0319 Result = 0
With the optional modifier (question mark ?) and alternative groups, you can cover all such cases with a single pattern expression:
Processor definition:
PARSE(content, "LD //matches any text within a single line('user'| 'User') //user or User literalSPACE? //optional space('id'|'Id'|'ID') //matches any of theseSPACE? //optional spacePUNCT? //optional punctuationSPACE? //optional spaceINT:my.user.id")
Using such a log processing rule, you can parse out the user identifier from many different notations.
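A regular-expression analogue of that pattern can be sketched in Python; this is only an approximation of the rule above, not Dynatrace code, and it covers just the notations shown in the sample:

```python
import re

lines = [
    "03/22 08:52:51 INFO user ID=1234567 Call = 0319 Result = 0",
    "03/22 08:52:51 INFO UserId = 1234567 Call = 0319 Result = 0",
    "03/22 08:52:51 INFO userid: 1234567 Call = 0319 Result = 0",
]

# Approximates: ('user'|'User') SPACE? ('id'|'Id'|'ID') SPACE? PUNCT? SPACE? INT
pattern = re.compile(r"(?:user|User)\s?(?:id|Id|ID)\s?[=:]?\s?(\d+)")
ids = [pattern.search(line).group(1) for line in lines]
```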
You can handle various formats, or perform additional parsing on already parsed-out attributes, with multiple PARSE commands (connected with pipes |) within a single processing rule.
Processor definition:
PARSE(content, "JSON{STRING:message}(flat=true)") | PARSE(message, "LD 'user ' INT:user.id ': ' LD:error.message")
Here, you parse out the message field, the user ID, and the error message.
Log sample:
{"content": "{"intField": 13, "message": "Error occurred for user 12345: Missing permissions", "nested": {"nestedStringField1": "someNestedValue1", "nestedStringField2": "someNestedValue2"} }"}
Test result:
{"content": "{"intField": 13, "message": "Error occurred for user 12345: Missing permissions", "nested": {"nestedStringField1": "someNestedValue1", "nestedStringField2": "someNestedValue2"} }","message": "Error occurred for user 12345: Missing permissions","user.id": "12345","error.message": "Missing permissions"}
We provide a comprehensive list of matchers that ease pattern building.
Processor definition:
PARSE(content, "ISO8601:timestamp SPACE UPPER:loglevel SPACE IPADDR:ip SPACE DQS:request SPACE INTEGER:code")
Log sample:
{"content":"2022-05-11T13:23:45Z INFO 192.168.33.1 "GET /api/v2/logs/ingest HTTP/1.0" 200"}
Test result:
{"content": "2022-05-11T13:23:45Z INFO 192.168.33.1 "GET /api/v2/logs/ingest HTTP/1.0" 200","timestamp": "1652275425000","loglevel": "INFO","ip": "192.168.33.1","request": "GET /api/v2/logs/ingest HTTP/1.0","code": "200"}
With log processing rules, you can manipulate any attribute of a log event, not only content. Unless specified otherwise, a processing rule works only on the read-only content field. To work on other log event attributes, you need to use the USING command.
Processor definition:
USING(INOUT status:STRING, content)| FIELDS_ADD(status:IF_THEN(status == 'WARN' AND content CONTAINS('error'), "ERROR"))
This definition declares two input attributes: writable status and read-only content. Next, it checks whether the status is WARN and the content contains the text error. If both conditions are true, the rule overwrites status with the value ERROR.
Log sample:
{"log.source": "using","timestamp": "1656011002196","status": "WARN","content":"Some error message"}
Test result:
{"log.source": "using","timestamp": "1656011002196","status": "ERROR","content":"Some error message"}
You can add a new attribute to the current log event structure. The FIELDS_ADD command can be used to introduce additional top-level log attributes.
Processor definition:
FIELDS_ADD(content.length: STRLEN(content), content.words: ARRAY_COUNT(SPLIT(content,"' '")))
This definition adds two attributes: the first one stores the length and the second one stores the number of words that are in the content field.
Log sample:
{"log.source": "new_attributes","timestamp": "1656010654603","content":"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis."}
Test result:
{"content": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis.","timestamp": "1656010654603","log.source": "new_attribute","content.length": "62","content.words": "9"}
With all of the available functions and operators, it's easy to perform calculations.
Processor definition:
PARSE(content,"LD 'total: ' INT:total '; failed: ' INT:failed")| FIELDS_ADD(failed.percentage: 100.0 * failed / total + '%')| FIELDS_REMOVE(total, failed)
With this processor definition, we parse the values of total and failed, calculate the percentage that failed, and concatenate the value with the percentage sign. Then we store it in a new attribute called failed.percentage and remove the temporary fields.
Log sample:
{"timestamp": "1656011338522","content":"Lorem ipsum total: 1000; failed: 250"}
Test result:
{"content": "Lorem ipsum total: 1000; failed: 250","timestamp": "1656011338522","failed.percentage": "25.0%"}
To drop an event attribute that is a part of the original record, we first need to declare it as a writable (INOUT option) input field with the USING command and then explicitly remove it with the FIELDS_REMOVE command so that it is not present in the output of the transformation.
Processor definition:
USING(INOUT redundant.attribute:STRING)| FIELDS_REMOVE(redundant.attribute)
With this processor definition, we declare redundant.attribute as an obligatory writable attribute of STRING type, and then we remove it.
We could use the ? character to mark the attribute as optional so that the transformation will still run and also succeed if the attribute is not present in the source event.
In this case, the processor definition would look like this:
USING(INOUT redundant.attribute:STRING?)| FIELDS_REMOVE(redundant.attribute)
Log sample:
{"redundant.attribute": "value","timestamp": "1656011525708","content":"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla ac neque nisi. Nunc accumsan sollicitudin lacus."}
Test result:
{"content": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla ac neque nisi. Nunc accumsan sollicitudin lacus.","timestamp": "1656011525708"}
The whole log event can be dropped with a FILTER_OUT command. The event is dropped when the condition passed as the command parameter is fulfilled.
In most cases, it is enough to drop every event that has been pre-matched.
For example, if we want to drop all DEBUG and TRACE events, we could set the matcher query to match either of those statuses and then use the FILTER_OUT command to catch everything.
Matcher:
status=="DEBUG" or status=="TRACE"
Processor definition:
FILTER_OUT(true)
Log sample:
{"status": "DEBUG","content":"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla ac neque nisi. Nunc accumsan sollicitudin lacus."}
This way, all logs with status DEBUG or TRACE are dropped.
It's also possible to have some extra logic and not drop all the events that are pre-matched.
In the following example, we drop incoming events where the execution time is below 100 ms.
Processor definition:
PARSE(content, "LD 'My monitored service call took ' INT:took 'ms'")| FILTER_OUT(took < 100)| FIELDS_REMOVE(took)
Log sample:
{"content":"2022-06-23 06:52:35.280 UTC INFO My monitored service call took 97ms"}
Whenever content or any other attribute is to be changed, it has to be declared as INOUT (writable) with the USING command. REPLACE_PATTERN is a powerful function that is useful when you want to mask part of an attribute.
In the following example, we mask the IP address, setting value 0 to the last octet.
Processor definition:
USING(INOUT ip)| FIELDS_ADD(ip: IPADDR(ip) & 0xFFFFFF00l)
Log sample:
{"content":"Lorem ipsum","timestamp": "1656009021053","ip": "192.168.0.12"}
Test result:
{"content": "Lorem ipsum","timestamp": "1656009021053","ip": "192.168.0.0"}
In the following example, we mask the IP address, setting value xxx to the last octet.
Processor definition:
USING(INOUT ip)| FIELDS_ADD(ip: REPLACE_PATTERN(ip, "(INT'.'INT'.'INT'.'):not_masked INT", "${not_masked}xxx"))
Log sample:
{"content":"Lorem ipsum","timestamp": "1656009021053","ip": "192.168.0.12"}
Test result:
{"content": "Lorem ipsum","timestamp": "1656009021053","ip": "192.168.0.xxx"}
In the following example, we mask the entire email address using SHA-1 (Secure Hash Algorithm 1).
Processor definition:
USING(INOUT email)| FIELDS_ADD(email: REPLACE_PATTERN(email, "LD:email_to_be_masked", "${email_to_be_masked|sha1}"))
Log sample:
{"content":"Lorem ipsum","timestamp": "1656009924312","email": "john.doe@dynatrace.com"}
Test result:
{"content": "Lorem ipsum","timestamp": "1656009924312","email": "9940e79e41cbf7cc452b137d49fab61e386c602d"}
In the following example, we mask the IP address, email address, and credit card number from the content field.
Processor definition:
USING(INOUT content)
| FIELDS_ADD(content: REPLACE_PATTERN(content,
    "(LD 'ip: '):p1                        // Lorem ipsum ip:
     (INT'.'INT'.'INT'.'):ip_not_masked    // 192.168.0.
     INT                                   // 12
     ' email: ':p2                         // email:
     LD:email_name '@' LD:email_domain     // john.doe@dynatrace.com
     ' card number: ': p3                  // card number:
     CREDITCARD:card                       // 4012888888881881",
    "${p1}${ip_not_masked}xxx${p2}${email_name|md5}@${email_domain}${p3}${card|sha1}"))
Log sample:
{"timestamp": "1656010291511","content": "Lorem ipsum ip: 192.168.0.12 email: john.doe@dynatrace.com card number: 4012888888881881 dolor sit amet"}
Test result:
{"content": "Lorem ipsum ip: 192.168.0.xxx email: abba0b6ff456806bab66baed93e6d9c4@dynatrace.com card number: 62163a017b168ad4a229c64ae1bed6ffd5e8fb2d dolor sit amet","timestamp": "1656010291511"}
With the FIELDS_RENAME command, we can rename both attributes that were part of the original log event and attributes created within the processor. Whenever we want to change any attribute from the original event, we need to declare it as INOUT (writable).
Processor definition:
USING(INOUT to_be_renamed, content)| FIELDS_RENAME(better_name: to_be_renamed)| PARSE(content,"JSON{STRING:json_field_to_be_renamed}(flat=true)")| FIELDS_RENAME(another_better_name: json_field_to_be_renamed)
With this processor definition, we rename an existing attribute. Furthermore, we parse out the field from JSON in flat mode and rename the new attribute that has been created automatically with the JSON field name.
Log sample:
{"timestamp": "1656061626073","content":"{"json_field_to_be_renamed": "dolor sit amet", "field2": "consectetur adipiscing elit"}","to_be_renamed": "Lorem ipsum"}
Test result:
{"content": "{"json_field_to_be_renamed": "dolor sit amet", "field2": "consectetur adipiscing elit"}","timestamp": "1656061626073","better_name": "Lorem ipsum","another_better_name": "dolor sit amet"}
The processor definition operates with strongly typed data: the functions and operators accept only declared types of data. The type is assigned to all input fields defined in the USING command as well as to variables created while parsing or using casting functions.
Processor definition:
USING(number:INTEGER, avg:DOUBLE, addr:IPADDR, arr:INTEGER[],bool:BOOLEAN, ts:TIMESTAMP)| FIELDS_ADD(multi:number*10)| FIELDS_ADD(avgPlus1:avg+1)| FIELDS_ADD(isIP: IS_IPV6(addr))| FIELDS_ADD(arrAvg: ARRAY_AVG(arr))| FIELDS_ADD(negation: NOT(bool))| FIELDS_ADD(tsAddYear: TIME_ADD_YEAR(ts,1))
Log sample:
{"content":"Lorem ipsum","number":"5","avg":"123.5","addr":"2a00:1450:4010:c05::69","arr": ["1","2"],"bool":"false","ts":"1984-11-30 22:19:59.789 +0000"}
Test result:
{"content": "Lorem ipsum","number": "5","avg": "123.5","addr": "2a00:1450:4010:c05::69","arr": ["1","2"],"bool": "false","ts": "1984-11-30 22:19:59.789 +0000","tsAddYear": "1985-11-30T22:19:59.789000000 +0000","negation": "true","arrAvg": "1.5","isIP": "true","avgPlus1": "124.5","multi": "50"}
You've completed this tutorial! You've learned how to resolve unrecognized timestamp and loglevel attributes and how to create a metric based on your log data. You've also seen how log processing rules can be applied in a variety of situations to achieve the result you need.