Aggregation commands
fieldsSummary
The fieldsSummary
command calculates the cardinality of values for the specified fields.
Syntax
fieldsSummary field, … [, topValues] [, extrapolateSamples]
Parameters
Parameter | Type | Description | Required |
---|---|---|---|
field | field identifier | The field to summarize. | required |
topValues | positive long | The number of top values (top N) to return. | optional |
extrapolateSamples | boolean | Flag indicating whether the value counts should be multiplied by the sampling rate, if one was applied. | optional |
Basic example
Example: Simple fields summary
The following example shows the cardinality of values for the field host.
data record(host = "host-a"),
     record(host = "host-a"),
     record(host = "host-b")
| fieldsSummary host
Query result:
field | rawCount | count | values |
---|---|---|---|
host | 3 | 3 | [value: host-a count: 2 , value: host-b count: 1 ] |
Practical example
Example: Log count by host
In this example, the query fetches logs with a sampling ratio of 10,000 and calculates the cardinality of values for the dt.entity.host
field, returning the top 10 values encountered and extrapolating the value counts by the sampling ratio.
fetch logs, samplingRatio: 10000
| fieldsSummary dt.entity.host, topValues: 10, extrapolateSamples: true
makeTimeseries
Creates timeseries from the data in the stream. The makeTimeseries
command provides a convenient way to chart raw, non-metric data (such as events or logs) over time.
Syntax
makeTimeseries [by: { [expression, …] }] [, interval] [, bins] [, from] [, to] [, timeframe] [, time] [, spread] aggregation, …
Parameters
Parameter | Type | Description | Required |
---|---|---|---|
aggregation | aggregation function | The aggregation function that should be used to create the series. | required |
time | expression | The expression that provides the timestamp for a record, which is assigned to a series bucket. If the current data does not contain a timestamp field, this parameter must be specified. | optional |
from | timestamp, duration | The start timestamp of the series. The actual series timeframe will be adjusted by the series alignment. | optional |
to | timestamp, duration | The end timestamp of the series. The actual series timeframe will be adjusted by the series alignment. | optional |
timeframe | timeframe | The timeframe of the series. | optional |
bins | positive integer | The number of buckets that should be created for the series. | optional |
interval | positive duration | The length of a bucket in the series. If omitted, it is calculated automatically based on the number of bins. Must not be specified together with bins. | optional |
by | list of expressions | The expressions the series should be split by. | optional |
spread | timeframe | A timeframe expression that provides the timeframe for the bucket calculation of the values in the series. | optional |
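The time parameter is useful when records carry their timestamp in a field other than timestamp. As a sketch (the field event.start is assumed for illustration and is not part of the examples below), a query bucketing business events by their start time could look like this:

fetch bizevents
| makeTimeseries count(), time: event.start, interval: 1h

Without the time parameter, makeTimeseries buckets records by their timestamp field.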
Basic example
Example: Count records
The following example counts the number of records per time interval.
data record(timestamp = toTimestamp("2019-08-01T09:30:00.000-0400")),
     record(timestamp = toTimestamp("2019-08-01T09:31:00.000-0400")),
     record(timestamp = toTimestamp("2019-08-01T09:31:30.000-0400")),
     record(timestamp = toTimestamp("2019-08-01T09:32:00.000-0400"))
| makeTimeseries count(),
  from: toTimestamp("2019-08-01T09:30:00.000-0400"),
  to: toTimestamp("2019-08-01T09:33:00.000-0400")
Query result:
timeframe | interval | count() |
---|---|---|
start: 2019-08-01T13:30:00.000Z end: 2019-08-01T13:33:00.000Z | 1 min | [1, 2, 1] |
Practical examples
Example 1: Error logs for host
This example shows error logs for a host over time.
fetch logs
| filter dt.entity.host == "HOST-02A6022668FCA28E" and loglevel == "ERROR"
| makeTimeseries count()
Example 2: Buy activity for account
The following example shows the buy activity for a specific account for the last 24 hours.
The interval parameter sets the time interval to 30 minutes, resulting in 48 bins in the query outcome.
A bin with no buy activity has the value 0 because the default parameter is set to 0; omitting this parameter would result in NULL values for empty bins.
fetch bizevents, from: now() - 24h
| filter accountId == 7
| filter in(event.category, array("/broker-service/v1/trade/long/buy", "/v1/trade/buy"))
| makeTimeseries count(default: 0), interval: 30m
Example 3: Transaction statistics
The following advanced example takes sell transactions of the last seven days, recorded as business events, and charts the total number of transactions, the number of high-volume transactions (based on the conditional countIf() function), and the maximum price by accountId. The interval parameter defines a fixed chart resolution of one day.
fetch bizevents, from: now() - 7d
| filter in(event.type, array("com.easytrade.long-sell", "com.easytrade.quick-sell"))
| makeTimeseries {
    count(),
    high_volume = countIf(amount >= 100 and amount <= 10000),
    max(price)
  },
  by: { accountId },
  interval: 1d
Bins parameter
The interval and bins parameters are mutually exclusive; both are optional.
If specified, the bins parameter (range: 12–1,500) is used to calculate an equivalent nominal time interval.
The nominal time interval, whether derived from the bins or the interval parameter, is adjusted so that:
The time interval matches a well-known interval (1 minute, 5 minutes, 10 minutes, 15 minutes, 30 minutes, or 1 hour).
The resulting series does not exceed the maximum number of elements per series (1,500).
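As a minimal sketch of the bins parameter, the query below requests roughly 60 buckets over the last 24 hours instead of fixing a bucket length, and lets the engine pick a nearby well-known interval:

fetch logs, from: now() - 24h
| makeTimeseries count(), bins: 60

With a 24-hour timeframe and 60 requested bins, the nominal interval of 24 minutes would be adjusted to a nearby well-known interval such as 30 minutes.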
Aggregation functions
Seven aggregation functions are available to use with the makeTimeseries
command: min, max, sum, avg, count, countIf, and percentile.
Syntax
These functions have the following syntax:
sum(expression [, default] [, rate])
avg(expression [, default] [, rate])
min(expression [, default] [, rate])
max(expression [, default] [, rate])
count([default] [, rate])
countIf(expression [, default] [, rate])
percentile(expression, percentile [, default] [, rate])
Parameters
Parameter | Type | Description | Required |
---|---|---|---|
expression | expression | The expression the series should be created for. | required |
default | number | The default value used to fill gaps (empty bins). The default value is null. | optional |
rate | duration | The duration used to adjust the bin values with the formula (binValue / interval) * rate. | optional |
percentile | double, long | The nth percentile, such that approximately n percent of observed measurements fall below this value. Must be between 0 and 100. | required |
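The rate parameter can be combined with the functions above to normalize bin values to a common time unit. As a sketch (the duration field and the 30-minute interval are assumptions for illustration), count(rate: 1m) divides each bin's count by the 30-minute interval and multiplies it by 1 minute, yielding a per-minute rate alongside a 90th-percentile series:

fetch logs, from: now() - 24h
| makeTimeseries {
    requests_per_minute = count(rate: 1m),
    percentile(duration, 90)
  },
  interval: 30m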
summarize
Groups together records that have the same values for the given fields and aggregates them.
Syntax
summarize [field =] aggregation, ... [, by: {[field =] expression, ...}]
Parameters
Parameter | Type | Description | Required |
---|---|---|---|
expression | array, boolean, counter, double, duration, ip, long, record, string, timeframe, timestamp | An expression to group by. | required |
aggregation | array, boolean, counter, double, duration, ip, long, record, string, timeframe, timestamp | An aggregation function. | required |
Basic examples
Example 1: Sum field
The example below uses the summarize
command with the sum
aggregation function to sum the field value.
data record(value = 2),
     record(value = 3),
     record(value = 7),
     record(value = 7),
     record(value = 1)
| summarize sum(value)
Query result:
sum(value) |
---|
20 |
Example 2: Summarize by category
The following example calculates the sum of the field value
and splits the result by the field cat
. Furthermore, the resulting aggregation field gets the alias name sum
and the grouping field the alias category
.
data record(value = 2, cat = "a"),
     record(value = 3, cat = "b"),
     record(value = 7, cat = "a"),
     record(value = 7, cat = "b"),
     record(value = 1, cat = "b")
| summarize sum = sum(value), by: { category = cat }
Query result:
category | sum |
---|---|
a | 9 |
b | 11 |
Example 3: Empty aggregation result
If the summarize
command has no input records and you don't use the by
clause, the summarize
command still returns a single record as the result.
data record(value = 2),
     record(value = 3),
     record(value = 7),
     record(value = 7),
     record(value = 1)
| filter value > 7
| summarize count(), sum(value), collectArray(value), takeAny(value)
Query result:
count() | sum(value) | collectArray(value) | takeAny(value) |
---|---|---|---|
0 | null | null | null |
Example 4: Group by field doesn't exist for every record
If the grouped field does not exist for every record, the summarize command adds a null
group to the result.
data record(value = 2),
     record(value = 3, category = "b"),
     record(value = 7, category = "a"),
     record(value = 7, category = "b"),
     record(value = 1)
| summarize sum(value), by: { category }
Query result:
category | sum(value) |
---|---|
a | 7 |
b | 10 |
null | 3 |
Example 5: Using summarize instead of join
You can use the summarize
command instead of the join
command when joining on the same table, which avoids fetching that table twice.
In this example, the input to the summarize
command consists of two types of records: records with the fields key
and value, and records with the fields key
and amount. The key
field is set as the by
parameter, and the remaining fields are added to the aggregation list.
If a record type has duplicate key values, use the collectArray
aggregation function for its fields; otherwise, use the takeAny
aggregation function. As the last step, expand the fields that use the collectArray
function.
data record(key = "a", value = 1),
     record(key = "b", value = 2),
     record(key = "c", value = 4),
     record(key = "b", amount = 10),
     record(key = "c", amount = 20),
     record(key = "c", amount = 40),
     record(key = "d", amount = 50)
| summarize {
    value = takeAny(value),
    amount = arrayRemoveNulls(collectArray(amount))
  },
  by: { key }
| expand amount
Query result:
key | value | amount |
---|---|---|
b | 2 | 10 |
c | 4 | 20 |
c | 4 | 40 |
d | null | 50 |
Example 6: Element-wise aggregation
The following example uses the summarize command with an iterative expression in the sum aggregation function to calculate the element-wise sum of arrays in the input records.
data record(a = array(2, 2)),
     record(a = array(7, 1))
| summarize sum(a[])
Query result:
sum(a[]) |
---|
[9, 3] |
Practical example
Example: Count selected log levels
The following example counts selected log levels and groups them by dt.entity.host
and dt.entity.process_group.
fetch logs
| summarize {
    errors = countIf(loglevel == "ERROR"),
    warnings = countIf(loglevel == "WARN"),
    severe = countIf(loglevel == "DEBUG")
  },
  by: { dt.entity.host, dt.entity.process_group }