The fieldsSummary command calculates the cardinality of the values of the specified fields.
fieldsSummary field, … [, topValues] [, extrapolateSamples]
Parameter | Type | Description | Required |
---|---|---|---|
field | field identifier | The field to summarize. | required |
topValues | positive long | The number of top values (top N) to return. | optional |
extrapolateSamples | boolean | Flag indicating whether the cardinality should be multiplied by the sampling rate, if any. | optional |
The following example shows the cardinality of values for the field host.
data record(host = "host-a"),
     record(host = "host-a"),
     record(host = "host-b")
| fieldsSummary host
Query result:
field | rawCount | count | values |
---|---|---|---|
host | 3 | 3 | [{value: host-a, count: 2}, {value: host-b, count: 1}] |
In this example, the query fetches logs with a sampling ratio of 10,000 and calculates the cardinality of values for the dt.entity.host field, providing the top 10 values encountered while extrapolating the value count by the sampling ratio.
fetch logs, samplingRatio: 10000
| fieldsSummary dt.entity.host, topValues: 10, extrapolateSamples: true
Creates timeseries from the data in the stream. The makeTimeseries command provides a convenient way to chart raw non-metric data (such as events or logs) over time.
makeTimeseries [by: { [expression, …] }] [, interval] [, bins] [, from] [, to] [, timeframe] [, time] [, spread] [, nonempty] aggregation, …
Parameter | Type | Description | Required |
---|---|---|---|
aggregation | aggregation function | The aggregation function that should be used to create the series. | required |
time | expression | The expression that provides the timestamp for a record, which will be assigned to a series bucket. If the current data does not contain a field named timestamp, this parameter is required. | optional |
from | timestamp, duration | The start timestamp of the series. The actual series timeframe will be adjusted by the series alignment. | optional |
to | timestamp, duration | The end timestamp of the series. The actual series timeframe will be adjusted by the series alignment. | optional |
timeframe | timeframe | The timeframe of the series. | optional |
bins | positive integer | The number of buckets that should be created for the series. | optional |
interval | positive duration | The length of a bucket in the series. Must not be specified if bins is specified; if omitted, it is calculated automatically based on the number of bins. | optional |
by | list of expressions | The expressions the series should be split by. | optional |
spread | timeframe | A timeframe expression that provides the timeframe for the bucket calculation of the values in the series. | optional |
nonempty | boolean | When set to true, produces a series filled with default values instead of an empty result when there is no data. | optional |
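As a hedged sketch of the spread parameter, a record that covers a time range can be counted in every bucket its timeframe overlaps. The dt.davis.events source, the event.start and event.end fields, and this use of the timeframe() function are assumptions here:

fetch dt.davis.events
| makeTimeseries count(), spread: timeframe(from: event.start, to: event.end)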
The following example counts the number of records per time interval.
data record(timestamp = toTimestamp("2019-08-01T09:30:00.000-0400")),
     record(timestamp = toTimestamp("2019-08-01T09:31:00.000-0400")),
     record(timestamp = toTimestamp("2019-08-01T09:31:30.000-0400")),
     record(timestamp = toTimestamp("2019-08-01T09:32:00.000-0400"))
| makeTimeseries count(),
    from: toTimestamp("2019-08-01T09:30:00.000-0400"),
    to: toTimestamp("2019-08-01T09:33:00.000-0400")
Query result:
timeframe | interval | count() |
---|---|---|
start: 2019-08-01T13:30:00.000Z, end: 2019-08-01T13:33:00.000Z | 1 min | [1, 2, 1] |
This example shows error logs for a host over time.
fetch logs
| filter dt.entity.host == "HOST-02A6022668FCA28E" and loglevel == "ERROR"
| makeTimeseries count()
The following example shows the buy activity for a specific account for the last 24 hours.
The interval parameter sets the time interval to 30 minutes, resulting in 48 bins in the query outcome. A bin with no buy activity has the value 0 because the default parameter is 0. Not setting this parameter results in null values for the bins.
fetch bizevents, from: now() - 24h
| filter accountId == 7
| filter in(event.category, array("/broker-service/v1/trade/long/buy", "/v1/trade/buy"))
| makeTimeseries count(default: 0), interval: 30m
The following advanced example takes sell transactions of the last seven days recorded as business events and charts, split by accountId, the total number of transactions, the number of high-volume transactions based on the conditional countIf() function, and the maximum price. The interval parameter defines a fixed chart resolution of 1 day.
fetch bizevents, from: now() - 7d
| filter in(event.type, array("com.easytrade.long-sell", "com.easytrade.quick-sell"))
| makeTimeseries {
    count(),
    high_volume = countIf(amount >= 100 and amount <= 10000),
    max(price)
  },
  by: { accountId },
  interval: 1d
The interval and bins parameters are mutually exclusive; both are optional.
If specified, the bins parameter (range: 12–1,500) is used to calculate an equivalent nominal time interval.
The nominal time interval, whether it comes from the bins or the interval parameter, is adjusted automatically.
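For example, a fixed number of buckets can be requested instead of a fixed interval, leaving the engine to derive and adjust the nominal interval (a sketch; the bin count is illustrative):

fetch logs
| makeTimeseries count(), bins: 60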
The makeTimeseries command produces homogeneous time series of aggregated data: all series have identical start and end timestamps, time interval, and number of elements. If data is missing for a particular time slot, it is filled with null. Specifying a default parameter fills empty time slots with the default value instead of null.
There might be no record processed by the makeTimeseries command, for example, when no relevant data is available for the timeframe. In this case, the makeTimeseries command returns an empty result. If the expected result is a chart with 0 values, using nonempty: true in combination with default: 0 produces the desired result.
fetch logs
| filter status >= 500
| makeTimeseries count = count(default: 0), interval: 30m
Query result:
timeframe | interval | count |
---|---|---|
No records | | |
fetch logs
| filter status >= 500
| makeTimeseries count = count(default: 0), interval: 30m, nonempty: true
Query result:
timeframe | interval | count |
---|---|---|
start: …, end: … | 30 min | [0, 0, …] |
Nine aggregation functions are available to use with the makeTimeseries command: min, max, sum, avg, count, countIf, percentile, countDistinctExact, and countDistinctApprox. It is also possible to use start and end with makeTimeseries; however, they need to be used together with another aggregation function.
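As a sketch of that combination (assuming start() and end() can simply be listed alongside another aggregation), bucket boundary timestamps can be added to a counted series:

fetch logs
| makeTimeseries count(), start(), end(), interval: 1h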
These functions have the following syntax:
sum(expression [, default] [, rate])
avg(expression [, default] [, rate])
min(expression [, default] [, rate])
max(expression [, default] [, rate])
count([default] [, rate])
countIf(expression [, default] [, rate])
countDistinctExact(expression [, default] [, rate])
countDistinctApprox(expression [, precision] [, default] [, rate])
start()
end()
percentile(expression, percentile [, default] [, rate])
Parameter | Type | Description | Required |
---|---|---|---|
expression | expression | The expression the series should be created for. | required |
default | number | The default value used to fill gaps (empty bins). If not specified, gaps are filled with null. | optional |
rate | duration | The duration used to adjust the bin values using the following formula: (binValue / interval) * rate. | optional |
percentile | double, long | The nth-percentile, such that approximately n percent of observed measurements fall below this value. Must be between 0 and 100. | required |
precision | long | Parameter that sets the precision level of the estimation. Must be between 3 and 16. The default value is 14. | optional |
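For example, applying the rate formula above, a rate of 1s on 1-minute bins divides each bin's count by 60, turning per-bin counts into a per-second rate (a sketch):

fetch logs
| makeTimeseries count(rate: 1s), interval: 1m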
Groups together records that have the same values for a given field and aggregates them.
summarize [field =] aggregation, ... [, by: {[field =] expression, ...}]
Parameter | Type | Description | Required |
---|---|---|---|
expression | array, boolean, counter, double, duration, ip, long, record, string, timeframe, timestamp | An expression to group by. | required |
aggregation | array, boolean, counter, double, duration, ip, long, record, string, timeframe, timestamp | An aggregation function. | required |
You can use different aggregation functions with the summarize command. See all available aggregation functions.
The following example uses the summarize command and the sum aggregation function to sum the field value.
data record(value = 2),
     record(value = 3),
     record(value = 7),
     record(value = 7),
     record(value = 1)
| summarize sum(value)
Query result:
sum(value) |
---|
20 |
The following example calculates the sum of the field value and splits the result by the field cat. Furthermore, the resulting aggregation field gets the alias name sum and the grouping field the alias category.
data record(value = 2, cat = "a"),
     record(value = 3, cat = "b"),
     record(value = 7, cat = "a"),
     record(value = 7, cat = "b"),
     record(value = 1, cat = "b")
| summarize sum = sum(value), by: { category = cat }
Query result:
category | sum |
---|---|
a | 9 |
b | 11 |
If the summarize command has no input records and you don't use the by clause, the summarize command still returns a single record as a result.
data record(value = 2),
     record(value = 3),
     record(value = 7),
     record(value = 7),
     record(value = 1)
| filter value > 7
| summarize count(), sum(value), collectArray(value), takeAny(value)
Query result:
count() | sum(value) | collectArray(value) | takeAny(value) |
---|---|---|---|
0 | null | [] | null |
If the grouped field does not exist for every record, the summarize command adds a null group to the result.
data record(value = 2),
     record(value = 3, category = "b"),
     record(value = 7, category = "a"),
     record(value = 7, category = "b"),
     record(value = 1)
| summarize sum(value), by: { category }
Query result:
category | sum(value) |
---|---|
a | 7 |
b | 10 |
null | 3 |
You can use the summarize command instead of the join command when joining on the same table.
In this example, the input for the summarize command consists of two types of records: records with fields key and value, and records with fields key and amount. The key field is set as the by parameter, and the other fields are added to the aggregation list. If a record type has duplicate key values, use the aggregation function collectArray for its fields; otherwise, use the takeAny aggregation function for its fields. As the last step, you need to expand the fields that use the collectArray function.
Use the summarize command instead of the join command to avoid fetching the same table twice.
data record(key = "a", value = 1),
     record(key = "b", value = 2),
     record(key = "c", value = 4),
     record(key = "b", amount = 10),
     record(key = "c", amount = 20),
     record(key = "c", amount = 40),
     record(key = "d", amount = 50)
| summarize {
    value = takeAny(value),
    amount = arrayRemoveNulls(collectArray(amount))
  },
  by: { key }
| expand amount
Query result:
key | value | amount |
---|---|---|
b | 2 | 10 |
c | 4 | 20 |
c | 4 | 40 |
d | null | 50 |
The following example uses summarize with an iterative expression in the sum aggregation function to calculate the element-wise sum of arrays in the input records.
data record(a = array(2, 2)),
     record(a = array(7, 1))
| summarize sum(a[])
Query result:
sum(a[]) |
---|
[9, 3] |
The following example counts selected log levels and groups them by dt.entity.host and dt.entity.process_group.
fetch logs
| summarize {
    errors = countIf(loglevel == "ERROR"),
    warnings = countIf(loglevel == "WARN"),
    severe = countIf(loglevel == "DEBUG")
  },
  by: { dt.entity.host, dt.entity.process_group }