Aggregation functions compute results from a list of records.
You can use the aggregation functions listed on this page with the summarize command. See the description of the timeseries command and the makeTimeseries command for the aggregation functions available to use with those commands.
The below table explains the results of combining homogeneous data types in the aggregation function, for example the avg()
function for two numeric expressions.
In/out | double | long | double & long | duration | timestamp | boolean | string | timeframe | record | array |
---|---|---|---|---|---|---|---|---|---|---|
Numeric | Calculables | Clear ordering | Ambiguous ordering | |||||||
count() | long | long | long | long | long | long | long | long | long | long (number of arrays) |
countif() | long | long | long | long | long | long | long | long | long | long (number of arrays) |
sum() | double | double | double | duration |
|
|
|
|
|
|
avg() | double | double | double | duration |
|
|
|
|
|
|
correlation() | double | double | double |
|
|
|
|
|
|
|
stddev | double | double | double |
|
|
|
|
|
|
|
variance() | double | double | double |
|
|
|
|
|
|
|
last() | same as input | same as input | same as input | same as input | same as input | same as input | same as input | same as input | same as input | same as input |
first() | same as input | same as input | same as input | same as input | same as input | same as input | same as input | same as input | same as input | same as input |
General rules:
null
, unless you mix data for which combinations are allowed, such as long
and double
.null
result for operations not covered by a given function, for example the sum()
of two boolean
expressions.Function-specific rules:
sum function
allows numeric expressions and duration expressions. If you mix types, the result is null
.
double
data type (for example, double + double
, double + long
, long + long
).duration
data type.null
.avg function
allows numeric expressions and duration expressions. If you mix types, the result is null
.
double
data type (for example, double + double
, double + long
, long + long
).duration
data type.null
.min
and max
functions allow numeric expressions, duration expressions, timestamp expressions, string expressions, and boolean expressions.
null
.takeFirst
and the takeLast
functions allow expressions of all types.
Calculates the average value of a field for a list of records.
avg(expression)
Parameter | Type | Description | Required |
---|---|---|---|
expression | double, long, duration, iterative expression | The expression whose average is to be calculated. | required |
The data type of the returned value can be double
or duration
.
data record(a = 2),record(a = 3),record(a = 7),record(a = 7),record(a = 1)| summarize avg(a)
Query result:
avg(a) |
---|
|
Collects the values of the provided field into an array. The original order of elements is not guaranteed.
collectArray(expression [, expand] [, maxLength])
Parameter | Type | Description | Required |
---|---|---|---|
expression | array, boolean, double, duration, ip, long, record, string, timeframe, timestamp | The expression whose values are to be collected into an array. | required |
expand | boolean | The boolean expression that indicates whether the output should be a flat array. | optional |
maxLength | long | The maximum length of the resulting array. Must be between | optional |
The data type of the returned value is array
.
data record(a = 2),record(a = 3),record(a = 7),record(a = 7),record(a = 1)| summarize collectArray(a),collectArray(a, maxLength: 3)
Query result:
collectArray(a) | collectArray(a, maxLength:3) |
---|---|
|
|
data record(a = 2),record(a = array(3, 7)),record(a = array(7)),record(a = array(array(1)))| summarize collectArray(a, expand: false),collectArray(a, expand: true)
Query result:
collectArray(a, expand:FALSE) | collectArray(a, expand:TRUE) |
---|---|
|
|
Collects distinct values of the provided field into an array. The original order of elements is not guaranteed.
collectDistinct(expression [, expand] [, maxLength])
Parameter | Type | Description | Required |
---|---|---|---|
expression | array, boolean, double, duration, ip, long, record, string, timeframe, timestamp | The expression whose distinct values are to be collected. | required |
expand | boolean | The boolean expression that indicates whether the output should be a flat array. | optional |
maxLength | long | The maximum length of the resulting array. Must be between | optional |
The data type of the returned value is array
.
data record(a = 2),record(a = 3),record(a = 7),record(a = 7),record(a = 1)| summarize collectDistinct(a),collectDistinct(a, maxLength: 3)
Query result:
collectDistinct(a) | collectDistinct(a, maxLength:3) |
---|---|
|
|
data record(a = 2),record(a = array(3, 7)),record(a = array(7)),record(a = array(array(1)))| summarize collectDistinct(a, expand: false),collectDistinct(a, expand: true)
Query result:
collectDistinct(a, expand:FALSE) | collectDistinct(a, expand:TRUE) |
---|---|
|
|
Calculates the Pearson correlation of two numeric fields for a list of records. If one of the fields has a constant value, the covariance of both fields used for correlation is zero. In this case, the correlation coefficient causes a division by zero, yielding null
for the correlation.
correlation(expression1, expression2)
Parameter | Type | Description | Required |
---|---|---|---|
expression1 | double, long | The first numeric expression to be used in calculating the correlation. | required |
expression2 | double, long | The second numeric expression to be used in calculating the correlation. | required |
The data type of the returned value is double
.
data record(a = 2, b = 14.55),record(a = 3, b = -6.13),record(a = 7, b = -77.2),record(a = 7, b = 99.99),record(a = 1, b = -6.13)| summarize correlation(a, b)
Query result:
correlation(a, b) |
---|
|
Counts the total number of records.
count()
The data type of the returned value is long
.
data record(a = 2),record(a = 3),record(a = 7),record(a = 7),record(a = 1)| summarize count()
Query result:
count() |
---|
|
This function is an alias for the countDistinctApprox
function. For more information, see countDistinctApprox
.
Calculates the cardinality of unique values of a field for a list of records based on a stochastic estimation. The stochastic estimation relies on the UltraLogLog (ULL) algorithm, which is characterized by a guaranteed error rate as defined by the ULL algorithm. The UltraLogLog (ULL) sketch is a variant of HyperLogLog and is used for approximate distinct counts. Compared to the HyperLogLog sketch, the UltraLogLog sketch requires less space to achieve the same estimation error.
The precision parameter affects the relative standard error of the final estimation. The formula for getting the standard relative errors is 0.782/((2^precision)^(1/2))
. Note that, this is the standard error (expected or average error), this means that the estimation error can be greater than the calculated one.
countDistinctApprox(expression [, precision])
Parameter | Type | Description | Required |
---|---|---|---|
expression | any | The field or expression whose unique values are to be counted. | required |
precision | long | Parameter that sets the precision level of the estimation. | optional |
The data type of the returned value is long
.
data record(a = 2),record(a = 3),record(a = 7),record(a = 7),record(a = 1)| summarize countDistinctApprox(a),countDistinctApprox(a, precision: 4)
Query result:
countDistinctApprox(a) | countDistinctApprox(a, precision:4) |
---|---|
|
|
Calculates the cardinality of unique values of a field for a list of records.
This function counts up to 1M distinct values and if exceeded the query will fail. The function issues a warning once it reaches 100k distinct values.
countDistinctExact(expression)
Parameter | Type | Description | Required |
---|---|---|---|
expression | any | The expression whose distinct values are to be counted. | required |
The data type of the returned value is long
.
data record(a = 2),record(a = 3),record(a = 7),record(a = 7),record(a = 1)| summarize countDistinctExact(a)
Query result:
countDistinctExact(a) |
---|
|
Counts the number of records that match the condition.
countIf(condition)
Parameter | Type | Description | Required |
---|---|---|---|
condition | boolean | The expression that determines which records to count. | required |
The data type of the returned value is long
.
data record(a = 2),record(a = 3),record(a = 7),record(a = 7),record(a = 1)| summarize countIf(a > 2)
Query result:
countIf(a > 2) |
---|
|
Calculates the maximum value of a field for a list of records.
max(expression)
Parameter | Type | Description | Required |
---|---|---|---|
expression | double, long, duration, timestamp, string, boolean, iterative expression | The expression whose maximum value is to be calculated. | required |
The data type of the returned value can be double
, long
, timestamp
, duration
, string
or boolean
.
data record(a = 2),record(a = 3),record(a = 7),record(a = 7),record(a = 1)| summarize max(a)
Query result:
max(a) |
---|
|
Calculates the median of an expression (short for percentile(expression, 50))
median(expression)
Parameter | Type | Description | Required |
---|---|---|---|
expression | double, long, duration, timestamp, boolean | The expression whose median value is to be calculated. | required |
The data type of the returned value can be boolean
, double
, duration
or timestamp
.
data record(a = 516),record(a = 78),record(a = 12),record(a = 763),record(a = 13),record(a = 784),record(a = 284),record(a = 470),record(a = 3201),record(a = 17)| summarize median(a)
Query result:
median(a) |
---|
|
Calculates the minimum value of a field for a list of records.
min(expression)
Parameter | Type | Description | Required |
---|---|---|---|
expression | double, long, duration, string, timestamp, boolean, iterative expression | The expression whose minimum value is to be calculated. | required |
The data type of the returned value can be double
, long
, timestamp
, duration
, string
or boolean
.
data record(a = 2),record(a = 3),record(a = 7),record(a = 7),record(a = 1)| summarize min(a)
Query result:
min(a) |
---|
|
Calculates a given percentile of an expression.
percentile(expression, percentile)
Parameter | Type | Description | Required |
---|---|---|---|
expression | double, long, duration, timestamp, boolean | The expression from which to compute a percentile. | required |
percentile | double, long | The percentile to compute (between 0 and 100). | required |
The data type of the returned value can be double
, boolean
, duration
or timestamp
.
data record(a = 516),record(a = 78),record(a = 12),record(a = 763),record(a = 13),record(a = 784),record(a = 284),record(a = 470),record(a = 3201),record(a = 17)| summarize percentile(a, 90)
Query result:
percentile(a, 90) |
---|
|
Calculates the standard deviation of a field for a list of records.
stddev(expression)
Parameter | Type | Description | Required |
---|---|---|---|
expression | double, long | The expression whose standard deviation is to be calculated. | required |
The data type of the returned value is double
.
data record(a = 2),record(a = 3),record(a = 7),record(a = 7),record(a = 1)| summarize stddev(a)
Query result:
stddev(a) |
---|
|
Calculates the sum of a field for a list of records.
sum(expression)
Parameter | Type | Description | Required |
---|---|---|---|
expression | double, duration, long | The expression that specifies which values are to be added together. | required |
The data type of the returned value can be double
or duration
.
data record(a = 2),record(a = 3),record(a = 7),record(a = 7),record(a = 1)| summarize sum(a)
Query result:
sum(a) |
---|
|
Returns any non-null value of a field for a list of records.
takeAny(expression)
Parameter | Type | Description | Required |
---|---|---|---|
expression | any | The expression from which to retrieve any non-null value. | required |
The data type of the returned value can be array
, binary
, boolean
, double
, duration
, ip
, long
, record
, string
, timeframe
, timestamp
or uid
.
data record(a = null),record(a = 3),record(a = 7),record(a = 5),record(a = null)| summarize takeAny(a)
Query result:
takeAny(a) |
---|
|
Returns the first value of a field for a list of records.
takeFirst(expression)
Parameter | Type | Description | Required |
---|---|---|---|
expression | any | The expression from which the first value is retrieved. | required |
The data type of the returned value can be array
, binary
, boolean
, double
, duration
, ip
, long
, record
, string
, timeframe
, timestamp
or uid
.
data record(a = null),record(a = 3),record(a = 7),record(a = 5),record(a = null)| summarize takeFirst(a)
Query result:
takeFirst(a) |
---|
|
Returns the last value of a field for a list of records.
takeLast(expression)
Parameter | Type | Description | Required |
---|---|---|---|
expression | any | The expression from which the last value is retrieved. | required |
The data type of the returned value can be array
, binary
, boolean
, double
, duration
, ip
, long
, record
, string
, timeframe
, timestamp
or uid
.
data record(a = null),record(a = 3),record(a = 7),record(a = 5),record(a = null)| summarize takeLast(a)
Query result:
takeLast(a) |
---|
|
Retrieves the maximum from a list of records. Renders results for lists with both homogenous and non-homogenous data.
takeMax(expression)
Parameter | Type | Description | Required |
---|---|---|---|
expression | any | The expression from which the maximum value is to be extracted. | required |
The data type of the returned value can be array
, binary
, boolean
, double
, duration
, ip
, long
, record
, string
, timeframe
, timestamp
or uid
.
data record(a = 2, b = 2),record(a = 3, b = array(1, 2, 3)),record(a = 7, b = ("2019-08-01T09:30:00.000-0400")),record(a = 7, b = "DQL is awesome!"),record(a = 1, b = 5m)| summarize takeMax(a),max(a),takeMax(b),max(b)
Query result:
takeMax(a) | max(a) | takeMax(b) | max(b) |
---|---|---|---|
|
|
| null |
Returns the minimum value of a field for a list of records.
takeMin(expression)
Parameter | Type | Description | Required |
---|---|---|---|
expression | any | The expression from which the minimum value is to be extracted. | required |
The data type of the returned value can be array
, binary
, boolean
, double
, duration
, ip
, long
, record
, string
, timeframe
, timestamp
or uid
.
data record(a = 2, b = 2),record(a = 3, b = array(1, 2, 3)),record(a = 7, b = ("2019-08-01T09:30:00.000-0400")),record(a = 7, b = "DQL is awesome!"),record(a = 1, b = 5m)| summarize takeMin(a),min(a),takeMin(b),min(b)
Query result:
takeMin(a) | min(a) | takeMin(b) | min(b) |
---|---|---|---|
|
|
| null |
Calculates the variance of a field for a list of records.
variance(expression)
Parameter | Type | Description | Required |
---|---|---|---|
expression | double, long | The expression from which the minimum value is to be extracted. | required |
The data type of the returned value is double
.
data record(a = 2),record(a = 3),record(a = 7),record(a = 7),record(a = 1)| summarize variance(a)
Query result:
variance(a) |
---|
|