Aggregation functions
Aggregation functions compute results from a list of records.
The below table explains the results of combining homogeneous data types in the aggregation function, for example the avg()
function for two numeric expressions.
In/out | double | long | double & long | duration | timestamp | boolean | string | timeframe | record | array |
---|---|---|---|---|---|---|---|---|---|---|
Numeric | Calculables | Clear ordering | Ambiguous ordering | |||||||
count() | long | long | long | long | long | long | long | long | long | long (number of arrays) |
countif() | long | long | long | long | long | long | long | long | long | long (number of arrays) |
sum() | double | double | double | duration |
|
|
|
|
|
|
avg() | double | double | double | duration |
|
|
|
|
|
|
correlation() | double | double | double |
|
|
|
|
|
|
|
stddev | double | double | double |
|
|
|
|
|
|
|
variance() | double | double | double |
|
|
|
|
|
|
|
last() | same as input | same as input | same as input | same as input | same as input | same as input | same as input | same as input | same as input | same as input |
first() | same as input | same as input | same as input | same as input | same as input | same as input | same as input | same as input | same as input | same as input |
General rules:
- If you mix two data types, the result is
null
, unless you mix data for which combinations are allowed, such aslong
anddouble
. - You will also get the
null
result for operations not covered by a given function, for example thesum()
of twoboolean
expressions.
Function-specific rules:
- The
sum function
allows numeric expressions and duration expressions. If you mix types, the result isnull
.- The sum of two numeric expressions results in a
double
data type (for example,double + double
,double + long
,long + long
). - The sum of two duration expressions results in a
duration
data type. - The sum of a numeric expression and a duration expression is
null
.
- The sum of two numeric expressions results in a
- The
avg function
allows numeric expressions and duration expressions. If you mix types, the result isnull
.- The average of two numeric expressions results in a
double
data type (for example,double + double
,double + long
,long + long
). - The average of two duration expressions results in a
duration
data type. - The average of a numeric expression and a duration expression is
null
.
- The average of two numeric expressions results in a
- The
min
andmax
functions allow numeric expressions, duration expressions, timestamp expressions, string expressions, and boolean expressions.The minimum/maximum of numerical expressions results in a double data type, apart from min/max of long expressions that results in a long data type.
- The minimum/maximum of any mixed types (other than double + long) is
null
. For strings, lexicographic ordering is used.
For the boolean expressions, false < true.
- The
first
and thelast
functions allow expressions of all types.The first function selects the first non-null value (and the data type retrieved is the one of that value) within the existing order.
The last function selects the last non-null value (and the data type retrieved is the one of that value) within the existing order.
avg
Calculates the average value of a field for a list of records.
-
Syntax
avg(expression)
-
Parameters
Name Type Mandatory Default Constraints Description expression
double, long, duration
yes
-
Example
1data record(a = 2),2 record(a = 3),3 record(a = 7),4 record(a = 7),5 record(a = 1)6| summarize avg(a)Query result
avg(a) 4
collectArray
Collects the values of the provided field into an array (preservation of order not guaranteed).
-
Syntax
collectArray(expression)
-
Parameters
Name Type Mandatory Default Constraints Description expression
array, boolean, double, duration, ip, long, record, string, timeframe, timestamp
yes
-
Example
1data record(a = 2),2 record(a = 3),3 record(a = 7),4 record(a = 7),5 record(a = 1)6| summarize collectArray(a)Query result
collectArray(a) [2, 3, 7, 7, 1]
collectDistinct
Collects the values of the provided field into an array. The original order of elements is not guaranteed.
-
Syntax
collectDistinct(expression)
-
Parameters
Name Type Mandatory Default Constraints Description expression
array, boolean, double, duration, ip, long, record, string, timeframe, timestamp
yes
-
Example
1data record(a = 2),2 record(a = 3),3 record(a = 7),4 record(a = 7),5 record(a = 1)6| summarize collectDistinct(a)Query result
collectDistinct(a) [2, 3, 7, 1]
correlation
Calculates the Pearson correlation of two numeric fields for a list of records. If one of the fields has a constant value, the covariance of both fields used for correlation is zero. In this case, the correlation coefficient causes a division by zero, yielding null
for the correlation.
-
Syntax
correlation(expression1, expression2)
-
Parameters
Name Type Mandatory Default Constraints Description expression1
double, long
yes
expression2
double, long
yes
-
Example
1data record(a = 2, b = 14.55),2 record(a = 3, b = -6.13),3 record(a = 7, b = -77.2),4 record(a = 7, b = 99.99),5 record(a = 1, b = -6.13)6| summarize correlation(a, b)Query result
correlation(a, b) 0.0888
count
Counts the total number of records.
-
Syntax
count()
-
Example
1data record(a = 2),2 record(a = 3),3 record(a = 7),4 record(a = 7),5 record(a = 1)6| summarize count()Query result
count() 5
countDistinct
Calculates the cardinality of unique values of a field for a list of records. In comparison to SQL, | summarize countDistinct(x)
would relate to SELECT count(distinct(x))
.
-
Syntax
countDistinct (Expression, [precision])
-
Parameters
Name Type Mandatory Default Constraints Description expression
array, boolean, double, duration, ip, long, record, string, timestamp
yes
The precision that should be used for the HLL implementation in the interval [1, 16].
-
Example
1data record(a = 2),2 record(a = 3),3 record(a = 7),4 record(a = 7),5 record(a = 1)6| summarize countDistinct(a),7 countDistinct(a, precision: 4)Query result
countDistinct(a) countDistinct(a, precision:4) 4
3
countIf
Counts the number of records that match the condition.
-
Syntax
countIf(condition)
-
Parameters
Name Type Mandatory Default Constraints Description condition
boolean
yes
-
Example
1data record(a = 2),2 record(a = 3),3 record(a = 7),4 record(a = 7),5 record(a = 1)6| summarize countIf(a > 2)Query result
countIf(a > 2) 3
max
Calculates the maximum value of a field for a list of records.
-
Syntax
max(expression)
-
Parameters
Name Type Mandatory Default Constraints Description expression
double, long, duration, timestamp, string, boolean
yes
expression
-
Example
1data record(a = 2),2 record(a = 3),3 record(a = 7),4 record(a = 7),5 record(a = 1)6| summarize max(a)Query result
max(a) 7
median
Calculates the median of an expression (short for percentile(expression, 50))
-
Syntax
median(expression)
-
Parameters
Name Type Mandatory Default Constraints Description expression
double, long, duration, timestamp, boolean
yes
Returns NULL if input contains mixed data types.
No nested aggregation functions.
The expression from which to compute the median.
-
Example
1data record(a = 2),2 record(a = 3),3 record(a = 7),4 record(a = 7),5 record(a = 1)6| summarize median(a)Query result
median(a) 3
min
Calculates the minimum value of a field for a list of records.
-
Syntax
min(expression)
-
Parameters
Name Type Mandatory Default Constraints Description expression
double, long, duration, timestamp, string, boolean
yes
expression
-
Example
1data record(a = 2),2 record(a = 3),3 record(a = 7),4 record(a = 7),5 record(a = 1)6| summarize min(a)Query result
min(a) 1
percentile
Calculates a given percentile of an expression.
-
Syntax
percentile(expression, percentile)
-
Parameters
Name Type Mandatory Default Constraints Description expression
double, long
yes
Returns NULL if input contains mixed data types.
No nested aggregation functions.
The expression from which to compute a percentile.
percentile
double, long
yes
Must be between 0 and 100.
No nested aggregation functions.
The percentile to be calculated from the expression.
-
Example
1data record(a = 2),2 record(a = 3),3 record(a = 7),4 record(a = 7),5 record(a = 1)6| summarize percentile(a, 90)Query result
percentile(a, 90) 7
stddev
Calculates the standard deviation of a field for a list of records.
-
Syntax
stddev(expression)
-
Parameters
Name Type Mandatory Default Constraints Description expression
double, long
yes
-
Example
1data record(a = 2),2 record(a = 3),3 record(a = 7),4 record(a = 7),5 record(a = 1)6| summarize stddev(a)Query result
stddev(a) 2.5298
sum
Calculates the sum of a field for a list of records.
-
Syntax
sum(expression)
-
Parameters
Name Type Mandatory Default Constraints Description expression
double, long, duration
yes
-
Example
1data record(a = 2),2 record(a = 3),3 record(a = 7),4 record(a = 7),5 record(a = 1)6| summarize sum(a)Query result
sum(a) 20
takeAny
Returns any non-null value of a field for a list of records.
-
Syntax
takeAny(expression)
-
Parameters
Name Type Mandatory Default Constraints Description expression
array, boolean, double, duration, ip, long, record, string, timeframe, timestamp
yes
-
Example
1data record(a = null),2 record(a = 3),3 record(a = 7),4 record(a = 5),5 record(a = null)6| summarize takeAny(a)Query result
takeAny(a) 3
takeFirst
Returns the first value of a field for a list of records.
-
Syntax
takeFirst(expression)
-
Parameters
Name Type Mandatory Default Constraints Description expression
array, boolean, double, duration, ip, long, record, string, timeframe, timestamp
yes
-
Example
1data record(a = null),2 record(a = 3),3 record(a = 7),4 record(a = 5),5 record(a = null)6| summarize takeFirst(a)Query result
takeFirst(a) 3
takeLast
Returns the last value of a field for a list of records.
-
Syntax
takeLast(expression)
-
Parameters
Name Type Mandatory Default Constraints Description expression
array, boolean, double, duration, ip, long, record, string, timeframe, timestamp
yes
-
Example
1data record(a = null),2 record(a = 3),3 record(a = 7),4 record(a = 5),5 record(a = null)6| summarize takeLast(a)Query result
takeLast(a) 5
takeMax
Retrieves the maximum from a list of records. Renders results for lists with both homogenous and non-homogenous data.
-
Syntax
takeMax(expression)
-
Parameters
Name Type Mandatory Default Constraints Description expression
any
yes
No nested aggregation functions.
The expression used in the max calculation.
-
Example
1data record(a = 2, b = 2),2 record(a = 3, b = array(1, 2, 3)),3 record(a = 7, b = ("2019-08-01T09:30:00.000-0400")),4 record(a = 7, b = "DQL is awesome!"),5 record(a = 1, b = 5m)6| summarize takeMax(a),7 max(a),8 takeMax(b),9 max(b)Query result
takeMax(a) max(a) takeMax(b) max(b) 7
7
[1, 2, 3]
null
takeMin
Returns the minimum value of a field for a list of records.
-
Syntax
takeMin(expression)
-
Parameters
Name Type Mandatory Default Constraints Description expression
any
yes
No nested aggregation functions.
The expression used in the min calculation.
-
Example
1data record(a = 2, b = 2),2 record(a = 3, b = array(1, 2, 3)),3 record(a = 7, b = ("2019-08-01T09:30:00.000-0400")),4 record(a = 7, b = "DQL is awesome!"),5 record(a = 1, b = 5m)6| summarize takeMin(a),7 min(a),8 takeMin(b),9 min(b)Query result
takeMin(a) min(a) takeMin(b) min(b) 1
1
2
null
variance
Calculates the variance of a field for a list of records.
-
Syntax
variance(expression)
-
Parameters
Name Type Mandatory Default Constraints Description expression
double, long
yes
-
Example
1data record(a = 2),2 record(a = 3),3 record(a = 7),4 record(a = 7),5 record(a = 1)6| summarize variance(a)Query result
variance(a) 6.4