Data source commands

data

The data command generates sample data at query runtime. It is intended for testing and documenting query scenarios based on a small, exemplary dataset.

  • Based on input that conforms to the DQL record datatype, or on a valid JSON string, the command returns a tabular list of records.

  • The data command is a starting command which can be used without a pipeline input.

Syntax

data [ records ] [, json: json_string ]

Parameters

Parameter | Type               | Description                                                                                                     | Required
records   | record expressions | A list of record expressions. Either records or json has to be specified.                                      | optional
json      | string             | A string that defines either a single JSON object or a JSON array. Either records or json has to be specified. | optional

Basic examples

Example 1: Create records

In this example, the data command generates three heterogeneous records.

data record(a = "DQL", b = 1, c = 0),
record(a = "Dynatrace Query Language", b = 2.9, e = "1"),
record()

Query result:

a                        | b    | c    | e
DQL                      | 1    | 0    | null
Dynatrace Query Language | 2.9  | null | 1
null                     | null | null | null
Example 2: Create records from JSON

The following example generates records based on a JSON input. The use of triple double quotes (""") is intentional: in multiline strings, a string surrounded by triple double quotes respects new lines, and you don't need to escape double or single quotes inside the string.

data json:"""[
{
"amount": 1152,
"accountId": 12
},
{
"amount": 709,
"accountId": 96
}
]"""

Query result:

amount | accountId
1,152  | 12
709    | 96
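The json parameter also accepts a single JSON object instead of an array. As a minimal sketch based on the parameter description above, the following variant returns exactly one record:

data json:"""{
  "amount": 1152,
  "accountId": 12
}"""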

describe

Describes the on-read schema extraction definition for a given data object. It returns the specified fields and their corresponding datatypes. The on-read schema extraction in Grail ensures that every record returned by querying the data of a data object via the fetch command will contain at least those fields.

Known fields: Fields specified for a data object and returned by the describe command or by a DQL statement.

Unknown/dynamic fields: Any ingested field not part of the on-read schema extraction definition for a given data object. The field name and datatype are derived at runtime when using a field within a DQL statement.

Syntax

describe dataObject

Basic example

Example: Describe business events

The following example uses the describe command to retrieve information about all known fields for the bizevents data object.

describe bizevents

Query result:

field                    | data_types
dt.system.table          | [string]
dt.system.environment    | [string]
dt.system.bucket         | [string]
dt.system.segment_id     | [string]
timestamp                | [timestamp]
dt.system.sampling_ratio | [long]
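Any ingested field not listed above is an unknown/dynamic field, but it can still be used in a query; its name and datatype are derived at runtime. A minimal sketch, assuming a hypothetical ingested field named checkout.amount:

fetch bizevents
// checkout.amount is a hypothetical dynamic field; its type is derived at read time
| filter checkout.amount > 100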

fetch

Loads data from the specified resource.

Syntax

fetch dataObject [,timeframe:] [,from:] [,to:] [,samplingRatio:] [,scanLimitGBytes:]

Basic examples

Example 1: Query logs

Here is an example of the fetch command in its simplest form.

fetch logs

Relative query timeframes

All duration literals valid for the duration data type are applicable for the from: and to: parameters.
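For example, on the query level, a minute literal works just as well as hours or days. A minimal sketch that queries logs from the last 25 minutes:

fetch logs, from: -25m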

You can set the query timeframe on two levels:

  • On the UI level: in the timeframe selector in the upper-right corner:

    • To choose one of the existing values (for example last 72 hours or last 365 days), select Presets
    • To create your own timeframe value, select Custom
    • To select the last 2 hours, select Recent
  • On the query level: with the optional from:, to:, and timeframe: parameters, as shown in the examples below

Example 2: Query relative timeframe

This example uses DQL duration literals with the optional from: and to: parameters to query logs recorded between 24 hours and 2 hours ago.

fetch logs, from: -24h, to: -2h

Example 3: Query with absolute timeframe

You can also use absolute time ranges with the timeframe parameter.

fetch logs, timeframe: "2021-10-20T00:00:00Z/2021-10-28T12:00:00Z"

Sampling

Currently, to improve query performance, sampling is applicable to Log data within the initial fetch pipeline stage. Sampling happens vertically across the data: according to the optional samplingRatio parameter, a subset of the available Log records is selected.

The applicable value ranges for sampling are:

  • 1: Default value, resulting in no applied sampling.
  • 10
  • 100
  • 1000
  • 10000

Depending on the specified value, 1/<samplingRatio> of available raw Log records are returned.

The selected samplingRatio is reported in the query result for each record through dt.system.sampling_ratio, which is a hidden field. To see the hidden field, you need to select it via the fields command.
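For example, the following sketch selects the hidden field alongside two standard log fields (timestamp and content) to make the reported ratio visible:

fetch logs, samplingRatio: 100
// dt.system.sampling_ratio is hidden unless selected explicitly
| fields timestamp, content, dt.system.sampling_ratio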

Sampling in practice

Sampling is non-deterministic and returns a different result set with each query run. Also, all subsequent commands work on the sampled set of input data, yielding imprecise aggregates.

Furthermore, result sets may vary greatly with different samplingRatio values. This is the nature of sampling, as a high sampling ratio is more likely to leave out low-frequency logs. For example, if you had one ERROR log among millions of INFO logs, filter loglevel == "ERROR" would very likely return an empty result set for any sampled data.

Example 4: Sampling ratio

The following example estimates the occurrences of ERROR logs across the last 7 days.

  • The fetch command's samplingRatio parameter defines the sampling ratio.
  • The summarize command, combined with the countIf function, counts only error logs.
  • You need to multiply the count by the sampling ratio to get an estimate.

fetch logs, from: -7d, samplingRatio: 100
| summarize c = countIf(loglevel == "ERROR") * takeAny(dt.system.sampling_ratio)

Read data limit

The optional scanLimitGBytes parameter controls the amount of uncompressed data to be read by the fetch stage. The default value is 500 GB. If set to -1, all data available in the query time range is analyzed.
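For example, the following sketch lifts the limit entirely so that all data in the last 7 days is analyzed:

// scanLimitGBytes: -1 disables the default 500 GB read limit
fetch logs, from: -7d, scanLimitGBytes: -1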