DPL JSON Arrays

JSON_ARRAY

JSON syntax allows constructing arrays in a variety of ways. They can range from very simple primitive type members to very complex nested object members. A JSON array can contain members of a different type. And it is perfectly valid that they can appear also outside of a JSON object. JSON_ARRAY is meant to handle all of them.

output type

quantifier

configuration

variant_array (default) or array (depending on typed configuration parameter setting)

none

charset = character set name enclosed in single or double quotes (for example charset="ISO-8859-1")

locale = string specifying IETF BCP 47 language tag enclosed in single or double quotes (see the list here ). The default locale is English.

typed = Boolean value, allowing to control JSON_ARRAY output type: "true" sets the output type to ARRAY, "false" sets output type to VARIANT_ARRAY. Default "false".

strict = Boolean value. "false" allows parsing arrays not following JSON specification. Unquoted JSON names and string values consisting of one word can be parsed. Default is "true". See parsing_json_nstrict.

maxlen = numeric value representing the maximum byte size of an array. Allows parsing large JSON arrays (exceeding default size of 128000 bytes).

The easiest is to use without any parameters. It will enumerate all array elements, transform them to Log processing data types from their declared Json type and returns parsed elements as variant_array object.

Example

Consider the following array with different type elements:

[5,null,2.2,"17.9","5\u0037\u0037\u0037\u0037\u00372"]

Pattern:

JSON_ARRAY:array

Resulting variant_array holds all parsed array elements:

name
value
type
array[0]
5
VARIANT<LONG>
array[1]
null
VARIANT
array[2]
2.2
VARIANT<DOUBLE>
array[3]
17.9
VARIANT<STRING>
array[4]
5777772
VARIANT<STRING>

Controlling Output Type and Conversion

JSON_ARRAY{ type }

Converts all array elements to explicitly set type.

When conversion fails the entire output will be set to NULL.

Configuration parameter typed=true will change the output to array.

Conversion to explicit types allows you to use array expressions in accessing array elements in query statements (variant types require using casting-functions).

Example

[223423,-343.8e7,null,"3.14"]

Following pattern attempts converting all array elements to double, with output set to array:

JSON_ARRAY{ DOUBLE }(typed=true):double_arr

The result (double-click on the resultset row to see the details).

name
value
type
double_arr[0]
223423.0
DOUBLE
double_arr[1]
-3.438E9
DOUBLE
double_arr[2]
NULL
DOUBLE
double_arr[3]
3.14D
DOUBLE

Parsing Large Arrays

Sometimes you may encounter datasets, where Json formatted records are members of an array. Attempting to parse such datasets using JSON_ARRAY makes no sense because:

  • you want to get records as rows in a resultset, but JSON_ARRAY returns one array object in one resultset row
  • the number of records most likely exceeds the capacity of JSON_ARRAY (even when using maxlen configuration parameter)

Instead, you should treat such datasets as a list of JSON objects separated by a comma and ignore the square brackets of enclosing array.

It is highly recommended using JSON object semantic validation to define the array member object. This avoids unmatched objects near the border of chunks of input data. 1

Example

Consider an array with a large number of JSON objects as elements:

[{"id":"1","a":[3,6],"b":{"foo":"bar"}},{"id":"2","a":[30,2673,1]},{"id":"256000","a":[256,1193,2],"b":{"foo":"nobar"}}]

Following pattern parses Json object records by ignoring the encapsulating array.

(BOF '[')?
JSON_OBJECT{INT+:id}(greedy='others'):record ','?
(']' EOF)?

where:

  1. Matches opening square brackets at the beginning of the file. The sequence is made optional, to allow pattern match for bytes in the middle of a file.
  2. Extracts Json object to resultset field record, followed by a comma (optional since the last object has no trailing comma). Note that the id is made mandatory to avoid data loss 1. Other members are extracted automatically into others field.
  3. Matches closing brackets at the end of a file. The sequence is made optional, to allow pattern match for bytes in the middle of a file.

Result:

record
{id=1 others={'a':[3,6],'b':{'foo':'bar'}}}
{id=2 others={'a':[30,2673,1]}}
{id=256000 others={'a':[256,1193,2],'b':{'foo':'nobar'}}}
1