JSON_ARRAY
JSON syntax allows constructing arrays in a variety of ways. They can range from very simple primitive type members to very complex nested object members. A JSON array can contain members of a different type. And it is perfectly valid that they can appear also outside of a JSON object. JSON_ARRAY is meant to handle all of them.
output type
quantifier
configuration
variant_array (default) or array (depending on typed
configuration parameter setting)
none
charset = character set name enclosed in single or double quotes (for example charset="ISO-8859-1"
)
locale = string specifying IETF BCP 47 language tag enclosed in single or double quotes (see the list here ). The default locale is English.
typed = Boolean value, allowing to control JSON_ARRAY output type: "true" sets the output type to ARRAY, "false" sets output type to VARIANT_ARRAY. Default "false".
strict = Boolean value. "false" allows parsing arrays not following JSON specification. Unquoted JSON names and string values consisting of one word can be parsed. Default is "true". See parsing_json_nstrict.
maxlen = numeric value representing the maximum byte size of an array. Allows parsing large JSON arrays (exceeding default size of 128000 bytes).
The easiest is to use without any parameters. It will enumerate all array elements, transform them to Log processing data types from their declared Json type and returns parsed elements as variant_array object.
Consider the following array with different type elements:
[5,null,2.2,"17.9","5\u0037\u0037\u0037\u0037\u00372"]
Pattern:
JSON_ARRAY:array
Resulting variant_array holds all parsed array elements:
array[0]
5
VARIANT<LONG>
array[1]
null
VARIANT
array[2]
2.2
VARIANT<DOUBLE>
array[3]
17.9
VARIANT<STRING>
array[4]
5777772
VARIANT<STRING>
JSON_ARRAY{ type }
Converts all array elements to explicitly set type.
When conversion fails the entire output will be set to NULL.
Configuration parameter typed=true
will change the output to array.
Conversion to explicit types allows you to use array expressions in accessing array elements in query statements (variant types require using casting-functions).
[223423,-343.8e7,null,"3.14"]
Following pattern attempts converting all array elements to double, with output set to array:
JSON_ARRAY{ DOUBLE }(typed=true):double_arr
The result (double-click on the resultset row to see the details).
double_arr[0]
223423.0
DOUBLE
double_arr[1]
-3.438E9
DOUBLE
double_arr[2]
NULL
DOUBLE
double_arr[3]
3.14D
DOUBLE
Sometimes you may encounter datasets, where Json formatted records are members of an array. Attempting to parse such datasets using JSON_ARRAY makes no sense because:
Instead, you should treat such datasets as a list of JSON objects separated by a comma and ignore the square brackets of enclosing array.
It is highly recommended using JSON object semantic validation to define the array member object. This avoids unmatched objects near the border of chunks of input data. 1
Consider an array with a large number of JSON objects as elements:
[{"id":"1","a":[3,6],"b":{"foo":"bar"}},{"id":"2","a":[30,2673,1]},{"id":"256000","a":[256,1193,2],"b":{"foo":"nobar"}}]
Following pattern parses Json object records by ignoring the encapsulating array.
(BOF '[')?JSON_OBJECT{INT+:id}(greedy='others'):record ','?(']' EOF)?
where:
record
, followed by a comma (optional since the last object has no trailing comma). Note that the id
is made mandatory to avoid data loss 1. Other members are extracted automatically into others
field.Result:
{id=1 others={'a':[3,6],'b':{'foo':'bar'}}}
{id=2 others={'a':[30,2673,1]}}
{id=256000 others={'a':[256,1193,2],'b':{'foo':'nobar'}}}