The `XML` matcher parses XML documents and creates an output structure that adapts to the content of each document.
The following mapping rules apply when using the `XML` matcher:
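- XML element attributes are mapped to fields with `@` as a prefix in the field name. For example, attribute `<... attr="...">` becomes field `@attr`.
- Text content is mapped to the `#text` field. The `#text` field becomes an array when child elements (or comments) are mixed with multiple text content parts.
- Namespace prefixes are kept in field names. For example, `<prefix:element>` becomes field `prefix:element`.

Alternatively, you can use the `XML_PLAIN` or `XML_VERBOSE` matchers, which produce a fixed output structure.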
output type | quantifier |
---|---|
variant_object | none |

Configuration parameters:
- `rootTag`: string specifying the name of the XML element from which to parse content. If several such elements exist, the first occurrence is selected. By default, the matcher parses the whole XML document.
- `excludeRoot`: Boolean value; `true` excludes the root element from the output. The default is `false`.
- `maxlen`: numeric value specifying the maximum size of an XML document in bytes. Increase it to parse XML documents larger than the default limit of 128000 bytes.
- `charset`: character set name enclosed in single or double quotes (for example, `charset="ISO-8859-1"`).
- `locale`: string specifying an IETF BCP 47 language tag, enclosed in single or double quotes (see the list). The default locale is English.

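The following Python sketch illustrates how `rootTag`, `excludeRoot`, and `maxlen` decide what gets parsed. It is not the matcher's implementation: the helper name `top_level_elements` is hypothetical, and it assumes that the size limit is measured on the UTF-8 encoding of the document, that `excludeRoot` applies to the element selected by `rootTag`, and that a missing `rootTag` element yields no output.

```python
import xml.etree.ElementTree as ET

# Default maximum document size in bytes, as stated for maxlen.
DEFAULT_MAXLEN = 128000


def top_level_elements(xml_text, root_tag=None, exclude_root=False,
                       maxlen=DEFAULT_MAXLEN):
    """Hypothetical helper: return the elements whose content would appear
    at the top level of the output for the given rootTag/excludeRoot/maxlen."""
    # Assumption: the limit is checked against the UTF-8 encoded document.
    size = len(xml_text.encode("utf-8"))
    if size > maxlen:
        raise ValueError(f"document has {size} bytes, exceeding maxlen={maxlen}")
    root = ET.fromstring(xml_text)
    if root_tag is not None:
        # iter() walks the tree in document order, starting with the root
        # itself, so next() returns the first occurrence of root_tag.
        root = next(root.iter(root_tag), None)
        if root is None:
            return []  # assumption: nothing is parsed if the tag is missing
    # excludeRoot drops the selected element itself and keeps its children.
    return list(root) if exclude_root else [root]


if __name__ == "__main__":
    doc = ('<messages><thread id="1"><topic>XML Parsing</topic>'
           '<message id="101"/></thread><thread id="2"/></messages>')
    print([e.tag for e in top_level_elements(doc)])  # ['messages']
    print([e.tag for e in top_level_elements(doc, root_tag="thread",
                                             exclude_root=True)])  # ['topic', 'message']
```

ElementTree expands prefixed names such as `xhtml:b` to `{namespace-uri}b`, so this sketch only handles unprefixed `rootTag` values.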
An XML document:
<?xml version="1.0" encoding="UTF-8"?><messages xmlns:xhtml="http://www.w3.org/1999/xhtml"><thread id="1"><topic>XML Parsing</topic><!-- comment --><message id="101"><sender>Alice</sender><content type="plain"><b>text</b></content></message><message id="102"><sender>Bob</sender><content type="plain"><![CDATA[<b>text</b>]]></content></message><message id="103"><sender>John</sender><content type="xhtml">More <xhtml:b>text</xhtml:b> here.</content></message><message id="104"><sender>Mary</sender><content type="xhtml">Some <!-- hidden text --> included.</content></message></thread></messages>
It can be parsed using the pattern:
XML:xml
The result is:
name | value | type |
---|---|---|
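To make the mapping rules more concrete, here is a minimal Python sketch that approximates the `XML` matcher on a shortened version of the example document. It uses `xml.dom.minidom` because that parser keeps namespace prefixes in element names. It is an illustration under assumptions, not the matcher's actual algorithm: the function names are hypothetical, and it assumes that repeated sibling elements become arrays, that whitespace-only text is ignored, and that an element with only text (no attributes or child elements) maps directly to its text rather than to a `#text` field.

```python
import json
from xml.dom import Node, minidom


def _add_field(fields, key, value):
    # Assumption: repeated sibling elements with the same name become an array.
    if key not in fields:
        fields[key] = value
    elif isinstance(fields[key], list):
        fields[key].append(value)
    else:
        fields[key] = [fields[key], value]


def element_to_value(el):
    """Approximate the XML matcher mapping rules for one element (a sketch)."""
    fields = {}
    # Attributes become fields prefixed with '@', e.g. attr="..." -> '@attr'.
    for name, value in el.attributes.items():
        fields["@" + name] = value
    parts, current = [], []
    for child in el.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            current.append(child.data)  # CDATA content is treated as text
            continue
        # A child element or a comment ends the current text part.
        if current:
            parts.append("".join(current))
            current = []
        if child.nodeType == Node.ELEMENT_NODE:
            # tagName keeps the namespace prefix, e.g. 'xhtml:b'.
            _add_field(fields, child.tagName, element_to_value(child))
    if current:
        parts.append("".join(current))
    # Assumption: whitespace-only text parts are ignored.
    parts = [p for p in parts if p.strip()]
    if not fields and len(parts) <= 1:
        # Assumption: a text-only element maps directly to its text.
        return parts[0] if parts else None
    if len(parts) == 1:
        fields["#text"] = parts[0]
    elif parts:
        # Several text parts separated by child elements or comments.
        fields["#text"] = parts
    return fields


def parse_xml(xml_text):
    root = minidom.parseString(xml_text).documentElement
    return {root.tagName: element_to_value(root)}


SAMPLE = ('<thread xmlns:xhtml="http://www.w3.org/1999/xhtml" id="1">'
          '<topic>XML Parsing</topic><!-- comment -->'
          '<message id="102"><content type="plain"><![CDATA[<b>text</b>]]></content></message>'
          '<message id="103"><content type="xhtml">More <xhtml:b>text</xhtml:b> here.</content></message>'
          '<message id="104"><content type="xhtml">Some <!-- hidden text --> included.</content></message>'
          '</thread>')

if __name__ == "__main__":
    print(json.dumps(parse_xml(SAMPLE), indent=2))
```

In this sketch, message 103 ends up with `"xhtml:b": "text"` and `"#text": ["More ", " here."]`, and the comment in message 104 splits its text into two parts, which mirrors the array rule for `#text`.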
With `XML_PLAIN`, you get a streamlined version of the XML data: the matcher discards the attributes of XML elements. This is helpful when attribute values are not needed, because it reduces the complexity of the output structure and makes the parsed data easier to work with.
`XML_PLAIN` uses the following mapping rule:
- Namespace prefixes are kept in field names. For example, `<prefix:element>` becomes field `prefix:element`.

output type | quantifier |
---|---|
variant_object | none |

Configuration parameters:
- `rootTag`: string specifying the name of the XML element from which to parse content. If several such elements exist, the first occurrence is selected. By default, the matcher parses the whole XML document.
- `excludeRoot`: Boolean value; `true` excludes the root element from the output. The default is `false`.
- `maxlen`: numeric value specifying the maximum size of an XML document in bytes. Increase it to parse XML documents larger than the default limit of 128000 bytes.
- `charset`: character set name enclosed in single or double quotes (for example, `charset="ISO-8859-1"`).
- `locale`: string specifying an IETF BCP 47 language tag, enclosed in single or double quotes (see the list). The default locale is English.

An XML document:
<?xml version="1.0" encoding="UTF-8"?><messages xmlns:xhtml="http://www.w3.org/1999/xhtml"><thread id="1"><topic>XML Parsing</topic><!-- comment --><message id="101"><sender>Alice</sender><content type="plain"><b>text</b></content></message><message id="102"><sender>Bob</sender><content type="plain"><![CDATA[<b>text</b>]]></content></message><message id="103"><sender>John</sender><content type="xhtml">More <xhtml:b>text</xhtml:b> here.</content></message><message id="104"><sender>Mary</sender><content type="xhtml">Some <!-- hidden text --> included.</content></message></thread></messages>
It can be parsed using the pattern:
XML_PLAIN:xml
The result is:
name | value | type |
---|---|---|
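A comparable Python sketch for `XML_PLAIN` simply never reads element attributes. Only two behaviors come from the description above (attributes are discarded, namespace prefixes are kept); everything else is an assumption of this sketch: the function names are hypothetical, repeated elements become arrays, comments are skipped, and text mixed with child elements is dropped. The real matcher may handle mixed content differently.

```python
from xml.dom import Node, minidom


def plain_value(el):
    """Sketch of an XML_PLAIN-style mapping: attributes are discarded."""
    fields, text = {}, []
    for child in el.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            value = plain_value(child)
            key = child.tagName  # namespace prefix is kept, e.g. 'xhtml:b'
            # Assumption: repeated sibling elements become an array.
            if key not in fields:
                fields[key] = value
            elif isinstance(fields[key], list):
                fields[key].append(value)
            else:
                fields[key] = [fields[key], value]
        elif child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            text.append(child.data)
        # Comments and other node types are skipped.
    if fields:
        # Assumption: text mixed with child elements is dropped.
        return fields
    joined = "".join(text)
    return joined if joined.strip() else None


def parse_plain(xml_text):
    root = minidom.parseString(xml_text).documentElement
    return {root.tagName: plain_value(root)}


SAMPLE = ('<thread xmlns:xhtml="http://www.w3.org/1999/xhtml" id="1">'
          '<topic>XML Parsing</topic><!-- comment -->'
          '<message id="102"><content type="plain"><![CDATA[<b>text</b>]]></content></message>'
          '<message id="103"><content type="xhtml">More <xhtml:b>text</xhtml:b> here.</content></message>'
          '<message id="104"><content type="xhtml">Some <!-- hidden text --> included.</content></message>'
          '</thread>')

if __name__ == "__main__":
    print(parse_plain(SAMPLE))
```

Compared with the `XML` sketch, the `@id` and `@type` fields disappear, so most elements collapse to plain strings.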
The `XML_VERBOSE` matcher produces the most detailed and comprehensive data structure when parsing XML documents. In contrast to the `XML` matcher, it creates a fixed output structure that does not depend on whether elements have attributes or child elements.
`XML_VERBOSE` uses the following mapping rules:
- XML element attributes are mapped to fields with `@` as a prefix in the field name. For example, attribute `<... attr="...">` becomes field `@attr`.
- Text content is mapped to the `#text` field. The `#text` field becomes an array when child elements (or comments) are mixed with multiple text content parts.
- Namespace prefixes are kept in field names. For example, `<prefix:element>` becomes field `prefix:element`.

output type | quantifier |
---|---|
variant_object | none |

Configuration parameters:
- `rootTag`: string specifying the name of the XML element from which to parse content. If several such elements exist, the first occurrence is selected. By default, the matcher parses the whole XML document.
- `excludeRoot`: Boolean value; `true` excludes the root element from the output. The default is `false`.
- `maxlen`: numeric value specifying the maximum size of an XML document in bytes. Increase it to parse XML documents larger than the default limit of 128000 bytes.
- `charset`: character set name enclosed in single or double quotes (for example, `charset="ISO-8859-1"`).
- `locale`: string specifying an IETF BCP 47 language tag, enclosed in single or double quotes (see the list). The default locale is English.

An XML document:
<?xml version="1.0" encoding="UTF-8"?><messages xmlns:xhtml="http://www.w3.org/1999/xhtml"><thread id="1"><topic>XML Parsing</topic><!-- comment --><message id="101"><sender>Alice</sender><content type="plain"><b>text</b></content></message><message id="102"><sender>Bob</sender><content type="plain"><![CDATA[<b>text</b>]]></content></message><message id="103"><sender>John</sender><content type="xhtml">More <xhtml:b>text</xhtml:b> here.</content></message><message id="104"><sender>Mary</sender><content type="xhtml">Some <!-- hidden text --> included.</content></message></thread></messages>
It can be parsed using the pattern:
XML_VERBOSE:xml
The result is:
name | value | type |
---|---|---|
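The following Python sketch illustrates what a fixed output structure means, using a hypothetical layout: every element maps to an object, `#text` is always an array (possibly empty), and every child element field is always an array, even when the element occurs only once. The field names follow the mapping rules above; the layout itself and the function names are assumptions for illustration, not the documented `XML_VERBOSE` output format.

```python
from xml.dom import Node, minidom


def verbose_value(el):
    """Hypothetical fixed-shape mapping: '#text' is always a list and every
    child-element field is always a list, regardless of the input."""
    fields = {}
    # Attributes become fields prefixed with '@'.
    for name, value in el.attributes.items():
        fields["@" + name] = value
    parts, current = [], []
    for child in el.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            current.append(child.data)
            continue
        # Child elements and comments end the current text part.
        if current:
            parts.append("".join(current))
            current = []
        if child.nodeType == Node.ELEMENT_NODE:
            # Always an array, even for a single occurrence.
            fields.setdefault(child.tagName, []).append(verbose_value(child))
    if current:
        parts.append("".join(current))
    # Always present; whitespace-only parts are ignored (assumption).
    fields["#text"] = [p for p in parts if p.strip()]
    return fields


def parse_verbose(xml_text):
    root = minidom.parseString(xml_text).documentElement
    return {root.tagName: verbose_value(root)}


SAMPLE = ('<thread id="1"><topic>XML Parsing</topic>'
          '<message id="101"><sender>Alice</sender></message></thread>')

if __name__ == "__main__":
    print(parse_verbose(SAMPLE))
```

Because the types never change with the input, code consuming this layout can always index `["#text"]` and iterate over child fields without first checking whether a value is a string, an object, or an array.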