Lines and Strings
Line Breaks
EOL, LF
Matches the single line feed character (ASCII 0xa).
output type: none
quantifier: default value {1,1}, requires matching minimum 1 and maximum 1 bytes
configuration: none
CR
Matches the single carriage return character (ASCII 0xd).
output type: none
quantifier: default value {1,1}, requires matching minimum 1 and maximum 1 bytes
configuration: none
EOLWIN
Matches two characters: carriage return followed by line feed (ASCII 0xd 0xa).
output type: none
quantifier: default value {2,2}, requires matching minimum 2 and maximum 2 bytes
configuration: none
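To make the three line-break matchers concrete, here is a small Python sketch that reports which byte sequence terminates a chunk of data. The function name `line_ending` is hypothetical and not part of the pattern language; it only illustrates the bytes that EOL, CR, and EOLWIN correspond to.

```python
def line_ending(data: bytes) -> str:
    """Return the name of the matcher whose bytes terminate `data`."""
    # Check the two-byte sequence first; otherwise CR LF would be
    # reported as a bare LF.
    if data.endswith(b"\r\n"):
        return "EOLWIN"  # 0x0d 0x0a
    if data.endswith(b"\n"):
        return "EOL"     # 0x0a
    if data.endswith(b"\r"):
        return "CR"      # 0x0d
    return "none"


print(line_ending(b"The end\n"))
print(line_ending(b"The end\r\n"))
```

Checking the longer sequence first matters: a Windows line ending also ends in a line feed.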
Line Data
LD, LDATA
Matches any characters until the next non-optional matcher in the scope of a line.
LD must always be followed by a non-optional matcher expression.
output type: string
quantifier: default value {1,4096}, requires matching minimum 1 and maximum 4096 bytes
configuration:
- charset = character set name enclosed in single or double quotes (for example charset="ISO-8859-1")
- locale = string specifying IETF BCP 47 language tag enclosed in single or double quotes (see the list here). The default locale is English.
Example 1
Parsing lines ending with a single line-feed character (i.e. a *NIX text file). Note that line endings are marked with '\n' for clarity:
Red fox jumps over lazy dog\n\nThe end\n
The following pattern extracts the content of the entire line, i.e. it matches any character until the line feed:
LD:line EOL
Results in the first and third lines being parsed out. The second line fails to parse because the LD matcher encountered an end of line before reaching the required minimum matching count (default 1):
Red fox jumps over lazy dog
The end
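The behaviour in Example 1 can be approximated with a regular expression. The sketch below is an assumption-laden stand-in, not the matcher's implementation: it treats `LD:line EOL` with the default {1,4096} quantifier as "1 to 4096 characters that are not a line feed, followed by a line feed". The names `LINE` and `parse_lines` are hypothetical.

```python
import re

# Rough regex equivalent of `LD:line EOL` with the default {1,4096} quantifier.
LINE = re.compile(r"([^\n]{1,4096})\n")


def parse_lines(text: str) -> list[str]:
    results = []
    for raw in text.splitlines(keepends=True):
        m = LINE.fullmatch(raw)
        if m:  # an empty line has zero characters before the line feed and fails
            results.append(m.group(1))
    return results


sample = "Red fox jumps over lazy dog\n\nThe end\n"
print(parse_lines(sample))
```

The empty second line is skipped for the same reason the pattern rejects it: the minimum count of one character is not met.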
Example 2
Extracting username field from a CSV file:
2016-01-03 00:13:28,110.188.4.216,forerequest,200
2016-01-06 06:35:24,48.242.116.66,unrioting,200
2016-01-05 11:49:01,223.11.158.94,ribassano,404
TIMESTAMP:date_time ','
IPADDR:ip ','
LD:username ','
LD EOL;
where:
- TIMESTAMP:date_time extracts time and date, followed by field separator ','
- IPADDR:ip extracts ip-address, followed by field separator ','
- LD:username extracts username by matching any characters until field separator ','
- LD matches but does not extract any character for the rest of the line
Results:
date_time | ip | username
2016-01-03 00:13:28 +0000 | 110.188.4.216 | forerequest
2016-01-06 06:35:24 +0000 | 48.242.116.66 | unrioting
2016-01-05 11:49:01 +0000 | 223.11.158.94 | ribassano
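As a rough illustration of how Example 2 behaves, the following Python sketch replaces each matcher with a regular expression. The regexes are assumptions chosen to fit the sample data (they are not the real TIMESTAMP or IPADDR definitions), and `RECORD` is a hypothetical name.

```python
import re

# Hypothetical regex stand-ins for the matchers used above:
#   TIMESTAMP -> "YYYY-MM-DD hh:mm:ss", IPADDR -> dotted IPv4,
#   LD -> any characters up to the next ',' (never crossing a line feed).
RECORD = re.compile(
    r"(?P<date_time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),"
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}),"
    r"(?P<username>[^,\n]+),"
    r"[^\n]+\n"  # trailing LD has no export name: matched but not extracted
)

log = (
    "2016-01-03 00:13:28,110.188.4.216,forerequest,200\n"
    "2016-01-06 06:35:24,48.242.116.66,unrioting,200\n"
    "2016-01-05 11:49:01,223.11.158.94,ribassano,404\n"
)

rows = [m.groupdict() for m in RECORD.finditer(log)]
for row in rows:
    print(row["date_time"], row["ip"], row["username"])
```

The unnamed final group mirrors the unexported `LD`: it consumes the status code so the record can end at the line feed, but nothing is stored for it.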
Multiline Data
DATA
Matches any characters, including line breaks, until the next non-optional matcher in the pattern expression.
DATA must always be followed by a non-optional matcher expression.
output type: string
quantifier: default value {1,4096}, requires matching minimum 1 and maximum 4096 bytes
configuration:
- charset = character set name enclosed in single or double quotes (for example charset="ISO-8859-1")
- locale = string specifying IETF BCP 47 language tag enclosed in single or double quotes (see the list here). The default locale is English.
Example
Parsing stack trace records that span multiple lines and are terminated by an empty line, i.e. two consecutive line breaks:
2015.10.03 16:32:51.371 +0000 ERROR com.dt.webconsole.jsp.data.SQLTimeSeriesCache -- SQLTimeSeries remote fetch failed
org.postgresql.util.PSQLException: Connection refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
    at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:66)
    at org.postgresql.jdbc2.AbstractJdbc2Connection.<init>(AbstractJdbc2Connection.java:125)
    at org.postgresql.jdbc3.AbstractJdbc3Connection.<init>(AbstractJdbc3Connection.java:30)
    at org.postgresql.jdbc3g.AbstractJdbc3gConnection.<init>(AbstractJdbc3gConnection.java:22)
    at org.postgresql.jdbc4.AbstractJdbc4Connection.<init>(AbstractJdbc4Connection.java:30)
    at org.postgresql.jdbc4.Jdbc4Connection.<init>(Jdbc4Connection.java:24)
    at org.postgresql.Driver.makeConnection(Driver.java:393)
    at org.postgresql.Driver.connect(Driver.java:267)

2015-10-03 19:33:47.422 +0000 WARN main com.dt.wgui.WGUIMain Log processing started in /Users/user/dt, listening http://localhost:8390/, pid: 11364@abcdef
DATA:record (EOL EOL)
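The difference from LD is that DATA may cross line feeds. A minimal Python sketch of `DATA:record (EOL EOL)` follows, assuming a lazy regex match up to the first empty line behaves like the matcher; the log text is a shortened, made-up variant of the sample above and `RECORD` is a hypothetical name.

```python
import re

# DATA may span line feeds, so DOTALL is needed; the lazy quantifier stops
# at the first empty line (two consecutive line feeds).
RECORD = re.compile(r"(.{1,4096}?)\n\n", re.DOTALL)

log = (
    "ERROR SQLTimeSeries remote fetch failed\n"
    "org.postgresql.util.PSQLException: Connection refused.\n"
    "    at org.postgresql.Driver.connect(Driver.java:267)\n"
    "\n"
    "WARN Log processing started\n"
    "\n"
)

records = RECORD.findall(log)
for record in records:
    print(repr(record))
```

Each record keeps its internal line feeds; only the terminating empty line is consumed by the `(EOL EOL)` part.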
Quoted Strings
SQS
Matches a string enclosed between single quotes (ASCII 0x27). Any single quote inside the string must be escaped by a backslash character "\" (ASCII 0x5c).
output type: string
quantifier: default value {1,4096}, requires matching minimum 1 and maximum 4096 bytes
configuration:
- charset = character set name enclosed in single or double quotes (for example charset="ISO-8859-1")
- locale = string specifying IETF BCP 47 language tag enclosed in single or double quotes (see the list here). The default locale is English.
Example
'Ay caramba!'

'Homer said: d\'oh!'
SQS:sq_string EOL
Result:
Ay caramba!
Homer said: d'oh!
Parsing the empty second line fails since SQS by default expects at least one matching character.
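The backslash-escaping rule can be sketched in Python as follows. This is an approximation under stated assumptions: the regex counts an escaped pair as one repetition, whereas the real quantifier counts bytes, and `SQS` and `parse_sqs` are hypothetical names.

```python
import re

# Opening quote, then 1..4096 items that are either an ordinary character
# or a backslash-escaped character, then the closing quote.
SQS = re.compile(r"'((?:[^'\\]|\\.){1,4096})'")


def parse_sqs(line: str):
    m = SQS.fullmatch(line)
    if m is None:
        return None  # no quoted string, or fewer than one character inside
    return m.group(1).replace("\\'", "'")  # drop the escaping backslash


print(parse_sqs("'Ay caramba!'"))
print(parse_sqs(""))
print(parse_sqs(r"'Homer said: d\'oh!'"))
```

The same idea applies to DQS with the double-quote character in place of the single quote.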
DQS
Matches a string enclosed between double-quote characters (ASCII 0x22). Any double quote inside the string must be escaped by a backslash character (ASCII 0x5c).
output type: string
quantifier: default value {1,4096}, requires matching minimum 1 and maximum 4096 bytes
configuration:
- charset = character set name enclosed in single or double quotes (for example charset="ISO-8859-1")
- locale = string specifying IETF BCP 47 language tag enclosed in single or double quotes (see the list here). The default locale is English.
Example
"Red fox jumps over lazy dog"
"Red fox jumps over \"lazy\" dog"
DQS:dq_string EOL
Result:
Red fox jumps over lazy dog
Red fox jumps over "lazy" dog
CSVSQS
Matches a string enclosed between single quotes (ASCII 0x27). Any single quote inside the string must be escaped by doubling it, i.e. by a preceding single quote character (CSV style).
output type: string
quantifier: default value {1,4096}, requires matching minimum 1 and maximum 4096 bytes
configuration:
- charset = character set name enclosed in single or double quotes (for example charset="ISO-8859-1")
- locale = string specifying IETF BCP 47 language tag enclosed in single or double quotes (see the list here). The default locale is English.
Example
'Red fox jumps over lazy dog'
'Red fox jumps over ''lazy'' dog'
CSVSQS:csvsq_string EOL
Result:
Red fox jumps over lazy dog
Red fox jumps over 'lazy' dog
CSVDQS
Matches a string enclosed between double-quote characters (ASCII 0x22). Any double quote inside the string must be escaped by doubling it, i.e. by a preceding double-quote character (CSV style).
output type: string
quantifier: default value {1,4096}, requires matching minimum 1 and maximum 4096 bytes
configuration:
- charset = character set name enclosed in single or double quotes (for example charset="ISO-8859-1")
- locale = string specifying IETF BCP 47 language tag enclosed in single or double quotes (see the list here). The default locale is English.
Example
"Red fox jumps over lazy dog"
"Red fox jumps over ""lazy"" dog"
CSVDQS:csvdq_string EOL
Result:
Red fox jumps over lazy dog
Red fox jumps over "lazy" dog
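CSV-style escaping, where a quote inside the string is written twice, is the same convention Python's standard `csv` module implements with `doublequote=True`. The sketch below uses that module to reproduce the CSVSQS and CSVDQS results; the helper name `parse_csv_quoted` is hypothetical, and the module is only a stand-in for the matchers, not their implementation.

```python
import csv
import io


def parse_csv_quoted(text: str, quote: str) -> list[str]:
    """Return the unescaped content of each quoted line in `text`."""
    reader = csv.reader(io.StringIO(text), quotechar=quote, doublequote=True)
    # Each sample line holds a single quoted field, so take column 0.
    return [row[0] for row in reader if row]


double_quoted = '"Red fox jumps over lazy dog"\n"Red fox jumps over ""lazy"" dog"\n'
single_quoted = "'Red fox jumps over lazy dog'\n'Red fox jumps over ''lazy'' dog'\n"

print(parse_csv_quoted(double_quoted, '"'))
print(parse_csv_quoted(single_quoted, "'"))
```

Only the quote character differs between the two calls, which mirrors the relationship between CSVDQS and CSVSQS.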
Character Group
[ char ... ]
Matches a single character out of several in a defined group. Simply place the characters you want to match between square brackets.
Characters can also be expressed as ranges, for instance [0-9] matches any digit from 0 to 9. Negating is supported by placing a caret "^" or an exclamation mark "!" before characters.
To match a square bracket character, escape it with a preceding backslash character (ASCII 0x5c).
Use a quantifier if you want to match more than a single character.
The syntax is compatible with Regular Expression Character Class.
Character group allows matching strings with specific characters (as opposed to LD or DATA, which match any characters).
output type: string
quantifier: default value {1,1}, requires matching minimum 1 and maximum 1 bytes
configuration:
- charset = character set name enclosed in single or double quotes (for example charset="ISO-8859-1")
- locale = string specifying IETF BCP 47 language tag enclosed in single or double quotes (see the list here). The default locale is English.
Example
Extracting username field, which is expected to consist of lowercase letters a to z and numbers, with minimum length of 4 characters and a maximum length of 15 characters:
2016-01-03 00:13:28,110.188.4.216,forerequest,200
2016-01-06 06:35:24,48.242.116.66,02rioting,200
2016-01-05 11:49:01,223.11.158.94,ribassano,404
TIMESTAMP:date_time ','
IPADDR:ip ','
[0-9a-z]{4,15}:username ','
LD EOL;
where:
- TIMESTAMP:date_time extracts time and date, followed by field separator ','
- IPADDR:ip extracts ip-address, followed by field separator ','
- [0-9a-z]{4,15}:username extracts username by matching digits 0 to 9 and characters a to z, min 4, max 15 times, followed by field separator ','
- LD matches but does not extract any character for the rest of the line
Results in all lines parsed into fields date_time, ip, username.
date_time | ip | username
2016-01-03 00:13:28 +0000 | 110.188.4.216 | forerequest
2016-01-06 06:35:24 +0000 | 48.242.116.66 | 02rioting
2016-01-05 11:49:01 +0000 | 223.11.158.94 | ribassano
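Because the character group syntax is described as compatible with regular expression character classes, the username rule can be tried out directly with Python's `re` module. This is only a sketch for experimenting with the group and quantifier; `USERNAME` is a hypothetical name, and Python supports negation with the caret only, not the exclamation mark.

```python
import re

# Digits and lowercase letters, between 4 and 15 characters long.
USERNAME = re.compile(r"[0-9a-z]{4,15}")

for candidate in ["forerequest", "02rioting", "abc", "Ribassano"]:
    verdict = "match" if USERNAME.fullmatch(candidate) else "no match"
    print(candidate, verdict)
```

"abc" is rejected by the quantifier (too short) and "Ribassano" by the group (uppercase letter), which shows the two constraints acting independently.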
POSIX Character Classes
Matches one or more characters belonging to the named character class.
output type: string
quantifier: default value {1,4096}, requires matching minimum 1 and maximum 4096 bytes
configuration:
- charset = character set name enclosed in single or double quotes (for example charset="ISO-8859-1")
- locale = string specifying IETF BCP 47 language tag enclosed in single or double quotes (see the list here). The default locale is English.
You may use either the matcher name or the POSIX notation.
ALNUM ([:alnum:]): Matches alphanumeric characters a-z; A-Z; 0-9
ALPHA ([:alpha:]): Matches alphabetic characters a-z; A-Z
BLANK ([:blank:]): Matches space (0x20) and tab (0x9) characters
CNTRL ([:cntrl:]): Matches control characters in ASCII range: 0x1-0x1F; 0x7F
DIGIT ([:digit:]): Matches digits in range of 0-9
GRAPH ([:graph:]): Matches visible characters in the ASCII code range 0x21 - 0x7E
LOWER ([:lower:]): Matches lowercase letters a-z
PRINT ([:print:]): Matches printable characters in the ASCII code range 0x20 - 0x7E
PUNCT ([:punct:]): Matches punctuation and symbols !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
SPACE ([:space:]): Matches all whitespace characters. In ASCII codes: 0x20; 0x9; 0xA; 0xB; 0xC; 0xD
NSPACE ([!:space:]): Matches all characters except whitespace.
UPPER ([:upper:]): Matches uppercase letters A-Z
XDIGIT ([:xdigit:]): Matches digits in hexadecimal notation 0x0 - 0xF
ASCII ([:ascii:]): Matches all ASCII characters in the range of 0x0 - 0x7F
WORD ([:word:]): Matches word characters: letters a-z; A-Z; numbers 0-9 and underscore _
ANY ([:any:]): Matches any character in the range 0x0 - 0xFF
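For experimenting outside the parser, some of the classes above can be spelled out as explicit regular expression character sets. Python's standard `re` module does not understand POSIX bracket notation such as [:alnum:], so the sketch below maps a subset of the matcher names to equivalent explicit sets taken from the descriptions above. The dictionary `POSIX_CLASSES` and the helper `matches` are hypothetical names.

```python
import re

# Explicit ASCII equivalents for a subset of the classes listed above.
POSIX_CLASSES = {
    "ALNUM": r"[A-Za-z0-9]",
    "ALPHA": r"[A-Za-z]",
    "BLANK": r"[ \t]",
    "DIGIT": r"[0-9]",
    "LOWER": r"[a-z]",
    "UPPER": r"[A-Z]",
    "XDIGIT": r"[0-9A-Fa-f]",
    "SPACE": r"[ \t\n\x0b\x0c\r]",
    "NSPACE": r"[^ \t\n\x0b\x0c\r]",
    "WORD": r"[A-Za-z0-9_]",
}


def matches(class_name: str, text: str) -> bool:
    """True if every character of a non-empty `text` belongs to the class."""
    return re.fullmatch(POSIX_CLASSES[class_name] + "+", text) is not None


print(matches("XDIGIT", "c0ffee"))
print(matches("LOWER", "02rioting"))
print(matches("NSPACE", "a b"))
```

The "+" appended in `matches` reflects that these matchers consume one or more characters, unlike a bare character group, which defaults to exactly one.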
Example
Extracting username field, which is expected to consist of only lowercase letters a to z, with minimum length of 4 characters and a maximum length of 15 characters:
2016-01-03 00:13:28,110.188.4.216,forerequest,200
2016-01-06 06:35:24,48.242.116.66,02rioting,200
2016-01-05 11:49:01,223.11.158.94,ribassano,404
TIMESTAMP:date_time ','
IPADDR:ip ','
LOWER{4,15}:username ','
LD EOL;
where:
- TIMESTAMP:date_time extracts time and date, followed by field separator ','
- IPADDR:ip extracts ip-address, followed by field separator ','
- LOWER{4,15}:username extracts username by matching lowercase characters, min 4, max 15 times, followed by field separator ','
- LD matches but does not extract any character for the rest of the line
Results in the first and third lines parsed into fields date_time, ip, username. The second line fails to parse because its username contains numbers.
date_time | ip | username
2016-01-03 00:13:28 +0000 | 110.188.4.216 | forerequest
1970-01-01 00:00:00.000 +0000 | 255.255.255.255 |
2016-01-05 11:49:01 +0000 | 223.11.158.94 | ribassano
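The effect of restricting the username to LOWER can be reproduced with a short Python sketch. As before, the regexes standing in for TIMESTAMP, IPADDR, and LD are assumptions fitted to the sample data, and `RECORD` and `usernames` are hypothetical names; how the real parser reports a failed line is not modelled here.

```python
import re

# Same record layout as above, with the username limited to 4..15
# lowercase letters (the LOWER{4,15} part of the pattern).
RECORD = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),"
    r"(\d{1,3}(?:\.\d{1,3}){3}),"
    r"([a-z]{4,15}),"
    r"[^\n]+"
)

lines = [
    "2016-01-03 00:13:28,110.188.4.216,forerequest,200",
    "2016-01-06 06:35:24,48.242.116.66,02rioting,200",
    "2016-01-05 11:49:01,223.11.158.94,ribassano,404",
]

usernames = []
for line in lines:
    m = RECORD.fullmatch(line)
    usernames.append(m.group(3) if m else None)  # None marks a failed line

print(usernames)
```

The second line fails as a whole: once the username does not match, no field of that record is extracted.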