Dissect splits a string into its parts. A dissect implementation compares a string against a pattern and then splits the string based on the pattern rules. This specification defines the expected behavior for dissect implementations.
- pattern (or tokenizer) - the pattern used to define how to split a string. For example `%{timestamp} %{+timestamp} %{+timestamp} %{logsource} %{program}[%{pid}]: %{message}`.
- key - the part of the string to match, identified by `%{key}`.
- delimiter - the part of the string to NOT match.
- modifier - a special instruction found inside the dissect key to change the behavior.
- parser - the software that implements this specification to split the string.
- pattern:
%{a} %{b},%{c}
- string:
foo bar,baz
- result:
a=foo, b=bar, c=baz
This pattern has three keys, `a`, `b`, and `c`, and two delimiters, ` ` (space) and `,` (comma). When the parser is run against the string with the given pattern, the result is a set of key/value pairs. The parser searches the string for each delimiter in the pattern, from left to right, and assigns the text between matched delimiters to the corresponding keys.

In this example, the parser searches the string for ` ` (space), the first delimiter. It finds a space in the string, so it assigns the `a` key everything up to, but not including, the space. So `a=foo`. The next delimiter is `,` (comma). The parser searches for a comma in the string, finds it, and assigns `b=bar`. No delimiters remain, so `c` is assigned the remainder of the string (`baz`).
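The walk-through above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `simple_dissect` is hypothetical, modifiers are not handled, and returning `None` for a string that does not match is an assumption of this sketch.

```python
import re

# Matches one %{...} key and captures the text between the braces.
KEY_RE = re.compile(r"%\{([^}]*)\}")


def simple_dissect(pattern, text):
    """Dissect `text` with a modifier-free `pattern`; return None if it does not match."""
    # re.split with a capture group alternates delimiters and key names:
    # "%{a} %{b},%{c}" -> ["", "a", " ", "b", ",", "c", ""]
    parts = KEY_RE.split(pattern)
    delimiters, names = parts[0::2], parts[1::2]

    # A leading delimiter, if any, must be present at the start of the string.
    if not text.startswith(delimiters[0]):
        return None
    pos = len(delimiters[0])

    result = {}
    for i, name in enumerate(names):
        delimiter = delimiters[i + 1]
        if delimiter == "" and i == len(names) - 1:
            end = len(text)  # the last key takes the remainder of the string
        else:
            end = text.find(delimiter, pos)
            if end == -1:
                return None  # delimiter missing: no partial matches
        if name:  # an empty name is a skip key and is left out of the result
            result[name] = text[pos:end]
        pos = end + len(delimiter)
    return result


print(simple_dissect("%{a} %{b},%{c}", "foo bar,baz"))
# {'a': 'foo', 'b': 'bar', 'c': 'baz'}
```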
- Pattern specification
  - A dissect pattern must contain at least one key.
  - A dissect pattern may have leading and trailing delimiters.
  - A dissect pattern may have multiple delimiters of different characters and different lengths.
  - A dissect pattern must contain unique key names unless a modifier allows or requires duplicated key names.
  - A dissect pattern may not use `%` as a delimiter.
- Key specification
  - A dissect key must start with `%{` and end with `}`.
  - A dissect key may have a name, e.g. `%{key_name}`, and it must be able to be encoded as UTF-8.
  - A dissect key may have an empty name, e.g. `%{}`. This is called a skip key and must not be included in the final results.
  - A dissect key name may not have any of the modifier characters as part of the name.
  - A dissect key may have modifiers to the left or the right, or left and right, of the key name.
- Modifier specification
  - A dissect modifier must be defined inside the dissect key, to the left or right of the key name.
  - Multiple dissect modifiers per key may be allowed.
  - `->`: Right padding ignore - instructs the parser to ignore consecutive repeating delimiters to the right of the key. The `->` modifier must be placed to the right of the key name, is allowed to co-exist with any other modifiers, and must always be the furthest right modifier. See example below.
  - `+`: Append - instructs the parser to append this key's value to the value of the prior key (left to right) with the same name. A user defined append separator must be supported. The user defined separator is a character, or set of characters, that will be placed between the appended values. The `+` modifier must be placed to the left of the key name. See example below.
  - `+` and `/n`: Append with order - instructs the parser to append this key's value to the value of the prior key with the same name based on order. The `+` modifier must be placed to the left of the key name and the `/n` modifier to the right of the key name, where n = order. The order must start at `1`. See example below.
  - `?`: Named skip key - instructs the parser to not include this result in the final result set. Behaves identically to an empty skip key `%{}` but may be used to help with human readability. The `?` modifier must be placed to the left of the key name. See example below.
  - `*` and `&`: Reference modifiers - these modifiers require two keys with the same name present in the dissect pattern, one key with the `*` and another with the `&`. This instructs the parser that the value discovered by the `*` key is to be used as the key name for the value discovered by the corresponding `&` key. These modifiers must be placed to the left of the key name. See example below.
- Parser specification
  - A dissect parser must not allow partial matches. All delimiters must be present in the string, and all keys must have a corresponding value.
  - A dissect parser must support an empty key `%{}` (skip key) as a valid match, but not include the result in the result set.
  - A dissect parser must be able to parse any string that can be encoded as UTF-8.
  - A dissect parser must match the leading and trailing delimiters if present in the dissect pattern.
  - A dissect parser must allow the last key of a pattern to match the remainder of the string without additional modifiers. See example below.
  - A dissect parser must treat consecutive repeating delimiters as valid empty matches unless instructed otherwise by modifiers. See example below.
  - A dissect parser must allow a user specified string to use as the value between append operations. See example below.
  - A dissect parser must support multiple character delimiters.
  - A dissect parser result set must be string/string key value pairs.
  - A dissect parser must support all modifiers defined by this specification.
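The pattern, key, and modifier rules above can be checked before any string is parsed. The following Python sketch shows one possible way to do that. The names `Key`, `parse_key`, and `tokenize` are hypothetical, raising `ValueError` for an invalid pattern is an assumption, and the regular expression is only one reading of the modifier placement rules (left modifier, name, optional `/n`, optional `->` furthest right).

```python
import re
from typing import List, NamedTuple, Optional, Tuple

KEY_RE = re.compile(r"%\{([^}]*)\}")
# Optional left modifier, key name, optional /n order, optional -> padding.
KEY_BODY_RE = re.compile(r"([+?*&]?)(.*?)(?:/(\d+))?(->)?", re.DOTALL)


class Key(NamedTuple):
    name: str
    skip: bool                # %{} or %{?name}
    append: bool              # %{+name}
    order: Optional[int]      # n from %{+name/n}
    reference: Optional[str]  # "*" or "&"
    right_padding: bool       # %{name->}


def parse_key(body: str) -> Key:
    """Parse the text between %{ and } into a name and its modifiers."""
    left, name, order, padding = KEY_BODY_RE.fullmatch(body).groups()
    return Key(
        name=name,
        skip=(left == "?" or name == ""),
        append=(left == "+"),
        order=int(order) if order else None,
        reference=left if left in ("*", "&") else None,
        right_padding=padding is not None,
    )


def tokenize(pattern: str) -> Tuple[List[str], List[Key]]:
    """Split a pattern into delimiters and keys, enforcing the pattern rules."""
    parts = KEY_RE.split(pattern)
    delimiters = parts[0::2]
    keys = [parse_key(body) for body in parts[1::2]]
    if not keys:
        raise ValueError("a dissect pattern must contain at least one key")
    if any("%" in d for d in delimiters):
        raise ValueError("'%' may not be used as a delimiter")
    # Only plain keys must be unique; append and reference keys share names.
    plain = [k.name for k in keys if not (k.skip or k.append or k.reference)]
    if len(plain) != len(set(plain)):
        raise ValueError("key names must be unique unless a modifier allows duplicates")
    return delimiters, keys


print(parse_key("+a/2"))
# Key(name='a', skip=False, append=True, order=2, reference=None, right_padding=False)
```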
- pattern:
%{a->} %{b} %{c}
- string:
foo bar baz
- result:
a=foo, b=bar, c=baz
In the above example, the delimiter is ` ` (space), and the `->` instructs the parser to skip all of the consecutive repeating spaces to the right of `a`.
- pattern:
%{a->},%{b},%{c}
- string:
foo,,,,bar,baz
- result:
a=foo, b=bar, c=baz
In the above example, the delimiter is `,` (comma), and the `->` instructs the parser to skip all of the consecutive repeating `,` to the right of `a`.
Multi-character delimiters must be supported.
- pattern:
%{a->},:%{b},%{c}
- string:
foo,:,:,:,:bar,baz
- result:
a=foo, b=bar, c=baz
Empty skip key with right padding must be supported.
- pattern:
%{->},%{b},%{c}
- string:
foo,,,,bar,baz
- result:
b=bar, c=baz
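The right padding behavior in the examples above comes down to stepping over every extra copy of the delimiter once the key's value has been found. A minimal Python sketch follows. The helper name `skip_repeats` is hypothetical, and it works on whole delimiters so that multi-character delimiters such as `,:` are covered.

```python
def skip_repeats(text: str, pos: int, delimiter: str) -> int:
    """Return the index just past every consecutive copy of `delimiter` at `pos`."""
    if not delimiter:
        return pos
    while text.startswith(delimiter, pos):
        pos += len(delimiter)
    return pos


text = "foo,:,:,:,:bar,baz"
end_of_a = text.find(",:")                  # value of `a` ends here
rest = skip_repeats(text, end_of_a, ",:")   # first character after the padding
print(text[:end_of_a], text[rest:])         # foo bar,baz
```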
- pattern:
%{a} %{+a} %{+a}
- string:
foo bar baz
- result:
a=foobarbaz
In the above example, the values are appended to the result in left to right order.
A user specified append separator must be supported. Assume the user defines the separator to be `, ` (comma space).
- pattern:
%{a} %{+a} %{+a}
- string:
foo bar baz
- result:
a=foo, bar, baz
- pattern:
%{a} %{+a/2} %{+a/1}
- string:
foo bar baz
- result:
a=foobazbar
In the above example the values are appended together based on the order specified.
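The append examples above differ only in how the collected values are joined. The sketch below shows one way to do it in Python. The helper name `join_appended` is hypothetical, and giving a key without an explicit `/n` the order `0` (so that it sorts before order `1`) is an assumption of this sketch, not something the specification states.

```python
def join_appended(values, separator=""):
    """Join (order, value) pairs collected in pattern order.

    The sort is stable, so values with equal order keep their left to right position.
    """
    ordered = sorted(values, key=lambda pair: pair[0])
    return separator.join(value for _, value in ordered)


# %{a} %{+a} %{+a} against "foo bar baz"
print(join_appended([(0, "foo"), (0, "bar"), (0, "baz")]))        # foobarbaz
print(join_appended([(0, "foo"), (0, "bar"), (0, "baz")], ", "))  # foo, bar, baz
# %{a} %{+a/2} %{+a/1} against "foo bar baz"
print(join_appended([(0, "foo"), (2, "bar"), (1, "baz")]))        # foobazbar
```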
- pattern:
%{a} %{?skipme} %{c}
- string:
foo bar baz
- result:
a=foo, c=baz
In the above example, the parser finds the matches correctly, but excludes the middle key from the results. This is the same behavior as `%{}`, and the name is only used for human readability.
- pattern:
%{*a} %{b} %{&a}
- string:
foo bar baz
- result:
foo=baz, b=bar
In the above example, there is a pair of `a` keys. One has the `*` and the other `&`. This instructs the parser to use the value of the `*` key as the key name for the value of the `&` key in the result set. `*` and `&` must come in pairs in the dissect pattern.
The left / right order of `*` and `&` does not matter.
- pattern:
%{&a} %{b} %{*a}
- string:
foo bar baz
- result:
baz=foo, b=bar
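Once matching is done, the reference pairing in the two examples above is a lookup by the shared pattern name. A minimal Python sketch, assuming the parser has already collected the `*` and `&` matches into two dictionaries keyed by that name. The helper name `resolve_references` and the `ValueError` for an unpaired key are assumptions.

```python
def resolve_references(star_values, amp_values):
    """Pair each * match (the result key name) with its & match (the result value)."""
    if star_values.keys() != amp_values.keys():
        raise ValueError("* and & keys must come in pairs")
    return {star_values[name]: amp_values[name] for name in star_values}


# %{*a} %{b} %{&a} against "foo bar baz"
print(resolve_references({"a": "foo"}, {"a": "baz"}))  # {'foo': 'baz'}
# %{&a} %{b} %{*a} against "foo bar baz"
print(resolve_references({"a": "baz"}, {"a": "foo"}))  # {'baz': 'foo'}
```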
- pattern:
%{a} %{b},%{c}
- string:
foo bar,baz something more here
- result:
a=foo, b=bar, c=baz something more here
In the above example the last key matched the remainder of the input string.
- pattern:
%{a},%{b},%{c},%{d},%{e},%{f},%{g}
- string:
foo,,,,,,bar
- result:
a=foo, b="", c="", d="", e="", f="", g=bar
In the above example the `,` repeats many times, leaving 5 empty key/value pairs.
- pattern:
%{a},%{b},%{c},%{d},%{e},%{f},%{g}
- string:
foo,,bar,,,,baz
- result:
a=foo, b="", c=bar, d="", e="", f="", g=baz
In the above example the `,` repeats many times, finds a value, then repeats more.
- pattern:
%{a->},%{g}
- string:
foo,,,,,,bar
- result:
a=foo, g=bar
In the above example the `,` repeats many times, but the right padding modifier `->` instructs the parser to skip over the repeating delimiters.
- pattern:
%{timestamp} %{+timestamp} %{+timestamp} %{logsource} %{program}[%{pid}]: %{message}
- string:
Mar 16 00:01:25 example postfix/smtpd[1713]: connect from example.com[192.100.1.3]
- result:
timestamp="Mar 16 00:01:25", logsource="example", program="postfix/smtpd", pid="1713", message="connect from example.com[192.100.1.3]"
This result assumes the user defined append separator is a single space.
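To tie the examples together, here is a compact Python sketch of a parser that covers the modifiers described in this specification. It is an illustration under stated assumptions rather than a reference implementation: the function name `dissect` and the `append_separator` parameter are hypothetical, a failed match returns `None`, keys without an explicit order are treated as order `0`, and pattern validation is left out.

```python
import re

KEY_RE = re.compile(r"%\{([^}]*)\}")
# Optional left modifier, key name, optional /n order, optional -> padding.
BODY_RE = re.compile(r"([+?*&]?)(.*?)(?:/(\d+))?(->)?", re.DOTALL)


def dissect(pattern, text, append_separator=""):
    parts = KEY_RE.split(pattern)
    delimiters, bodies = parts[0::2], parts[1::2]
    if not text.startswith(delimiters[0]):  # leading delimiter must match
        return None
    pos = len(delimiters[0])

    fields, stars, amps = {}, {}, {}
    for i, body in enumerate(bodies):
        left, name, order, padding = BODY_RE.fullmatch(body).groups()
        delimiter = delimiters[i + 1]
        if delimiter == "" and i == len(bodies) - 1:
            end = len(text)  # last key matches the remainder
        else:
            end = text.find(delimiter, pos)
            if end == -1:
                return None  # no partial matches
        value = text[pos:end]
        pos = end + len(delimiter)
        if padding and delimiter:  # -> skips repeated delimiters
            while text.startswith(delimiter, pos):
                pos += len(delimiter)
        if left == "?" or name == "":
            continue  # skip keys are matched but not reported
        if left == "*":
            stars[name] = value
        elif left == "&":
            amps[name] = value
        else:
            # Plain and append keys share one list per name (assumed order 0 by default).
            fields.setdefault(name, []).append((int(order or 0), value))

    if stars.keys() != amps.keys():
        return None  # * and & must come in pairs
    result = {
        name: append_separator.join(v for _, v in sorted(vals, key=lambda p: p[0]))
        for name, vals in fields.items()
    }
    for name, key_name in stars.items():
        result[key_name] = amps[name]
    return result


line = "Mar 16 00:01:25 example postfix/smtpd[1713]: connect from example.com[192.100.1.3]"
pattern = "%{timestamp} %{+timestamp} %{+timestamp} %{logsource} %{program}[%{pid}]: %{message}"
print(dissect(pattern, line, append_separator=" ")["timestamp"])  # Mar 16 00:01:25
```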