Allow dissect to be type aware #10

vjsamuel · 2020-04-15T18:12:09Z

Grok processor currently allows the parsing to be type aware as seen here:
https://www.elastic.co/guide/en/elasticsearch/reference/master/grok-processor.html#grok-basics

It would save the dissect processor some cycles on using convert processor to imply a non string type on the data.

The text was updated successfully, but these errors were encountered:

ph · 2020-04-16T19:57:04Z

I was looking at the syntax of grok, we are using something like %{syntax:field:value}, when we look at the dissect syntax we often use a special char to define the meaning and separe the key from the actual behavior.

# : for string limited
/3 : for reordering of items.

So looking at this I think we would like to find something that has some meaning maybe we could use > as an analogy we cat into something which could be a type.

A few examples:

%{key>int} Key is a number
%{v1>bool} Key is either true, false, TRUE, FALSE, 1:0, f, t
%{v2>float} Key is a float

I think we should keep a limited list of types, I think grok have a good list of type: int, long, double, float and boolean.

cc @jsvd @jakelandis @andresrc

vjsamuel · 2020-04-16T20:00:56Z

thanks @ph. can we enhance the proposal to accommodate timestamp parsing as well? we have had a lot of customers ask us to be able to efficiently parse timestamps out

ph · 2020-04-16T20:10:22Z

Date or timestamp is a bigger problem but I think we could make it work with patterns maybe something like:

%{key>date} this would use a predefined list of format and we could expand it and allow to define parsing hints that could look something like %{key>date([dd/MM/yyyy hh:mm:s, dd/MM/yyyy])}, where the first matching is fine.

The main problem I see is how we define the format because we have multiples implementations: Java/Go and this will be more complex to converge.

Maybe the date is an exception and it should be a separate processor if you look at Grok it doesn't consider the date as a parsable primitive and you are encourage to delegate the work to a specialized processors: Date filter in Logstash and the Date processor in Elasticsearch.

jsvd · 2020-04-20T10:39:56Z

So looking at this I think we would like to find something that has some meaning maybe we could use > as an analogy we cat into something which could be a type.

Dissect relies on -> to discard padding, so maybe | could be used instead as a "pipe-to-convert" metaphor. So in order to parse and convert this line format:

1   123.12    true  :size=32
999 12.1      false :weight=55

We'd use:

"%{id->|int} %{value->|float} %{buy->|float}:%{?a}=%{&a|float}"

Which would produce:

{ "id": 1, "value": 123.12, "buy": true, "size": 32.0 }

and

{ "id": 999, "value": 12.1, "buy": false, "weight": 55.0 }

jsvd · 2020-04-20T10:41:29Z

Also, for reference, the logstash dissect implementation has a convert_datatype directive that is called after the mapping is processed.

ph · 2020-04-20T12:05:54Z

I like the ->| idea, we didn't implement the convert_data option on our end because we did have a processor to handle that. :)

jsvd · 2020-04-20T13:00:19Z

Just to be clear, | would be the operator, ->| is just an example of how the operator is combined with other existing ones.

This reminds me that we also need to define where this operator can be placed within the expression (usually at the end) and which operators can it be combined with (e.g. maybe don’t allow combining with +?)

ph · 2020-04-20T14:49:24Z

Oh, I've misread that so ->| would mean discard padding than pipe to specified type.

For the position, your suggestion to have it at the end of the expression is a good thing.

Correct I think we should disallow it for + and /n because we would need to support more than just string append.

and /n Append with order - instructs the parser to append this key's to the value of the prior key with the same name based on order. The + modifier must be placed on the left of the key name and /n modifier placed to the right of the key name, where n = order. The order must start at 1. see example below

I haven't looked at this but maybe + /n and \ should be considered unique per expression, maybe we could call this family of symbol: "modifier"

jsvd · 2020-04-20T15:19:17Z

we didn't implement the convert_data option on our end because we did have a processor to handle that. :)

And we also should consider that it could be the right call to make here. Implementing conversion brings many edge cases, which some we've handled in the mutate filter's convert (still has issues). Users already get hit by this disconnect when using logstash's dissect filter, example here. Even elasticsearch's convert processor has its own implementation details and feature requests.

Adding this capability to dissect may cause some expectation that all of these have parity between them, especially for use cases like this one, where a user is already relying on the convert processor and moves to the not-exactly-the-same dissect's convert feature.

ph · 2020-04-20T15:22:44Z

@jsvd maybe, this indeed brings a new set of challenges, what I am worried about is their interaction with templates, a coercing problem could indeed lead to an problem to indexing.

@jsvd By curiosity have you ever considered removing the convert from dissect?

jsvd · 2020-04-20T15:30:32Z

@jsvd By curiosity have you ever considered removing the convert from dissect?

I haven't thought about it. it's been there since day 0, and we're not sure how many folks depend on it. we're spending some time thinking about telemetry and that could help us understand if this feature is used or if it should be deprecated.

vjsamuel · 2020-04-30T19:33:24Z

@ph @jsvd is there reservations in doing this or is this something we can take up and submit PRs?

jsvd · 2020-05-07T15:08:04Z

On the Logstash side it's unlikely we'll be implementing this in a near future, as logstash already provides the ability to do conversion within the dissect filter and on this side there hasn't been a demand for this either due to performance or reduced configuration size.

That said I don't want to block you from contributing this feature. If you agree with any of the syntax suggestions here - or have another - and have created a go implementation that demonstrates it working, we can review the PRs:

me/PH and Jake for the dissect-specification PR
someone on the Beats side for the beats processor change

Regarding feature parity across implementations, maybe we can start versioning this spec. For example, current version can be 1.0.0, and this new feature would land in 1.1.0, since it's a new capability that doesn't break previous ones.

ph · 2020-05-07T15:35:33Z

I agree with @jsvd proposal.

vjsamuel · 2020-05-07T19:34:24Z

@ph @jsvd we are ok with the use of the specification that you have recommended and will go by the same.

sandhyatallapanneni · 2021-11-16T18:37:05Z

Hi There,

I am writing pattern in logstash version 7.10.2 or 7.15.2 for my log having different lines of pattern using Dissect filter with its key Modifier-> to remove space or spaces as per the log line before the next delimiter. But the Modifier-> is not working. Any suggestions? Please help

My log line is like below:

2020-07-03 14:49:02,003 INFO [org.jboss.as.ee] (MSC service thread 1-2) Configuring component class
2021-06-09 02:21:42,303 DEBUG [org.jboss.as.ee] (ServerService Thread Pool -- 56) Activating EE subsystem

My filter pattern with Dissect is like below:

filter {
        dissect {
mapping => { "message" => "%{logtime} %{+logtime} %{loglevel->} [%{modulename}] (%{thread}) %{logmessage}" }
}
    date {
                match => ["logtime", "ISO8601", "YYYY-MM-dd HH:mm:ss,SSS"]
    }
    mutate {
remove_field => ["path", "message"]
     }
}

I have two spaces after log level INFO and 1 space after log level DEBUG. So trying to use -> Modifier
Using grok I can get it working but want to know why is the Dissect Key Modifier -> failing and saying pattern not found.

Please help

Thanks!

ph added the discuss label Apr 16, 2020

premendrasingh mentioned this issue May 21, 2020

LIBBEAT: Enhancement Convert dissected values from String to other basic data types and IP elastic/beats#18683

Merged

6 tasks

sayden closed this as completed in elastic/beats#18683 Jul 13, 2020

sayden mentioned this issue Jul 13, 2020

Cherry-pick #18683 to 7.x: LIBBEAT: Enhancement Convert dissected values from String to other basic data types and IP elastic/beats#19838

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow dissect to be type aware #10

Allow dissect to be type aware #10

vjsamuel commented Apr 15, 2020

ph commented Apr 16, 2020 •

edited

Loading

vjsamuel commented Apr 16, 2020

ph commented Apr 16, 2020

jsvd commented Apr 20, 2020

jsvd commented Apr 20, 2020

ph commented Apr 20, 2020

jsvd commented Apr 20, 2020

ph commented Apr 20, 2020

jsvd commented Apr 20, 2020

ph commented Apr 20, 2020

jsvd commented Apr 20, 2020

vjsamuel commented Apr 30, 2020

jsvd commented May 7, 2020

ph commented May 7, 2020

vjsamuel commented May 7, 2020

sandhyatallapanneni commented Nov 16, 2021

Allow dissect to be type aware #10

Allow dissect to be type aware #10

Comments

vjsamuel commented Apr 15, 2020

ph commented Apr 16, 2020 • edited Loading

vjsamuel commented Apr 16, 2020

ph commented Apr 16, 2020

jsvd commented Apr 20, 2020

jsvd commented Apr 20, 2020

ph commented Apr 20, 2020

jsvd commented Apr 20, 2020

ph commented Apr 20, 2020

jsvd commented Apr 20, 2020

ph commented Apr 20, 2020

jsvd commented Apr 20, 2020

vjsamuel commented Apr 30, 2020

jsvd commented May 7, 2020

ph commented May 7, 2020

vjsamuel commented May 7, 2020

sandhyatallapanneni commented Nov 16, 2021

ph commented Apr 16, 2020 •

edited

Loading