Second proposal for JSON support #1143
Conversation
This is in PoC phase, so don't merge it yet, but I'd like your feedback on it, @elastic/beats.
```
@@ -140,6 +140,10 @@ filebeat:
  # file is skipped, as the reading starts at the end. We recommend to leave this option on false
  # but lower the ignore_older value to release files faster.
  #force_close_files: false
  json_decoder:
```
Not sure if we should perhaps call it just `json` instead of `json_decoder`. It is shorter and will not get us into the discussion of adding further "decoders" :-)
Yes, it's more powerful and not a lot more complex. For sure even more powerful options can be imagined, but those would move us too much in the direction of "generic processing". Then, if I don't hear any objections, I'll move ahead to add tests and docs to this PR.
Force-pushed from f02867b to 59b5910.
Nice code, it's very readable, easy to follow, and has documentation. 😄 I think this approach will serve us well for most use cases. Some of the methods and variables could be changed (i.e. Json becomes JSON) to conform to golint naming.

This should be ready for reviews now. I want to squash before merging, so let me know when it looks good.
There seems to be an error in the OS build: https://travis-ci.org/elastic/beats/jobs/116909985#L1527
```
TextKey       string `config:"text_key"`
KeysUnderRoot bool   `config:"keys_under_root"`
OverwriteKeys bool   `config:"overwrite_keys"`
AddErrorKey   bool   `config:"add_error_key"`
```
Not sure if we should shorten the config and just call it `add_error`, `overwrite`.
LGTM. I added some late thoughts about the config naming (sorry for not bringing that up earlier), but we can also move this to a later stage. Please also update the CHANGELOG file. Should we add a flag to the event when it was JSON decoded? Similar to what was requested for multiline?
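Put together, the options from the struct above would surface in the prospector config roughly like this. This is an illustrative sketch using the `config` tags as option names; the exact key names and nesting were still under discussion at this point:

```
json_decoder:
  text_key: log
  keys_under_root: true
  overwrite_keys: true
  add_error_key: true
```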
Force-pushed from 4932b15 to abf4ef0.
I think the test failure was due to a misplaced …
LGTM. Waiting for green.
Can we add some more JSON multiline tests? Kinda looks like multiline is still done before merging. Here the reader pipeline is configured. I can find JSON decoding only after having read the file.
Any news about this being merged to master?
@Painyjames: @urso found a pretty major flaw, in that this doesn't combine with multiline the way I was expecting it to. I'm looking for a solution now; I still expect this to be merged into master this week or the next.
```
	return retLine
}

func (mlr *MultiLine) pushLine() Line {
	content := mlr.content
	sz := mlr.readBytes
	fields := mlr.fields
```
When merging multiple JSON events, which fields do we want to report? What if the first one contains a timestamp?
What if in `addLine` the next line adds some fields not seen in the first one?
For simplicity I was thinking that all fields besides the `message_key` are taken from the first event. This should be good enough for use cases similar to the Docker one. I should probably put this somewhere in the docs.
LGTM. Limitation right now is 1 JSON object per line, but with interface changes we're very flexible to enhance reading/parsing in future.
JSON decoding happens before multiline, so the order of processing is:

* Encoding decoding
* JSON decoding
* Multiline
* Line/file filtering
* Add custom fields
* Generic filtering

Here is a sample config:

```
json:
  message_key: log
  keys_under_root: true
  overwrite_keys: true
```

The idea is that when configuring the JSON decoder, you can select a "message" key that will be used in the next stages (multiline and line filtering). If you don't choose a "message" key but still try to configure line filtering or multiline, you will get a configuration error.
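As a rough model of what `keys_under_root` and `overwrite_keys` imply for the resulting event, the following is an illustrative sketch, not the actual Beats implementation; `decodeJSON` and the plain map-based event are assumptions made for the example:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeJSON merges a decoded JSON log line into an event map,
// modeling the keys_under_root / overwrite_keys options discussed
// in this PR. Illustrative sketch only.
func decodeJSON(event map[string]interface{}, line []byte, keysUnderRoot, overwriteKeys bool) map[string]interface{} {
	var obj map[string]interface{}
	if err := json.Unmarshal(line, &obj); err != nil {
		// roughly what add_error_key would report on malformed input
		event["json_error"] = err.Error()
		return event
	}
	if !keysUnderRoot {
		// keep the decoded object nested under a separate "json" key
		event["json"] = obj
		return event
	}
	for k, v := range obj {
		if _, exists := event[k]; exists && !overwriteKeys {
			continue // existing event fields win
		}
		event[k] = v
	}
	return event
}

func main() {
	event := map[string]interface{}{"source": "/var/log/app.log"}
	line := []byte(`{"log":"request handled","level":"info"}`)
	out := decodeJSON(event, line, true, true)
	fmt.Println(out["log"], out["level"]) // prints: request handled info
}
```

With `keys_under_root: false` the decoded fields would instead land under a `json` sub-object, leaving the original event keys untouched.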
Force-pushed from 4472ac4 to ceb25bd.
Moved the JSON decoding part into a processor, so the issue referenced above is solved. We now also have a system test for JSON + multiline. I rebased already, so this is ready to be reviewed/merged if green.
Are there any proposals for multiline JSON support? I see in #1069 there are some comments about it. IMO a new …

I think one of the primary use cases for logs is that they are human readable. The first thing I usually do when an issue arises is to open up a console and scroll through the log(s). Filebeats provides …

Using pretty printed JSON objects as log "lines" is nice because they are human readable. Limiting the input to single-line JSON objects limits the human usefulness of the log. For example, here is a real-ish log line that I just grabbed:
vs
The pretty printed JSON is much more human readable than the single line format :) I understand it might be out of scope for this pull request, but I'm hoping filebeats can eventually support it.
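The log samples from this comment did not survive extraction, but the contrast the commenter is drawing looks roughly like this (hypothetical content, not the original line):

```
{"timestamp":"2016-03-18T11:02:57Z","level":"ERROR","message":"request failed","stack":"java.lang.NullPointerException\n\tat app.Handler.handle(Handler.java:42)"}
```

vs

```
{
  "timestamp": "2016-03-18T11:02:57Z",
  "level": "ERROR",
  "message": "request failed",
  "stack": "java.lang.NullPointerException\n\tat app.Handler.handle(Handler.java:42)"
}
```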
Created a new issue since I see this request has been merged :)

Any idea on when this is going to be released?

This is already released as part of the 5.0.0-alpha1 release: https://www.elastic.co/downloads/beats/filebeat

Thank you so much for the info, @ruflin
I tried another option for #1069. The main change is that JSON processing now happens before multiline, so the order is:

* Encoding decoding
* JSON decoding
* Multiline
* Line/file filtering
* Add custom fields
* Generic filtering
The main advantage of this over #1069 is that it supports use cases like Docker where normal log lines are wrapped in JSON. It should also work fine for most of the structured logging use cases.
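For illustration (the field names follow Docker's json-file log driver format; the multi-event stack trace content is hypothetical), a wrapped multiline log looks like:

```
{"log":"java.lang.NullPointerException\n","stream":"stderr","time":"2016-03-18T11:02:57.1Z"}
{"log":"\tat app.Handler.handle(Handler.java:42)\n","stream":"stderr","time":"2016-03-18T11:02:57.2Z"}
```

With `message_key: log`, the decoder extracts the `log` field from each line first, so multiline can then join the stack trace across the decoded events.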
Here is a sample config:

```
json:
  message_key: log
  keys_under_root: true
  overwrite_keys: true
```
The idea is that when configuring the JSON decoder, you can select a "message" key that will be used in the next stages (multiline and line filtering). If you don't choose a "message" key but still try to configure line filtering or multiline, you will get a configuration error.
Compared to #1069, this is more complex and has a few more corner cases (e.g. what happens if the text key is not a string), but the code is still simple enough, I think.
This still requires the JSON objects to be one per line, but I think that's the safer assumption to make anyway (see comment from #1069).