
Second proposal for JSON support #1143

Merged: 1 commit into elastic:master from json_support_take_two on Mar 22, 2016

Conversation

@tsg (Contributor) commented Mar 12, 2016

I tried another option for #1069. The main change is that JSON processing now happens before multiline, so the order is:

  • Encoding decoding
  • JSON decoding
  • Multiline
  • Line/file filtering
  • Add custom fields
  • Generic filtering

The main advantage of this over #1069 is that it supports use cases like Docker, where normal log lines are wrapped in JSON. It should also work fine for most of the structured logging use cases.

Here is a sample config:

```
      json:
        message_key: log
        keys_under_root: true
        overwrite_keys: true
```

The idea is that when configuring the JSON decoder, you can select a "message" key that will be used in the next stages (multiline and line filtering). If you don't choose a "message" key but still try to configure line filtering or multiline, you will get a configuration error.

Compared to #1069, this is more complex and has a few more corner cases (e.g. what happens if the text key is not a string), but I think the code is still simple enough.

This still requires the JSON objects to be one per line, but I think that's the safer assumption to make anyway (see comment from #1069).
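
To make the ordering concrete, here is a fuller prospector configuration sketch that combines the JSON options above with multiline and line filtering. This is only an illustration: the paths, the multiline pattern, and the include_lines values are made up, and multiline/include_lines are the existing Filebeat options rather than anything added by this PR.

```
filebeat:
  prospectors:
    - paths:
        - /var/lib/docker/containers/*/*.log
      input_type: log
      json:
        message_key: log
        keys_under_root: true
        overwrite_keys: true
      multiline:
        pattern: '^\['
        negate: true
        match: after
      include_lines: ['^ERR', '^WARN']
```

With a config like this, each JSON line is decoded first, and the multiline and include_lines stages then operate on the value of the log key.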

@tsg added the Filebeat label on Mar 12, 2016
@tsg (Contributor, Author) commented Mar 12, 2016

This is in PoC phase, so don't merge it yet, but I'd like your feedback on it, @elastic/beats.

@ruflin (Member) commented Mar 14, 2016

As far as I understand, this is the more powerful option compared to #1069: it has the same features and more. If no text_key is defined, will it behave like #1069?

```
@@ -140,6 +140,10 @@ filebeat:
      # file is skipped, as the reading starts at the end. We recommend to leave this option on false
      # but lower the ignore_older value to release files faster.
      #force_close_files: false
      json_decoder:
```
(Member) commented:

Not sure if we should perhaps just call it json instead of json_decoder. It is shorter and will not get us into the discussion of adding further "decoders" :-)

@tsg (Contributor, Author) commented Mar 16, 2016

Yes, it's more powerful and not a lot more complex. For sure even more powerful options can be imagined, but those would move us too much in the direction of "generic processing". If I don't hear any objections, I'll move ahead and add tests and docs to this PR.

@andrewkroh (Member) commented:
Nice code: it's very readable, easy to follow, and has documentation. 😄 I think this approach will serve us well for most use cases.

Some of the methods and variables could be renamed (e.g. Json becomes JSON) to conform to golint naming.
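
For example (hypothetical identifiers, just to show the golint convention of writing initialisms in full caps):

```
// golint expects initialisms such as JSON to be written in all caps.
type JsonConfig struct{} // before: golint flags the mixed-case initialism
type JSONConfig struct{} // after: conforms to golint naming
```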

@tsg added the review label on Mar 18, 2016
@tsg (Contributor, Author) commented Mar 18, 2016

This should be ready for reviews now. I want to squash before merging, so let me know when it looks good.

@ruflin (Member) commented Mar 21, 2016

There seems to be an error in the OS X build: https://travis-ci.org/elastic/beats/jobs/116909985#L1527

```
. /Users/travis/gopath/src/github.com/elastic/beats/filebeat/build/python-env/bin/activate; nosetests -w tests/system --process-timeout=90 --with-timer
.....................E........................
======================================================================
ERROR: Should be able to interpret docker logs.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/travis/gopath/src/github.com/elastic/beats/filebeat/tests/system/test_json.py", line 28, in test_docker_logs
    max_timeout=10)
  File "../../../libbeat/tests/system/beat/beat.py", line 277, in wait_until
    "Waited {} seconds.".format(max_timeout))
Exception: Timeout waiting for 'cond' to be true. Waited 10 seconds.
```

```
TextKey       string `config:"text_key"`        // key whose value is used as the text for later stages
KeysUnderRoot bool   `config:"keys_under_root"` // place the decoded JSON keys at the top level of the event
OverwriteKeys bool   `config:"overwrite_keys"`  // allow decoded keys to overwrite fields added by Filebeat
AddErrorKey   bool   `config:"add_error_key"`   // add an error key to the event when decoding fails
```
(Member) commented:

Not sure if we should shorten the config options and just call them add_error, overwrite.

@ruflin (Member) commented Mar 21, 2016

LGTM. I added some late thoughts about the config naming (sorry for not bringing that up earlier), but we can also move this to a later stage. Please also update the CHANGELOG file.

Should we add a flag to the event when it was json decoded? Similar to what was requested for multiline?

@tsg force-pushed the json_support_take_two branch 2 times, most recently from 4932b15 to abf4ef0 on March 21, 2016 at 10:50
@tsg (Contributor, Author) commented Mar 21, 2016

I think the test failure was due to a misplaced ignore_older setting. I addressed the comments and squashed the whole thing into one commit. Let's wait for green.

@ruflin (Member) commented Mar 21, 2016

LGTM. Waiting for green.

@urso commented Mar 21, 2016

Can we add some more JSON multiline tests?

Kinda looks like multiline is still done before the JSON merging: here the reader pipeline is configured, and I can find the JSON decoding only after the file has been read.

@victorarbuesmallada commented:
Any news about this being merged to master?

@tsg (Contributor, Author) commented Mar 22, 2016

@Painyjames: @urso found a pretty major flaw, in that this doesn't combine with multiline the way I was expecting it to. I'm looking for a solution now; I still expect this to be merged into master this week or the next.

```
	return retLine
}

func (mlr *MultiLine) pushLine() Line {
	content := mlr.content
	sz := mlr.readBytes
	fields := mlr.fields
```
Review comment:

When merging multiple JSON events, which fields do we want to report? What if the first one contains a timestamp?

Review comment:

What if in 'addLine' the next line adds some fields not seen in the first one?

@tsg (Contributor, Author) replied:

For simplicity I was thinking that all fields besides the message_key are taken from the first event. This should be good enough for use cases similar to the Docker one. I should probably put this in the docs somewhere.
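
A minimal sketch of that behavior, with hypothetical names (this is not the PR's actual code): every field is copied from the first decoded event, and only the value of the configured message key is concatenated across the joined lines.

```
// mergeJSONFields illustrates the "first event wins" rule for multiline JSON.
func mergeJSONFields(first, next map[string]interface{}, messageKey string) map[string]interface{} {
	// Take every field from the first event; fields that only appear in
	// later lines are ignored under this rule.
	merged := make(map[string]interface{}, len(first))
	for k, v := range first {
		merged[k] = v
	}
	// Only the message key is combined across the joined lines.
	firstMsg, _ := first[messageKey].(string)
	nextMsg, _ := next[messageKey].(string)
	merged[messageKey] = firstMsg + "\n" + nextMsg
	return merged
}
```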

@urso commented Mar 22, 2016

LGTM.

The limitation right now is one JSON object per line, but with the interface changes we're flexible enough to enhance reading/parsing in the future.

JSON decoding happens before multiline, so the order of processing
is:

* Encoding decoding
* JSON decoding
* Multiline
* Line/file filtering
* Add custom fields
* Generic filtering

Here is a sample config:
```
      json:
        message_key: log
        keys_under_root: true
        overwrite_keys: true
```

The idea is that when configuring the JSON decoder, you can select a "message"
key that will be used in the next stages (multiline and line filtering). If you
don't choose a "message" key but still try to configure line filtering or
multiline, you will get a configuration error.
@tsg (Contributor, Author) commented Mar 22, 2016

Moved the JSON decoding part into a processor, so the issue referenced above is solved. We now also have a system test for JSON + multiline. I rebased already, so this is ready to be reviewed/merged if green.

monicasarbu added a commit that referenced this pull request on Mar 22, 2016: Second proposal for JSON support
@monicasarbu merged commit 6a66cc6 into elastic:master on Mar 22, 2016
@devinrsmith commented:
Are there any proposals for multiline json support?

I see in #1069 there are some comments about it.

IMO a new input_type is the best course of action.

I think one of the primary use cases for logs is that they are human readable. The first thing I usually do when an issue arises is to open up a console and scroll through the log(s). Filebeat provides multiline support, but it has to be configured on a log-by-log basis.

Using pretty printed JSON objects as log "lines" is nice because they are human readable.

Limiting the input to single line JSON objects limits the human usefulness of the log.

For example, here is a real-ish log line that I just grabbed:

```
{
    "primaryType": "ACTION",
    "diagnosticType": "com.example.server.endpoints.MyEndpoint",
    "requestTimestamp": "2016-03-22T20:18:25.281Z",
    "path": "actions/FD0IjHbzKoAkCz_NHr9bB___/messages",
    "method": "POST",
    "queryParams": {},
    "requestHeaders": {
        "Accept": [
            "application/json"
        ],
        "X-Forwarded-Proto": [
            "https",
            "https"
        ],
        "User-Agent": [
            "MyApp Debug/15 (iPhone; iOS 9.2.1; Scale/2.00)"
        ],
        "Host": [
            "v3-test.example.com",
            "v3-test.example.com"
        ],
        "Accept-Language": [
            "en-CA;q=1"
        ],
        "Content-Length": [
            "17"
        ],
        "Content-Type": [
            "application/json; charset=UTF-8"
        ]
    },
    "userId": "FDxnF4enX8EV1mIxwujCSv__",
    "profileId": "FDxnF4ezX8DV1mIxwujCS___",
    "actions": [],
    "responseTimestamp": "2016-03-22T20:18:25.287Z",
    "status": 204,
    "responseHeaders": {}
}
```

vs

{"primaryType":"ACTION","diagnosticType":"com.example.server.endpoints.MyEndpoint","requestTimestamp":"2016-03-22T20:18:25.281Z","path":"actions/FD0IjHbzKoAkCz_NHr9bB___/messages","method":"POST","queryParams":{},"requestHeaders":{"Accept":["application/json"],"X-Forwarded-Proto":["https","https"],"User-Agent":["MyApp Debug/15 (iPhone; iOS 9.2.1; Scale/2.00)"],"Host":["v3-test.example.com","v3-test.example.com"],"Accept-Language":["en-CA;q=1"],"Content-Length":["17"],"Content-Type":["application/json; charset=UTF-8"]},"userId":"FDxnF4enX8EV1mIxwujCSv__","profileId":"FDxnF4ezX8DV1mIxwujCS___","actions":[],"responseTimestamp":"2016-03-22T20:18:25.287Z","status": 204,"responseHeaders":{}}

The pretty printed JSON is much more human readable than the single line format :)

I understand it might be out of scope for this pull request, but I'm hoping Filebeat can eventually support it.
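
As a possible stopgap (not something this PR provides), the existing multiline settings could at least keep a pretty-printed object like the one above together as a single event. A hypothetical sketch follows, with the caveat that the JSON would still have to be parsed downstream, since the decoder in this PR expects one object per line:

```
      multiline:
        pattern: '^\{'
        negate: true
        match: after
```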

@devinrsmith commented:
Created a new issue since I see this request has been merged :)

@asldevi commented May 3, 2016

Any idea when this is going to be released?

@ruflin (Member) commented May 3, 2016

This is already released as part of the 5.0.0-alpha1 release: https://www.elastic.co/downloads/beats/filebeat

@asldevi commented May 4, 2016

thank you so much for the info, @ruflin

@tsg deleted the json_support_take_two branch on August 25, 2016