
Second proposal for JSON support #1143

Merged: 1 commit into elastic:master from json_support_take_two on Mar 22, 2016

Conversation

@tsg (Contributor) commented Mar 12, 2016

I tried another option for #1069. The main change is that JSON processing now happens before multiline, so the order is:

  • Encoding decoding
  • JSON decoding
  • Multiline
  • Line/file filtering
  • Add custom fields
  • Generic filtering

The main advantage of this over #1069 is that it supports use cases like Docker, where normal log lines are wrapped in JSON. It should also work fine for most of the structured logging use cases.

Here is a sample config:

```
      json:
        message_key: log
        keys_under_root: true
        overwrite_keys: true
```

The idea is that when configuring the JSON decoder, you can select a "message" key that will be used in the next stages (multiline and line filtering). If you don't choose a "message" key but still try to configure line filtering or multiline, you will get a configuration error.

Compared to #1069, this is more complex and has a few more corner cases (e.g. what happens if the text key is not a string), but I think the code is still simple enough.

This still requires the JSON objects to be one per line, but I think that's the safer assumption to make anyway (see comment from #1069).
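
To make the ordering concrete, here is a fuller prospector configuration sketch that combines the JSON options above with multiline and line filtering. This is only an illustration: the paths, the multiline pattern, and the include_lines values are made up, and multiline/include_lines are the existing Filebeat options rather than anything added by this PR.

```
filebeat:
  prospectors:
    - paths:
        - /var/lib/docker/containers/*/*.log
      input_type: log
      json:
        message_key: log
        keys_under_root: true
        overwrite_keys: true
      multiline:
        pattern: '^\['
        negate: true
        match: after
      include_lines: ['^ERR', '^WARN']
```

With a config like this, each JSON line is decoded first, and the multiline and include_lines stages then operate on the value of the log key.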

@tsg added the Filebeat label on Mar 12, 2016
@tsg (Contributor, Author) commented Mar 12, 2016

This is in PoC phase, so don't merge it yet, but I'd like your feedback on it, @elastic/beats.

@ruflin (Member) commented Mar 14, 2016

As far as I understand, this is the more powerful option compared to #1069: it has the same features and more. If no text_key is defined, will it behave like #1069?

```
@@ -140,6 +140,10 @@ filebeat:
      # file is skipped, as the reading starts at the end. We recommend to leave this option on false
      # but lower the ignore_older value to release files faster.
      #force_close_files: false
      json_decoder:
```
(Member) commented:

Not sure if we should perhaps just call it json instead of json_decoder. It is shorter and will not get us into the discussion of adding further "decoders" :-)

@tsg (Contributor, Author) commented Mar 16, 2016

Yes, it's more powerful and not a lot more complex. For sure even more powerful options can be imagined, but those would move us too much in the direction of "generic processing". If I don't hear any objections, I'll move ahead and add tests and docs to this PR.

@andrewkroh (Member) commented:
Nice code: it's very readable, easy to follow, and has documentation. 😄 I think this approach will serve us well for most use cases.

Some of the methods and variables could be renamed (e.g. Json becomes JSON) to conform to golint naming.
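
For example (hypothetical identifiers, just to show the golint convention of writing initialisms in full caps):

```
// golint expects initialisms such as JSON to be written in all caps.
type JsonConfig struct{} // before: golint flags the mixed-case initialism
type JSONConfig struct{} // after: conforms to golint naming
```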

@tsg added the review label on Mar 18, 2016
@tsg (Contributor, Author) commented Mar 18, 2016

This should be ready for reviews now. I want to squash before merging, so let me know when it looks good.

@ruflin (Member) commented Mar 21, 2016

There seems to be an error in the OS X build: https://travis-ci.org/elastic/beats/jobs/116909985#L1527

```
. /Users/travis/gopath/src/github.com/elastic/beats/filebeat/build/python-env/bin/activate; nosetests -w tests/system --process-timeout=90 --with-timer
.....................E........................
======================================================================
ERROR: Should be able to interpret docker logs.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/travis/gopath/src/github.com/elastic/beats/filebeat/tests/system/test_json.py", line 28, in test_docker_logs
    max_timeout=10)
  File "../../../libbeat/tests/system/beat/beat.py", line 277, in wait_until
    "Waited {} seconds.".format(max_timeout))
Exception: Timeout waiting for 'cond' to be true. Waited 10 seconds.
```

```
TextKey       string `config:"text_key"`        // key whose value is used as the text for later stages
KeysUnderRoot bool   `config:"keys_under_root"` // place the decoded JSON keys at the top level of the event
OverwriteKeys bool   `config:"overwrite_keys"`  // allow decoded keys to overwrite fields added by Filebeat
AddErrorKey   bool   `config:"add_error_key"`   // add an error key to the event when decoding fails
```
(Member) commented:

Not sure if we should shorten the config options and just call them add_error, overwrite.

@ruflin (Member) commented Mar 21, 2016

LGTM. I added some late thoughts about the config naming (sorry for not bringing that up earlier), but we can also move this to a later stage. Please also update the CHANGELOG file.

Should we add a flag to the event when it was json decoded? Similar to what was requested for multiline?

@tsg force-pushed the json_support_take_two branch 2 times, most recently from 4932b15 to abf4ef0 on March 21, 2016 at 10:50
@tsg (Contributor, Author) commented Mar 21, 2016

I think the test failure was due to a misplaced ignore_older setting. I addressed the comments and squashed the whole thing into one commit. Let's wait for green.

@ruflin (Member) commented Mar 21, 2016

LGTM. Waiting for green.

@urso commented Mar 21, 2016

Can we add some more JSON multiline tests?

Kinda looks like multiline is still done before the JSON merging: here the reader pipeline is configured, and I can find the JSON decoding only after the file has been read.

@victorarbuesmallada commented:
Any news about this being merged to master?

@tsg (Contributor, Author) commented Mar 22, 2016

@Painyjames: @urso found a pretty major flaw, in that this doesn't combine with multiline the way I was expecting it to. I'm looking for a solution now; I still expect this to be merged into master this week or the next.

```
	return retLine
}

func (mlr *MultiLine) pushLine() Line {
	content := mlr.content
	sz := mlr.readBytes
	fields := mlr.fields
```
Review comment:

When merging multiple JSON events, which fields do we want to report? What if the first one contains a timestamp?

Review comment:

What if in 'addLine' the next line adds some fields not seen in the first one?

@tsg (Contributor, Author) replied:

For simplicity I was thinking that all fields besides the message_key are taken from the first event. This should be good enough for use cases similar to the Docker one. I should probably put this in the docs somewhere.
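
A minimal sketch of that behavior, with hypothetical names (this is not the PR's actual code): every field is copied from the first decoded event, and only the value of the configured message key is concatenated across the joined lines.

```
// mergeJSONFields illustrates the "first event wins" rule for multiline JSON.
func mergeJSONFields(first, next map[string]interface{}, messageKey string) map[string]interface{} {
	// Take every field from the first event; fields that only appear in
	// later lines are ignored under this rule.
	merged := make(map[string]interface{}, len(first))
	for k, v := range first {
		merged[k] = v
	}
	// Only the message key is combined across the joined lines.
	firstMsg, _ := first[messageKey].(string)
	nextMsg, _ := next[messageKey].(string)
	merged[messageKey] = firstMsg + "\n" + nextMsg
	return merged
}
```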

@urso commented Mar 22, 2016

LGTM.

The limitation right now is one JSON object per line, but with the interface changes we're flexible enough to enhance reading/parsing in the future.

JSON decoding happens before multiline, so the order of processing
is:

* Encoding decoding
* JSON decoding
* Multiline
* Line/file filtering
* Add custom fields
* Generic filtering

Here is a sample config:
```
      json:
        message_key: log
        keys_under_root: true
        overwrite_keys: true
```

The idea is that when configuring the JSON decoder, you can select a "message"
key that will be used in the next stages (multiline and line filtering). If you
don't choose a "message" key but still try to configure line filtering or
multiline, you will get a configuration error.
@tsg (Contributor, Author) commented Mar 22, 2016

Moved the JSON decoding part into a processor, so the issue referenced above is solved. We now also have a system test for JSON + multiline. I rebased already, so this is ready to be reviewed/merged if green.

monicasarbu added a commit that referenced this pull request on Mar 22, 2016: Second proposal for JSON support
@monicasarbu merged commit 6a66cc6 into elastic:master on Mar 22, 2016
@devinrsmith commented:
Are there any proposals for multiline json support?

I see in #1069 there are some comments about it.

IMO a new input_type is the best course of action.

I think one of the primary use cases for logs is that they are human readable. The first thing I usually do when an issue arises is to open up a console and scroll through the log(s). Filebeat provides multiline support, but it has to be configured on a log-by-log basis.

Using pretty printed JSON objects as log "lines" is nice because they are human readable.

Limiting the input to single line JSON objects limits the human usefulness of the log.

For example, here is a real-ish log line that I just grabbed:

```
{
    "primaryType": "ACTION",
    "diagnosticType": "com.example.server.endpoints.MyEndpoint",
    "requestTimestamp": "2016-03-22T20:18:25.281Z",
    "path": "actions/FD0IjHbzKoAkCz_NHr9bB___/messages",
    "method": "POST",
    "queryParams": {},
    "requestHeaders": {
        "Accept": [
            "application/json"
        ],
        "X-Forwarded-Proto": [
            "https",
            "https"
        ],
        "User-Agent": [
            "MyApp Debug/15 (iPhone; iOS 9.2.1; Scale/2.00)"
        ],
        "Host": [
            "v3-test.example.com",
            "v3-test.example.com"
        ],
        "Accept-Language": [
            "en-CA;q=1"
        ],
        "Content-Length": [
            "17"
        ],
        "Content-Type": [
            "application/json; charset=UTF-8"
        ]
    },
    "userId": "FDxnF4enX8EV1mIxwujCSv__",
    "profileId": "FDxnF4ezX8DV1mIxwujCS___",
    "actions": [],
    "responseTimestamp": "2016-03-22T20:18:25.287Z",
    "status": 204,
    "responseHeaders": {}
}
```

vs

{"primaryType":"ACTION","diagnosticType":"com.example.server.endpoints.MyEndpoint","requestTimestamp":"2016-03-22T20:18:25.281Z","path":"actions/FD0IjHbzKoAkCz_NHr9bB___/messages","method":"POST","queryParams":{},"requestHeaders":{"Accept":["application/json"],"X-Forwarded-Proto":["https","https"],"User-Agent":["MyApp Debug/15 (iPhone; iOS 9.2.1; Scale/2.00)"],"Host":["v3-test.example.com","v3-test.example.com"],"Accept-Language":["en-CA;q=1"],"Content-Length":["17"],"Content-Type":["application/json; charset=UTF-8"]},"userId":"FDxnF4enX8EV1mIxwujCSv__","profileId":"FDxnF4ezX8DV1mIxwujCS___","actions":[],"responseTimestamp":"2016-03-22T20:18:25.287Z","status": 204,"responseHeaders":{}}

The pretty printed JSON is much more human readable than the single line format :)

I understand it might be out of scope for this pull request, but I'm hoping Filebeat can eventually support it.
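
As a possible stopgap (not something this PR provides), the existing multiline settings could at least keep a pretty-printed object like the one above together as a single event. A hypothetical sketch follows, with the caveat that the JSON would still have to be parsed downstream, since the decoder in this PR expects one object per line:

```
      multiline:
        pattern: '^\{'
        negate: true
        match: after
```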

@devinrsmith commented:
Created a new issue since I see this request has been merged :)

@asldevi commented May 3, 2016

Any idea when this is going to be released?

@ruflin (Member) commented May 3, 2016

This is already released as part of the 5.0.0-alpha1 release: https://www.elastic.co/downloads/beats/filebeat

@asldevi commented May 4, 2016

thank you so much for the info, @ruflin

@tsg deleted the json_support_take_two branch on August 25, 2016