Add 'expand_keys' option to JSON input/processor #22849
Add an 'expand_keys' option to Filebeat's JSON input, and to the decode_json_fields processor. If true, the decoded JSON objects' keys will be recursively expanded, changing dotted keys into a hierarchical object structure. If two keys expand to the same key, their values must both be objects; otherwise an error will result, decoding will fail, and the existing error handling mechanisms will apply.
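As a rough illustration of the behaviour described above, here is a minimal, self-contained sketch. It works on plain `map[string]interface{}` values rather than `common.MapStr`, and the names `expandKeys` and `mergeInto` are hypothetical; this is not the PR's actual code, only one way the strict expand-and-merge rule could be written.

```go
package main

import (
	"fmt"
	"strings"
)

// mergeInto stores val under key in dst. If key already exists, both the old
// and the new value must be objects, which are then merged recursively;
// any other combination is reported as a conflict.
// (Hypothetical helper, not part of libbeat.)
func mergeInto(dst map[string]interface{}, key string, val interface{}) error {
	old, exists := dst[key]
	if !exists {
		dst[key] = val
		return nil
	}
	oldMap, oldIsMap := old.(map[string]interface{})
	newMap, newIsMap := val.(map[string]interface{})
	if !oldIsMap || !newIsMap {
		return fmt.Errorf("cannot expand %q: found conflicting key", key)
	}
	for k, v := range newMap {
		if err := mergeInto(oldMap, k, v); err != nil {
			return err
		}
	}
	return nil
}

// expandKeys recursively turns dotted keys into nested objects.
// It is destructive: on error, m may be left partially expanded.
func expandKeys(m map[string]interface{}) error {
	// Snapshot the keys so the map is not modified while ranging over it.
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	for _, k := range keys {
		v := m[k]
		// Expand nested objects first.
		if sub, ok := v.(map[string]interface{}); ok {
			if err := expandKeys(sub); err != nil {
				return err
			}
		}
		if !strings.Contains(k, ".") {
			continue
		}
		delete(m, k)
		// Wrap the value in nested maps, innermost key first.
		parts := strings.Split(k, ".")
		for i := len(parts) - 1; i > 0; i-- {
			v = map[string]interface{}{parts[i]: v}
		}
		if err := mergeInto(m, parts[0], v); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	valid := map[string]interface{}{
		"a.b": 1,
		"a":   map[string]interface{}{"c.d": 2},
	}
	fmt.Println(expandKeys(valid), valid)

	conflict := map[string]interface{}{"a": 1, "a.b": 2}
	fmt.Println(expandKeys(conflict))
}
```

The first input merges into a single nested `a` object; the second fails because `a` is a scalar in one place and would have to be an object in the other.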
LGTM
💚 Build Succeeded
Test stats 🧪

| Test | Results |
|---|---|
| Failed | 0 |
| Passed | 17417 |
| Skipped | 1379 |
| Total | 18796 |
looks great, I like this approach much better than as a processor
//
// Note that ExpandFields is destructive, and in the case of an error the
// map may be left in a semi-expanded state.
func ExpandFields(m common.MapStr) error {
Would it be possible / make sense to make this a method on common.MapStr instead?
It certainly is. I gathered from #20489 (comment) that @urso would prefer not to add to MapStr, but I can rearrange if preferred. Unless there's an expectation of reuse I typically avoid adding to common types/packages to avoid creating huge interfaces.
> Unless there's an expectation of reuse I typically avoid adding to common types/packages to avoid creating huge interfaces.

Agreed. The MapStr interface is too big, with redundant functionality at times. I'd rather have a small interface for Events in the future, with a set of functions that operate on the public interface.
If ExpandFields is not used somewhere else I would not export it (keep package interface smaller).
For consistency, if it is supposed to be used in other places, move it (as a function) to libbeat/common/mapstr.go. The libbeat/common package is where MapStr and helpers for MapStr currently live.
Good point, it's not needed elsewhere (I originally thought it would be) - so I'll unexport it.
Pinging @elastic/integrations-services (Team:Services)
Is it premature to add this? Is it a common enough use case that we should include it in every config file?
IMO it would be nice to have this added and I don't see any backwards compatibility issues: in case of a conflict, with the current behavior the event cannot be ingested; with this change the JSON object itself still gets dropped, but error information gets ingested. This could be helpful to find logging issues.
++ on having that in by default. I'd even suggest extending the docs a bit and mentioning that when using ECS loggers it's preferred to set this to true.
To be clear I was just asking if we should mention the config in the config file, not turn it on by default. If we were to turn this feature on by default there would be a subtle backwards-compatibility issue: the document
I've added the config to
There's an open issue for overview documentation for ECS logging: elastic/ecs-logging#31. For the time being, we might just link to https://github.com/elastic/ecs-logging?
Nice!
Good idea, I've added a sentence to the docs: "This setting should be enabled when the input is produced by an ECS logger."
Markdown link --> Asciidoc link
Co-authored-by: Brandon Morelli <bmorelli25@gmail.com>
} else {
	oldMap, oldIsMap := getMap(old)
	if !oldIsMap {
		return fmt.Errorf("cannot expand %q: found conflicting key", k)
This seems to happen on type conflict only. I think we have similar cases in metricbeat. In that case we modify the key for old to be <k>.value. If the new object has a field named value we can drop old (because it is overwritten).
I think this comment is effectively the same as #22849 (comment) - or is this something else?
The intended behaviour is to recursively merge objects, returning an error if there are two matching keys which either both have scalar values, or with one having a scalar value and one having an object value. This is intentionally strict for the first implementation; we could later either relax by default, or add options to relax.
> This is intentionally strict for the first implementation; we could later either relax by default, or add options to relax.
I'm ok if we follow up with this one later on.
Yeah, the two comments belong together.
I've opened #23135 to track this.
logger := logp.NewLogger("jsonhelper")
if expandKeys {
	if err := ExpandFields(keys); err != nil {
If we error here, keys is in an unknown state. Do we need to clone keys before this call in order to keep it intact on error? Logging the original document would be needed for users to understand why things went wrong.
I was expecting the original document to show up under message, like when a JSON decoding error occurs. That doesn't happen though. Is there any reason why we shouldn't do that, for a consistent debugging experience, instead of logging?
I forgot to respond to the first part:
> if we error here keys is in an unknown state. Do we need to clone keys before this call in order to keep it intact on error?

As long as we include the original input (in message), I don't see a need. I'm not intimately familiar with Filebeat though.
The JSON decoder in the log input does not store the original raw line in the message field. One can configure a custom message field (which is extracted from the JSON document, not the original line), but by default message will not be set.
Anyway, the new fields are merged into the event after expansion via WriteJSONFields. Neither the processor nor the JSON decoder in the log input references any fields in keys (or keys itself). This should make the operation safe. No need to clone.
Yes, we should mention it and keep it turned off by default.
Co-authored-by: Brandon Morelli <bmorelli25@gmail.com> (cherry picked from commit 4f4a553)
What does this PR do?
Add an 'expand_keys' option to Filebeat's JSON input, and
to the decode_json_fields processor. If true, the decoded
JSON objects' keys will be recursively expanded, changing
dotted keys into a hierarchical object structure.
Objects will be recursively merged. In case of duplicate keys
at any level, the values must both be objects or an error will
result; decoding will fail, and the existing error handling
mechanisms will apply.
This is an alternative to #20489. The main differences are:
add_error_key
optionsfields are not added to the event. This prevents conflicts when indexing in Elasticsearch,
which will again try to expand the dotted fields and lead to a mapping conflict.
Why is it important?
See #17021
Checklist
CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
How to test this PR locally
Build filebeat, then run the following (first is valid, second is an example of a conflict.)
Related issues
Closes #17021
Replaces #20489