Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for include_fields and drop_fields #1120

Merged
merged 1 commit into from
Mar 21, 2016

Conversation

monicasarbu
Copy link
Contributor

This PR adds two configuration options: include_fields and drop_fields.

There is a list of mandatory fields like: @timestamp, type that cannot be removed and they are always available in the exported fields. With the two configuration options you can configure the additional fields that will be exported or not exported. By default, all fields are exported.

Here are the options:

  • include_fields=[] -> exports only the mandatory fields
  • include_fields is not defined (commented) -> export all fields (the default option)
  • drop_fields=[] -> export all fields (the default option)

As fields, they accept a map like proc or a value like proc.cpu.total_p.

In case there is a field to include that is not available in the event, then just ignore the field.

When multiple filtering rules are defined, we start with a copy of the event and apply the filtering rules one by to the
event.

   event -> filter rule 1 -> event 1 -> filter rule 2 -> event 2 ...

where event is the initial event, event1 is the event resulted after applying "filter rule1" and it's considered input for the "filter rule 2" and so on.


}

// deepEqual compars two MapStr assuming they only have primitive types as values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/compars/compares/

@pickypg
Copy link
Member

pickypg commented Mar 11, 2016

If you configure one of these fields in the drop_fields, it will be ignored.

Instead of ignoring it, can it just fail? That way if the list ever grows in the future, then a user that upgrades won't be surprised to find their rules no longer apply as they did before.

@monicasarbu
Copy link
Contributor Author

@pickypg The list of mandatory fields is defined in the code, so the user can configure only the extra fields that he wants to include or drop. I agree with you that the sentence is confusing, so I updated the description. Thanks for noticing.

@@ -84,6 +85,7 @@ type BeatConfig struct {
Output map[string]*ucfg.Config
Logging logp.Logging
Shipper publisher.ShipperConfig
Filter []filter.FilterConfig
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we don't introduce a new top level config here as we are in the process to get rid of it. This leads to problems when we want for example run multiple beats as one binary. We should only have one common name space which is beat.

Alternatively this could also be added to every single beat config to have it "local". Means it would be filebeat.filter or packetbeat.filer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also do this change in a separate PR once it's merged.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we merge it like that we probably also going to keep it that way as also tests below and config changes for topbeat were already made for it. I suggest to currently add it under shipper which should be beat, same as we have the fields etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think shipper or beat (to be honest I think beat is too generic, maybe host?) should be the section that includes only the configuration options for that host like geoip configuration, max_queue, tags and shouldn't include fields and filter as these are not a configuration of the host and more of an enhancement of the output data.
I suggest to leave it like this for now and change it in a separate PR if we decide t o structure differently the configuration file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#1165 will allow to place it under beat.filter as we do we fields and tags.

@monicasarbu monicasarbu force-pushed the add_generic_filtering branch 2 times, most recently from cf8d908 to c022ec2 Compare March 16, 2016 11:49
assert "http.response_headers" in objs[0]

# generic filtering fails to delete the htp.response_headers.transfer-encoding
assert "transfer-encoding" not in objs[0]["http.response_headers"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand the comment above this statement. Does it really fail or it's more like "check if filtering fails to delete.."?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, exactly

@tsg
Copy link
Contributor

tsg commented Mar 16, 2016

I think mandatory should really only be @timestamp and type because we rely on them in a lot of places. Everything else should be deletable.

@ruflin
Copy link
Member

ruflin commented Mar 16, 2016

@tsg Agree with that these should be the required libbeat fields. We could add a possibility in the future that beats can also reserve some fields as required.

@tsg
Copy link
Contributor

tsg commented Mar 16, 2016

LGTM apart of the comments above. Nice work!

Longer term we should work towards an Event type that's not MapStr{} so that all this can be safer and faster.

@monicasarbu monicasarbu force-pushed the add_generic_filtering branch 2 times, most recently from 769408a to bb0fd33 Compare March 17, 2016 12:55
@monicasarbu monicasarbu changed the title Add support for include_fields and drop_fields Add support for drop_all_fields_except and drop_fields Mar 17, 2016
@monicasarbu
Copy link
Contributor Author

All the comments were addressed. It's ready for review.

@monicasarbu
Copy link
Contributor Author

I changed the action from include_fields to drop_all_fields_except to be more easier to understand.

@ruflin
Copy link
Member

ruflin commented Mar 17, 2016

@elastic/beats Any other thoughts on the naming of include_fields? #1120 (comment)

@monicasarbu
Copy link
Contributor Author

I renamed drop_all_fields_except with include_fields. Now it's ready for the final review.

@monicasarbu
Copy link
Contributor Author

Rebased to master, it should be green soon.

@monicasarbu monicasarbu changed the title Add support for drop_all_fields_except and drop_fields Add support for include_fields and drop_fields Mar 17, 2016
@monicasarbu monicasarbu force-pushed the add_generic_filtering branch 3 times, most recently from d747b3e to 2d78255 Compare March 18, 2016 14:09
@monicasarbu
Copy link
Contributor Author

Finally all green now

include_fields - removes all the fields from the event except the configured fields
drop_fields - removes only the configured fields

Both actions receive fields as argument that can be of type map or a value. For example: "proc", "proc.cpu.total_p".
In case a map is passed as an argument, all the fields inside the map are considered.
Each event MUST contain the fields ["@timestamp","type"] and they cannot be removed from the event.

When multiple filtering rules are defined, we start with a copy of the event and apply the filtering rules one by to the
event.
   event -> filter rule 1 -> event 1 -> filter rule 2 -> event 2 ...
   where event is the initial event, event1 is the event resulted after applying "filter rule1" and it's considered
   input for the
   "filter rule 2" and so on.

Problem encountered:

The generic filtering expects to receive an event with known types, meaning that it contains MapStr and other
primitive types and not structures. In case the event contains an unknown structure for libbeat (it's defined
inside the Beat), then we do a marshal & unmarshal to conver it to a MapStr.

Configuration:

If include_fields=None or not defined, then all the fields are exported
If include_fields=[], then only the mandatory fields are exported
If drop_fields=[], then all the fields are exported

The default values are:
include_fields=None
drop_fields=[]
tsg added a commit that referenced this pull request Mar 21, 2016
Add support for include_fields and drop_fields
@tsg tsg merged commit 06c3132 into elastic:master Mar 21, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants