Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add_host_metadata processor overwrites host.id field added by modules #20464

Closed
kaiyan-sheng opened this issue Aug 6, 2020 · 10 comments
Closed
Assignees
Labels
Team:Platforms Label for the Integrations - Platforms team

Comments

@kaiyan-sheng
Copy link
Contributor

We identified several common fields for host inventory schema and these fields include host.id and host.name. add_host_metadata does not overwrite host.name field with #14407 but not host.id.

Things to consider:

  1. make add_host_metadata to do nothing if host.* fields already exist.
  2. add tests to make sure host.* fields don't get overwritten by add_host_metadata processor.
  3. Maybe we should use add_observer_metadata processor instead?

cc @jsoriano

@kaiyan-sheng kaiyan-sheng added the Team:Platforms Label for the Integrations - Platforms team label Aug 6, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations-platforms (Team:Platforms)

@jsoriano
Copy link
Member

jsoriano commented Aug 6, 2020

@kaiyan-sheng thanks for opening this issue.

Maybe we should use add_observer_metadata processor instead?

In any case I think that we could make add_host_metadata to do nothing if there is already host information in the event. (Same thing for add_observer_metadata if there is observer information in the event).

@andrewkroh
Copy link
Member

  1. make add_host_metadata to do nothing if host.* fields already exist.

I think this is the right behavior to have because you want all or nothing. Otherwise you can have data representing different hosts mixed together. Same for add_observer_metadata.

One challenge is that host.name is added by libbeat before the global add_host_metadata processor is executed. So if you check for the existence of host then it will always be present. I think we should not add host.name in libbeat and always rely on add_host_metadata when the data is needed (and the module should make that decision, rather than always including the processor globally). So perhaps this can be addressed for 8.0.

// setup 6: add beats and host metadata
if meta := builtin; len(meta) > 0 {
processors.add(actions.NewAddFields(meta, needsCopy, false))
}
// setup 8: pipeline processors list
processors.add(b.processors)

@kaiyan-sheng
Copy link
Contributor Author

kaiyan-sheng commented Aug 10, 2020

@jsoriano @andrewkroh Agreed. How about seperating this issue into two steps/PRs?

  1. Change add_host_metadata to not overwrite if host fields already exist (besides host.name):
    PR: Add replace_fields config option in add_host_metadata for replacing host fields #20490
  2. Remove adding host.name from libbeat and add it into add_host_metadata(in 8.0): [Libbeat] Move adding host.name into add_host_metadata #21773

@jsoriano
Copy link
Member

How about seperating this issue into two steps/PRs?

Sounds good to me. I think we can create another issue for the removal of host.name. We may still want to add it after all the processing is done if nothing else added it (I would expect the one added by libbeat was added this way). Some UIs rely on host.name field, and there can be cases of people not using add_host_metadata and using modules that don't provide host information.

@kaiyan-sheng
Copy link
Contributor Author

PR for adding replace_fields config option is merged.

In order to match what add_host_metadata processor's current behavior, replace_fields is set to true by default right now. BUT in order to show the lastest host inventory schema fields, replace_fields needs to be set to false. 
I would like to set the default to false in Metricbeat at some point.

@andrewkroh @exekias @jsoriano Is there any concerns here for making this breaking change in 7.10? 

@exekias
Copy link
Contributor

exekias commented Aug 25, 2020

I'm +1 to make this breaking change in 7.10. To my understanding it should not break default setups, and perhaps only some rare cases where people manually add host fields before add_host_metadata kicks in?

@jsoriano
Copy link
Member

I'm +1 to make this breaking change in 7.10. To my understanding it should not break default setups, and perhaps only some rare cases where people manually add host fields before add_host_metadata kicks in?

I don't think it will break anything important on Metricbeat. But I wonder if this can break some security use case where Filebeat parses the hostname from a log file, stores it in host.name, and add_host_metadata is used to add network information.

@jsoriano
Copy link
Member

I'm +1 to make this breaking change in 7.10. To my understanding it should not break default setups, and perhaps only some rare cases where people manually add host fields before add_host_metadata kicks in?

I don't think it will break anything important on Metricbeat. But I wonder if this can break some security use case where Filebeat parses the hostname from a log file, stores it in host.name, and add_host_metadata is used to add network information.

Replying to myself :D maybe an option to mitigate this is to keep old behaviour if the value in host.name is equal to the value that add_host_metadata would add.
This way it will add the fields when add_host_metadata is used to add network information of current host, and it will do nothing if host.name is not the current host. replace_fields could still be used to force old behaviour in any case.

@kaiyan-sheng
Copy link
Contributor Author

Closing this issue with two separate followup issues created:

  1. Move host.name to add_host_metadata: [Libbeat] Move adding host.name into add_host_metadata #21773
  2. Make breaking change in 7.11 for replace_fields: false: Change default value replace_fields to false in add_host_metadata processor #21774

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Platforms Label for the Integrations - Platforms team
Projects
None yet
Development

No branches or pull requests

5 participants