-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add geo fields to add_host_metadata
processor.
#9392
Conversation
Pinging @elastic/uptime |
This comment has been minimized.
This comment has been minimized.
} | ||
|
||
func (ag addGeo) asMap() common.MapStr { | ||
return common.MapStr(ag).Clone() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: +1 on cloning, but from name only I would asMap expect to be a cast operation only. Caught me a little offguard when reading the argument for PutValue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I'll rename to mapClone()
.
CountryISOCode string `config:"country_iso_code"` | ||
RegionName string `config:"region_name"` | ||
CityName string `config:"city_name"` | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to have validation for at least the fields that are stored as geo point in Elasticsearch here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so we'd check the format to be a lat lon pair? I think that makes sense. Anything beyond that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, at least lat-lon pair.
The others are kind free form text and validating these might require a location database, right? We even store the others as keyword, so not that important. Validating them might be too much effort here (would be nice if we can do some simple validation - regex like -, but no hard requirement for me).
Most important is that indexing won't fail due to misconfiguration of the processor + maps in kibana dashboards will work correctly, cause lat lon pairs are correct and make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validation added :)
|
||
ag := addGeo{ | ||
"name": config.Name, | ||
"location": config.Location, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we / can we do any validation for the location field to make sure we don't have ingestion errors?
return ag, nil | ||
} | ||
|
||
type Config struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit but I'm somehow used to have configs on top
type addGeo common.MapStr | ||
|
||
func (ag addGeo) Run(event *beat.Event) (*beat.Event, error) { | ||
_, err := event.PutValue("host.geo", ag.asMap()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this really belong into the host? We are interested where the agent is running so it would be agent.geo
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an interesting question. I agree that agent is slightly better, but I wonder if that is outweighed by the simplicity of not having yet another geo field. A location is tied to a host as much as it is tied to an agent.
There's an argument that's not true for the geo.name
field we use, which is free-form, but perhaps we don't need to be so dogmatic there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One point that would speak for the host is that in the case of apm-server, apm-server is not the agent but the observer. There it would then be observer.geo.*
. Having it under host
would then not change it. I'm good with going with host.*
for now and if someone is not happy with that, he can use the rename
processor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for not having agent.geo.*
As @ruflin says, in ECS, the agent
is defined as running on a host
or an observer
.
Both host.geo.*
and observer.geo.*
are defined by ECS, so we are covered in either case.
After a discussion with @ruflin we decided to move this to the |
OK, redid the whole thing under the The only thing not yet done is the changes to fields.yml. Since this is now in ECS, is there a process for updated the ECS fields, or should I just manually add the ECS name field? |
b36db5e
to
c0587c7
Compare
add_host_metadata
processor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but we should add a CHANGELOG entry.
|
||
if config.Geo != nil { | ||
if len(config.Geo.Location) > 0 { | ||
if m, _ := regexp.MatchString("^\\-?\\d+(\\.\\d+)?\\s*\\,\\s*\\-?\\d+(\\.\\d+)?$", config.Geo.Location); !m { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm undecided if I like this one or not. It's good that we check the user input but it looks like really complex regexp where I must confess I could not easily review to make sure it's correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can break the regexp up into component strings to make ti more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, the regexp isn't 100% accurate (it doesn't check the numeric bounds), but it's a lot better than nothing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In b4feb7e I've made the regex more clear
@andrewvc Feel free to edit the ecs or the libbeat common file for the field, both works. |
36f8a04
to
16524a6
Compare
Looks like rebasing of master gives us the geo fields thanks to @andrewkroh :) in #9121 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except a missing CHANGELOG entry.
Ignore the failing mac build as I stopped them.
if len(config.Geo.Location) > 0 { | ||
// Regexp matching a number with an optional decimal component | ||
// Valid numbers: '123', '123.23', etc. | ||
latOrLon := `\-?\d+(\.\d+)?` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Thanks for all the details here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great, love it.
Perhaps one more test for empty values being purged correctly, but other than this, LGTM
} | ||
}) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice tests.
Perhaps one more to test "Delete any empty values"? Make sure your empty values include things like empty spaces too {"city_name": " "}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@webmat I've added handling + tests for blank strings. Mind taking a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah good stuff.
Seems like I noticed the "din expect"
typo too late for this PR ;-)
This lets users add geo data if they have it to the host
16524a6
to
2e35691
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Please ignore the failed test in Travis. Thought I fixed it but seems it still happens. Not related to this PR.
This lets users add geo data if they have it to the host (cherry picked from commit c6c4a30)
This lets users add geo data if they have it to the host (cherry picked from commit c6c4a30)
** EDIT ** I've left the original issue below the break, but after discussion we added geo fields to the
add_host_metadata
processor instead of a new one. Original is belowThis carries over from the discussion from #8620 .
This adds a new processor that lets users easily add geo fields associated with the host that created the event. You would use it like so:
It's debate-able whether ECS should actually let you put these under perhaps
agent.geo
. That's something we should discuss here.One other question here, should we just fold this functionality under
add_host_metadata
? I believe that probably makes more sense. We agreed in #8620 to make this a separate processor, but with the data nested under host, that makes less sense IMHO.