[RFC] Continue target process RFC to stage 1 #1297
I'll see if I can add some |
<!--
Stage 1: Identify potential concerns, implementation challenges, or complexity. Spend some time on this. Play devil's advocate. Try to identify the sort of non-obvious challenges that tend to surface later. The goal here is to surface risks early, allow everyone the time to work through them, and ultimately document resolution for posterity's sake.
-->
The biggest concern is the duplication of fields and the double-nested `process` group at `process.target.parent`. This could require some updates to our reuse mechanism, but that's an issue internal to this repository. We should make sure that we don't accidentally populate `process.parent.target`, which would have a different meaning. Because of this, we will need to articulate what each reuse means, similar to https://www.elastic.co/guide/en/ecs/current/ecs-user.html#ecs-user-nestings.
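To make the `process.target.parent` versus `process.parent.target` distinction concrete, here is a minimal sketch of a check an ingest test might run. The function name `find_misplaced_target` is hypothetical and not part of ECS or its tooling; it only assumes events are plain nested dictionaries.

```python
def find_misplaced_target(event):
    """Return dotted paths where `target` was nested under `process.parent`.

    Hypothetical helper for illustration only: the RFC allows
    process.target and process.target.parent, but not process.parent.target.
    """
    problems = []
    process = event.get("process", {})
    parent = process.get("parent", {})
    # A `target` key under the parent process is the nesting to avoid.
    if isinstance(parent, dict) and "target" in parent:
        problems.append("process.parent.target")
    return problems


valid_event = {
    "process": {
        "name": "cmd.exe",
        "target": {"name": "lsass.exe", "parent": {"name": "wininit.exe"}},
    }
}
invalid_event = {
    "process": {
        "name": "cmd.exe",
        "parent": {"name": "explorer.exe", "target": {"name": "lsass.exe"}},
    }
}

print(find_misplaced_target(valid_event))
print(find_misplaced_target(invalid_event))
```

A check like this only guards against the wrong nesting; it says nothing about how the reuse mechanism itself would need to change.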
Thanks for calling this out.
Changes will need to be made to support reusing `process.*` fields as `parent.*` underneath `process.target`, since self-nestings (fields reused within themselves) aren't carried around to other places (this avoids nestings like `user.target.*` from getting carried over to `source.user.target.*`).
Co-authored-by: Eric Beahan <eric.beahan@elastic.co>
@andrewstucki any updates on your end?
@rw-access sorry, getting the auditbeat examples fell off my radar, I'll try and generate them in the next few days.
Hey @andrewstucki just another nudge.
@rw-access I added some more details for the stage 1 criteria into the PR description. In addition to the mapping examples, any thoughts about who could act as a sponsor?
For sponsor, @magermark is probably the best fit, then @devonakerr. I'm also fine being the sponsor for this one.
Hey @ebeahan, just talked to @andrewstucki. Sounds like he's busy and won't be able to get the examples. Do you see those as a blocker for the RFC? Should we move forward without additional examples for what this would look like on linux, or ping another person who can get them?
Thanks for the update, @rw-access! Let's move forward with the current sysmon example. I think it does a good job of capturing the high-level intent with these proposed fields. If you've finalized who will sponsor, can you add them as the sponsor under the I'll do a final review from the ECS team as well.
LGTM 👍
When @devonakerr reviews and approves, I'll set the date and merge the PR.
I approve, thanks for getting those adjustments in, Ross.
Sorry to come in at this late stage with negative feedback. I hadn't been aware of the nesting aspects that may be implied by the current approach, specifically after having seen #1459. I'm concerned by the precedent of replicating a field structure nested within a field structure, for multiple reasons:

Proliferation of fields in Elasticsearch
The following JSON generates 8 fields which in reality might be more consistently expressed as four. That is not an issue at this scale, but if we are talking about e.g. 1000 fields per document, that may require 2000 fields, which in turn could become a problem for scaling Elasticsearch downstream.
this might be better expressed as e.g. 3 docs
I appreciate this means that a search for the full history of the parentage of

Proliferation of fields at multiple levels
Aside from the impact on Elasticsearch, having multiple fields with the same name at multiple levels may create issues for anyone wishing to unpack (or already unpacking) the structure using e.g. XPath expressions.

Impact of nesting on downstream apps
Requiring downstream apps to process nested fields raises multiple issues:
- Obligation to handle nested structures
- Risk of Stack Overflow and System Outage
- Risk of Stack Overflow and Data Compromise

Alternatives
O11y currently handles the linkage of a transaction history by adding a unique ID + 1 parent ID to transactions derived from a common parent, which allows them to be grouped by that ID and/or unpacked to a tree structure based on the parent IDs.
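For readers unfamiliar with the ID-plus-parent-ID alternative, the following is a rough sketch of how flat records could be unpacked into a tree. The field names `id` and `parent_id` and the function `build_tree` are assumptions made for illustration; they are not taken from any Elastic schema or from this RFC.

```python
from collections import defaultdict


def build_tree(records):
    """Group flat records into nested trees using hypothetical
    `id` / `parent_id` keys. Records with parent_id None are roots."""
    children = defaultdict(list)
    for rec in records:
        children[rec.get("parent_id")].append(rec["id"])

    def expand(node_id):
        # Recursively attach each node's children, in input order.
        return {
            "id": node_id,
            "children": [expand(c) for c in children.get(node_id, [])],
        }

    return [expand(root_id) for root_id in children.get(None, [])]


records = [
    {"id": "a", "parent_id": None},
    {"id": "b", "parent_id": "a"},
    {"id": "c", "parent_id": "a"},
    {"id": "d", "parent_id": "b"},
]
tree = build_tree(records)
print(tree)
```

The trade-off is that each document stays flat, but reconstructing the lineage needs either multiple queries or client-side assembly like the above.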
Hey @djptek, thanks for the feedback. I don't totally understand what you mean. A lot of these problems aren't new;
Is that still a problem if it's 10 fields? 20 fields? 50 fields? Just trying to understand when it's a problem and when it's not.
I'm unsure what role introducing
We're not extending the mapping infinitely though. We're repeating the same thing we did for And just in case it needs saying, we're not using any I think there might be some mutual confusion over this RFC. Do you think it would make sense for us to communicate synchronously, so we don't accidentally talk past each other? Some of the things you mention don't seem totally relevant from my understanding, but I don't want to disregard your concerns just because I don't understand them. |
Hi @rw-access
if this is simply a pointer from one
"It depends" would be the official answer; see also `index.mapping.total_fields.limit`.
So the goal isn't to nest each child, child-of-child, child-of-child-of-child, etc... within the parent, but that the child has two pointers, one each to predecessor and follower? That would put us back in linked list territory, which feels a lot more OK
Thanks for clarifying - don't worry, I was clear on that one :-) |
I'm still not sure if we're talking about the same thing.
{
"process": {
"name": "cmd.exe",
"pid": 1848,
"command_line": "cmd /c whoami",
"parent": {
"name": "explorer.exe",
"pid": 1240
}
}
}
Here's the proposal:
{
"process": {
"name": "cmd.exe",
"pid": 1848,
"command_line": "cmd /c whoami",
"parent": {
"name": "explorer.exe",
"pid": 1240
},
"target": {
"name": "lsass.exe",
"command_line": "C:\\Windows\\System32\\lsass.exe",
"pid": 604
}
}
}
and if
{
"process": {
"name": "cmd.exe",
"pid": 1848,
"command_line": "cmd /c whoami",
"parent": {
"name": "explorer.exe",
"pid": 1240
},
"target": {
"name": "lsass.exe",
"command_line": "C:\\Windows\\System32\\lsass.exe",
"pid": 604,
"parent": {
"name": "wininit.exe",
"pid": 132
}
}
}
}
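As a way to reason about the field-count question raised earlier in the thread, here is a small sketch that flattens an event into dotted leaf paths and counts them. It uses the last example above as input; `leaf_fields` is a hypothetical helper, and the count it produces is of leaf values in one document, which is only a rough proxy for how Elasticsearch counts mapped fields.

```python
def leaf_fields(doc, prefix=""):
    """Return the dotted path of every non-dict value in a nested dict."""
    paths = []
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into nested objects such as process.target.parent.
            paths.extend(leaf_fields(value, path))
        else:
            paths.append(path)
    return paths


event = {
    "process": {
        "name": "cmd.exe",
        "pid": 1848,
        "command_line": "cmd /c whoami",
        "parent": {"name": "explorer.exe", "pid": 1240},
        "target": {
            "name": "lsass.exe",
            "command_line": "C:\\Windows\\System32\\lsass.exe",
            "pid": 604,
            "parent": {"name": "wininit.exe", "pid": 132},
        },
    }
}

fields = leaf_fields(event)
for path in fields:
    print(path)
print(len(fields))  # 10 leaf fields for this event
```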
@rw-access Thanks for the clarification, I'm nearly there, though 'fraid I still have a couple more questions:
Thanks for doing the detail with me on this |
The RFC only proposes an additional nesting of the `process.*` fields at `process.target` and `process.target.parent`, assuming we don't run into problems with that (i.e. too many new fields added). There would only be one target process in a cross-process event. Please take another look at the RFC for the examples.
LGTM - thanks for covering these extra questions at this late stage @rw-access |
This is great. A much needed capability for further standardization of process fields. I look forward to the SysMon use cases as well as EDR solutions when it comes to cross process activity. Thank you! |
Continuing #1286 to stage 1
Markdown preview of the proposal
Stage 1 Criteria