Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CONTRIBUTING.md] Add section about merging ECS conventions #333

Merged
merged 22 commits into from
Dec 15, 2023

Conversation

mx-psi
Copy link
Member

@mx-psi mx-psi commented Sep 20, 2023

Fixes #329

Changes

Add new section to CONTRIBUTING.md with expectations when merging existing ECS conventions.

Merge requirement checklist

CONTRIBUTING.md Outdated Show resolved Hide resolved
CONTRIBUTING.md Outdated Show resolved Hide resolved
Copy link
Member

@joaopgrassi joaopgrassi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really love the "follow plural guidelines or other soft recommendation" wording we have there. I think we can do better :)

CONTRIBUTING.md Outdated Show resolved Hide resolved
CONTRIBUTING.md Outdated Show resolved Hide resolved
CONTRIBUTING.md Outdated Show resolved Hide resolved
mx-psi and others added 2 commits October 13, 2023 14:51
Co-authored-by: Alexander Wert <AlexanderWert@users.noreply.github.com>
@trisch-me
Copy link
Contributor

trisch-me commented Oct 24, 2023

Hey @mx-psi I have finished my investigation on possible naming conflicts. Please check it below.

TL;DR:

  • pluralisation, i.e. ECS has fields which are arrays but key is singular. As prominent examples of it - *.ip and *.mac for arrays of ip or mac addresses, ~40 examples found in ECS - We do already have merged attributes with ip and mac. Should we say it's not a conflict anymore?
  • *.bytes fields which would be changed to the *.size ~15 examples
  • *.packets fields, I think they might be a subject to change as with previous field ~10 examples
  • @ character in name, there is only 1 such example (@timestamp), not sure if it against naming rules
  • array of nested objects: ECS contains fields (for example dns.answers) which are arrays containing objects. ECS defines fields of this nested objects. I'm not sure yet how this would go into Otel schema. From this data model can we say that for attribute of type map<string, any>, any could be such an object too?

Here is detailed list of Namespaces and possible conflicting fields in it:

  • Base: @timestamp
  • Client: client.bytes, client.packets
  • Container: container.disk.(write|read).bytes, container.image.hash.all (array of hashes), container.image.tag (array of image tags), container.network.(ingress|egress).bytes
  • Destination: destination.bytes, destination.packets
  • DNS: dns.resolved_ip (Array containing all IPs seen in answers.data)
  • Email: email.(bcc|cc|from|reply_to|to).address - The email addresses
  • Event: event.category, event.type - arrays
  • Host: host.disk.(write|read).bytes, host.ip (Host ip addresses), host.mac (Host MAC addresses, array of values), host.network.(ingress|egress).bytes, host.network.(ingress|egress).packets
  • Http: http.request.body.bytes, http.request.bytes (body + headers), http.response.body.bytes, http.response.bytes
  • Network: network.bytes, network.packets
  • Observer: observer.ip, observer.mac are arrays
  • Orchestrator: orchestrator.resource.annotation (The list of annotations added to the resource.), orchestrator.resource.ip, orchestrator.resource.label - these are all arrays
  • Process: process.thread.capabilities.effective (set of capabilities used by the kernel to perform permission checks for the thread), process.thread.capabilities.permitted - these are arrays and we have rule in metrics that namespaces shouldn't be pluralised, but I'm not sure about these cases
  • Related: related.hash, related.hosts, related.ip, related.user - all arrays with data gathered all over the event, I'm not sure if it will be moved to Otel but potential conflict. (A concrete example is IP addresses, which can be under host, observer, source, destination, client, server, and network.forwarded_ip. If you append all IPs to related.ip, you can then search for a given IP trivially, no matter where it appeared, by querying related.ip:192.0.2.15.)
  • Server: server.bytes, server.packets
  • Source: source.bytes, source.packets
  • Threat: threat.(group|software).alias, threat.tactic.(id|name|reference), threat.technique.(id|name|reference), threat.technique.subtechnique.(id|name|reference) - these are all arrays
  • User: user.id is defined not as array but a user might have more than 1 ID associated
  • Vulnerability: vulnerability.category is an array
  • x509 Certificate: x509.issuer.(common_name| country|locality|organization|organizational_unit|state_or_province), x509.subject.(common_name| country|locality|organization|organizational_unit|state_or_province) - these are all arrays

@pyohannes
Copy link
Contributor

pyohannes commented Oct 25, 2023

@ character in name, there is only 1 such example (@timestamp), not sure if it against naming rules

Can we possibly ignore this conflict, as we likely won't have a @timestamp field on semantic conventions (as that's part of the data structure of signals themselves, and doesn't need to be in semantic conventions)?

@trisch-me
Copy link
Contributor

  1. How many conventions from ECS don't follow our existing guidelines? So that we understand to how many misaligned conventions we are committing. Alexandra Conrad is working on this.

This one has been answered above. Please also note, this might be not full list of all possible conflicts, while adding new fields we might find other naming conflicts, but I think this list is comprehensive enough

  1. Is it harder to migrate users in some situations? For example, does host.ip being a leaf node in ECS and us introducing host.ip.addresses causes more trouble than a simple rename?

Yes, changing name is already a breaking change, but changing structure - in this case going from simple leaf with value/array of values to complex node with additional data underneath will have really strong negative impact on ECS users and tooling. I would strongly advise against it. Please also check comments in this PR for additional context

CONTRIBUTING.md Outdated Show resolved Hide resolved
@trisch-me
Copy link
Contributor

I have changed the status of the issue to get more attention from other approvers

Copy link
Member

@joaopgrassi joaopgrassi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks good now :)

CONTRIBUTING.md Outdated Show resolved Hide resolved
CONTRIBUTING.md Outdated Show resolved Hide resolved
CONTRIBUTING.md Show resolved Hide resolved
mx-psi and others added 2 commits December 11, 2023 17:17
Co-authored-by: Armin Ruech <7052238+arminru@users.noreply.github.com>
@mx-psi mx-psi requested a review from dmitryax December 11, 2023 16:18
@mx-psi
Copy link
Member Author

mx-psi commented Dec 12, 2023

Anything missing here? Or is this good to merge?

@AlexanderWert AlexanderWert merged commit b2e2c64 into open-telemetry:main Dec 15, 2023
9 checks passed
pyohannes pushed a commit to pyohannes/semantic-conventions that referenced this pull request Jan 17, 2024
…emetry#333)

Co-authored-by: Alexander Wert <AlexanderWert@users.noreply.github.com>
Co-authored-by: Joao Grassi <joao.grassi@dynatrace.com>
Co-authored-by: Armin Ruech <7052238+arminru@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Provide guidance when there is conflict with ECS conventions