Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kubernetes: How to populate resource attributes based on attributes, labels and transformation #1756

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

zeitlinger
Copy link
Member

@zeitlinger zeitlinger commented Jan 17, 2025

Fixes #236

This is a new take on the issue. #349 has been created before and not completed.

So what's the difference now?

  1. The proposed rules are already implemented in the OTel operator and work well: https://github.com/open-telemetry/opentelemetry-operator#configure-resource-attributes - the PR (at least in it's initial form) makes the operator implementation fully compatible with this spec PR.
  2. The specification work on service.instance.id has been completed - which was effectively a blocker before.

Merge requirement checklist

Copy link
Contributor

@dashpole dashpole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the overall direction. This makes a lot of sense

docs/resource/k8s.md Outdated Show resolved Hide resolved
docs/resource/k8s.md Outdated Show resolved Hide resolved
docs/resource/k8s.md Outdated Show resolved Hide resolved
@ChrsMark ChrsMark requested a review from a team January 20, 2025 10:59
docs/resource/k8s.md Outdated Show resolved Hide resolved
@dmitryax
Copy link
Member

We briefly discussed this during the k8s SemConv WG meeting. There is no consensus on the document yet, but we decided to recommend moving it from the k8s resource section of the semantic conventions repo to service because it prescribes a way of populating values of service attributes, not attributes in k8s namespace.

docs/resource/k8s.md Outdated Show resolved Hide resolved
- `pod.annotation[resource.opentelemetry.io/service.version]`
- `pod.label[app.kubernetes.io/version]` (well-known label
[app.kubernetes.io/version](https://kubernetes.io/docs/reference/labels-annotations-taints/#app-kubernetes-io-version))
- `if (contains(container docker image tag, '/') == false) container docker image tag`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we bail if the tag has /? The service attribute conventions don't define a format for version. If none of the other options work, wouldn't using the tag be better (not sure what the sdk defaults to)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure about this part of the spec.

I took this from the way the operator logic - collector does it a bit differently.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've found the reason - it was a guard for a port number separator as can be seen here: https://github.com/distribution/reference/blob/284a39eaf3368476e0a4c6114a0eec61220acdd9/reference_test.go#L250

I've updated the spec to refer to that library - which is already used by the collector for container.image.id -

... with the notable difference that I think we should include the digest if available, it's the best source of truth for the service version

Choose the first value found:

- `pod.annotation[resource.opentelemetry.io/service.instance.id]`
- `concat([k8s.namespace.name, k8s.pod.name, k8s.container.name], '.')`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed service.instance.id in the semconv meeting and had some doubts about the scenario where a container is restarted without a pod recreate (e.g. kubelet restarting a container for OOM). The confusion was whether this id should change to reflect that a new instance of the container was started which is what would happen i think if we left this to an sdk.

Copy link
Contributor

@jinja2 jinja2 Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The convention for service.instance.id states that it MUST be unique for each instance of the same service.namespace,service.name pair (in other words service.namespace,service.name,service.instance.id triplet MUST be globally unique). For a pod which is owned by a statefulset this is not the case. I guess the question is whether this id should be used to differentiate between different instances of a service at a given time or across all instances ever run. In case of a statefulset, the id is not unique enough to differentiate between instances of a pod with same sts ordinal but k8s will only run one instance of a pod with a given ordinal (the id will remain same after an sts pod is recreated). If we want a different id across pod recreates, switching to pod uid instead of name would work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've generally interpreted "globally unique" here to mean unique at a given time, rather than unique across all time. IMO this is somewhat moot since SDKs are going to start generating unique IDs for service.instance.id anyways.

Copy link
Contributor

@jinja2 jinja2 Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this is somewhat moot since SDKs are going to start generating unique IDs for service.instance.id anyways.

Are sdks going to ignore any service.instance.id values passed in with the env var OTEL_RESOURCE_ATTRIBUTES? My understanding is that the documented values are currently being calculated and set in this var by operator when auto-instrumenting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this is somewhat moot since SDKs are going to start generating unique IDs for service.instance.id anyways.

Are sdks going to ignore any service.instance.id values passed in with the env var OTEL_RESOURCE_ATTRIBUTES? My understanding is that the documented values are currently being calculated and set in this var by operator when auto-instrumenting.

no, SDKs must honor whatever attributes are set using OTEL_RESOURCE_ATTRIBUTES

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've generally interpreted "globally unique" here to mean unique at a given time, rather than unique across all time.

that is also my understanding

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @jsuereth for the feedback on this question, which I have now included in the PR itself:

Note that the container restart count is not included in the service.instance.id calculation, because it makes
troubleshooting harder when the ID changes on every restart, e.g. in a crash loop.

@zeitlinger
Copy link
Member Author

We briefly discussed this during the k8s SemConv WG meeting. There is no consensus on the document yet, but we decided to recommend moving it from the k8s resource section of the semantic conventions repo to service because it prescribes a way of populating values of service attributes, not attributes in k8s namespace.

@open-telemetry/specs-semconv-approvers the service page is automatically generated - how can I move the content of this PR to the service page?

@github-actions github-actions bot added the enhancement New feature or request label Jan 28, 2025
@trask
Copy link
Member

trask commented Jan 28, 2025

@open-telemetry/specs-semconv-approvers the service page is automatically generated - how can I move the content of this PR to the service page?

maybe as subsection(s) under https://github.com/open-telemetry/semantic-conventions/blob/main/docs/resource/README.md#service

docs/resource/README.md Outdated Show resolved Hide resolved
@zeitlinger zeitlinger force-pushed the k8s-label-recommendations branch from 604cdc9 to d850c85 Compare February 3, 2025 13:53
@zeitlinger
Copy link
Member Author

@jsuereth @tigrannajaryan thanks for the positive feedback in the entities SIG call. A added the notes about the object hierarchy for service.name and value stability for service.instance.id. Please let me know if this addresses everything.

@zeitlinger
Copy link
Member Author

@jinja2 thanks for the detailed review - everything should be addressed now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:k8s enhancement New feature or request
Projects
Status: In Review
Development

Successfully merging this pull request may close these issues.

Proposal: Define mapping from k8s well-known labels to semconv
8 participants