
Add namespace k8s tagger #3384

Merged

Conversation

chaitanyaphalak
Contributor

@chaitanyaphalak chaitanyaphalak commented May 13, 2021

Description: Add a feature to fetch namespace metadata, labels, and annotations

Link to tracking Issue: Feature request - #3430 and use-case - #3031

Testing:

Documentation:

@chaitanyaphalak chaitanyaphalak force-pushed the add-namespace-k8s-tagger branch 2 times, most recently from 1e313a9 to 514fb9a May 19, 2021 16:25
@chaitanyaphalak chaitanyaphalak force-pushed the add-namespace-k8s-tagger branch from 2e0f035 to 7c8b447 May 19, 2021 18:39
@chaitanyaphalak chaitanyaphalak marked this pull request as ready for review May 19, 2021 18:56
@chaitanyaphalak chaitanyaphalak requested a review from a team May 19, 2021 18:56
@owais
Contributor

owais commented May 19, 2021

I think we should re-think the config changes this introduces. Instead of a new extract_namespaces section, we can extend the existing extract section. Some ideas,

  1. Use path-like keys under extract to tell the processor to extract from something other than the pod.
    extract:
      namespace/annotations:
      - key: splunk.com/index
      - key: splunk.com/sourcetype
      - key: splunk.com/exclude

This works well if we always want to support extracting data from objects that are related to a pod. In the example above, we'd extract the annotations from the namespace that the pod is in. In the future we could have deployment/*, daemonset/*, service/*, or any other type that could be related to the pod which generated the span.
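For illustration, a hypothetical extension of option 1 to other pod-related objects might look like this (deployment and service extraction are speculative and not part of this PR):

    extract:
      deployment/labels:      # hypothetical: labels from the deployment that owns the pod
      - key: app
      service/annotations:    # hypothetical: annotations from a service that selects the pod
      - key: team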

  2. Add a from: attribute to every item to specify where to extract the item from.
    extract:
      annotations:
      - key: splunk.com/index
        from: namespace
      - key: splunk.com/sourcetype
        from: namespace
      - key: splunk.com/exclude # gets extracted from pod by default

The new from: attribute can default to pod and, as a result, needs to be mentioned only when extracting from something other than a pod.

I feel this allows for a much nicer config pattern that can be extended in the future as we add support for new types. Open to other ideas as well.
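For comparison, a hypothetical sketch of how option 2 could extend to the same future types (neither source is implemented here):

    extract:
      labels:
      - key: app
        from: deployment   # hypothetical future source
      annotations:
      - key: team
        from: service      # hypothetical future source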

@chaitanyaphalak
Contributor Author

@owais
I think the first option feels easy to use from a config and backward-compatibility point of view.

Should we follow the same pattern for metadata and labels?

extract:
  namespace/labels:
    - tag_name: l1 # extracts the value of the label with key `label1` and inserts it as a tag with key `l1`
      key: label1
  namespace/metadata:
    # extract the following well-known metadata fields
    - namespaceName
    - namespaceUID
    - namespaceCluster
    - namespaceStartTime

@owais
Contributor

owais commented May 19, 2021

Should we follow the same pattern for metadata and labels?

Yes. I used annotations just as an example. We should support all features we can/want for every k8s resource we add support for.

I'd wait to see what others think about these config options before going ahead with the implementation. Tagging some people who might be interested in this.

@dmitryax @pmm-sumo @ericmustin @tigrannajaryan @pmatyjasek-sumo @dashpole

@dmitryax
Member

I like both options that Owais suggested; option 2 looks a bit better to me.

Having a default value of pod for the additional from key looks more natural than assuming that annotations == pod/annotations.

@owais
Contributor

owais commented May 20, 2021

@dmitryax agree. It also makes it easier to deal with config if we don't have to split and compare strings.

@pmatyjasek-sumo
Contributor

I think option 2 looks better. It is in line with, for example, the pod_association convention.

@tigrannajaryan
Member

Option 2 looks good to me as a way to specify where to extract annotations from. Question: does the extracted data get added as a Resource attribute or as a Span/LogRecord attribute?

@tigrannajaryan
Member

What do we do with the additional well-known fields? Do they go under the existing "metadata" setting? e.g.:

    extract:
      metadata:
        # extract the following well-known metadata fields
        # these are pod fields:
        - podName
        - podUID
        - deployment
        - cluster
        - namespace
        - node
        - startTime
        # these are namespace fields:
        - namespaceName # is this the same as "namespace" above?
        - namespaceUID
        - namespaceCluster
        - namespaceStartTime

Do we believe that this single list of fields is sufficient for future needs? Do we expect field name clashes that make it impossible to disambiguate which entity the metadata should be extracted from? For example, is "startTime" the pod start time or the node start time (hypothetically speaking)?

@ericmustin
Contributor

Don't have super strong opinions here, but option 2 seems a bit cleaner, generally speaking.

@pmm-sumo
Contributor

No strong opinion here either. Option 2 seems cleaner but option 1 is less verbose.

Does extracted data get added as a Resource attribute or as a Span/LogRecord attribute?

I think that, in principle, this processor should impact Resource-level attributes only, and it seems that wasn't changed here. If attributes are at the record level, they can be moved up a level using the groupbyattrsprocessor.

@chaitanyaphalak
Contributor Author

chaitanyaphalak commented May 20, 2021

@tigrannajaryan These are added as resource attributes. "startTime" is distinct for all object types like pods, namespaces, etc., so they all need to be distinct in some way. Also, "namespace" and "namespaceName" are the same but extracted from different objects: pods and namespaces, respectively.

@tigrannajaryan
Member

Should we use Otel-compliant names for fields that we extract? So for example instead of:

    extract:
      metadata:
        # extract the following well-known metadata fields
        # these are pod fields:
        - podName
        - podUID
        - deployment
        - cluster
        - namespace
        - node
        - startTime

we do:

    extract:
      metadata:
        # extract the following well-known metadata fields
        # these are pod fields:
        - k8s.pod.name # this comes from a pod
        - k8s.pod.uid
        - k8s.deployment.name # where does this come from? From the pod or the associated deployment? Or is it the same thing?
        - k8s.namespace.name # this comes from a namespace
        - k8s.pod.start_time # pod's start time.
        - k8s.node.start_time # node's start time.
        - startTime # alias for k8s.pod.start_time
        - etc ...

This way it is more apparent what it is that we are populating. It also lets us disambiguate fields that may be present in more than one entity. Thoughts?

I understand this means changing already-existing field names, but we can do it in a backward-compatible way by supporting the old names as aliases for the new canonical OTel names.
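For example, a backward-compatible config could accept both spellings for the same field (a sketch; the exact alias mapping is an assumption, not the merged behavior):

    extract:
      metadata:
        - podName        # legacy name, kept as an alias
        - k8s.pod.name   # canonical OTel name for the same field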

@chaitanyaphalak
Contributor Author

@tigrannajaryan that actually makes it really easy for users to understand what they want and configure appropriately.

@owais
Contributor

owais commented May 21, 2021

@tigrannajaryan makes sense for the "metadata" section, which is a static list of well-known fields. One implication of this is that if we need support for new fields, we'll first need to go and update the semantic conventions (which is probably a good thing anyway).

But I think this should be done in a separate PR.

@chaitanyaphalak this reminds me that the metadata section is a list of well-known fields, and the processor knows exactly how and where to fetch these values from (ref). As such, I don't think it makes sense for users to specify a from: option for metadata fields. It still makes sense for annotations and labels, though.

I think we should split this into multiple PRs:

  1. First PR updates the field names to what Tigran suggested. Let's not add any new fields. Only rename.
  2. (if applicable) A PR adds new well-known fields that might require us to fetch and watch namespace resources.
  3. Add from: support to annotations and labels so they can be extracted from namespaces in addition to pods (see the sketch after this list).
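Putting the three steps together, the end state could look roughly like this (a sketch under the naming and from:-default assumptions discussed above, not the merged config):

    extract:
      metadata:
        - k8s.pod.name
        - k8s.namespace.name
      annotations:
      - key: splunk.com/index
        from: namespace
      labels:
      - key: label1
        from: pod        # the default; can be omitted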

If you only need or want to work on 3, let's create an issue to track what Tigran suggested. We should do it before GA.

WDYT?

@chaitanyaphalak
Contributor Author

@owais I am cool with working on all three things, along with adding the namespace metadata fields to the semantic conventions here:
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/semantic_conventions/k8s.md

@owais
Contributor

owais commented May 21, 2021

@chaitanyaphalak One more thing: if you add support for new metadata fields, try to restrict them so they can be extracted from the pod spec and won't require watching resources other than pods. If we do add support for fields that can't be extracted from pods and need additional API calls, let's not turn them on by default.

@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label May 29, 2021
@tigrannajaryan
Member

@owais @dmitryax PTAL.

Contributor

@owais owais left a comment


Thanks. Overall direction looks good. Took a quick look and left some comments. Will review again in detail in the morning.

Contributor

@owais owais left a comment


Looks good, but I can't find where we set the default value of "from" to "pod". If we aren't doing that yet, let's add it.
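For example, the expected defaulting behavior (an illustrative sketch, not the merged code):

    extract:
      annotations:
      - key: splunk.com/index
        from: namespace
      - key: splunk.com/exclude   # no "from" specified; should behave as from: pod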

@chaitanyaphalak
Copy link
Contributor Author

@owais The PR is ready.

@tigrannajaryan tigrannajaryan requested a review from owais June 14, 2021 16:26
Contributor

@owais owais left a comment


Looks good, but I left 2 minor comments. I trust you to take care of those before getting it merged :)

Thanks for the contribution. Great work!

Member

@tigrannajaryan tigrannajaryan left a comment


Please update the documentation in doc.go to show how the new functionality can be used.

@tigrannajaryan tigrannajaryan merged commit 5ca0538 into open-telemetry:main Jun 15, 2021
alexperez52 referenced this pull request in open-o11y/opentelemetry-collector-contrib Aug 18, 2021