Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should metric attributes be required to be namespaced? #51

Closed
trask opened this issue May 24, 2023 · 22 comments
Closed

Should metric attributes be required to be namespaced? #51

trask opened this issue May 24, 2023 · 22 comments
Assignees

Comments

@trask
Copy link
Member

trask commented May 24, 2023

Summarizing reasons for not introducing this requirement:

  • it's not necessary since metric attributes are already namespaced (by the metric they are on)
  • longer attribute names make writing backend metric queries more cumbersome

Summarizing reasons for introducing this requirement:

  • establishes that there is a single set of attributes that can be used across traces, metrics and logs
    • encourages correlation between signals, which is one of OTel's core missions
  • metrics can be derived from logs
  • raw measurements can be emitted via logs

I will update these summaries based on any comments made below.

@bertysentry
Copy link
Contributor

Additional cons:

  • Most major observability platforms don't use namespaces in metric attributes in their collectors (Prometheus, Grafana, Datadog, New Relic, for example)
  • Namespaces are complicated and people tend to add namespaces, even when not necessary, "just in case". Cf discussion of process.runtime.jvm.gc.name vs jvm.gc_name (the question of when to use . for a separate namespace, or _ for a qualifier of the preceding word is complicated)
  • Could lead to more fragmentation too, preventing easy correlation between metrics (example: we could have namespaces for different JVM implementations, and thus different attribute names, which prevent easy aggregation on jvm.hotspot.thread.name and jvm.j9.thread.name

Additional pros:

  • Namespaces help avoid conflict with different vendors implementations and interpretations of the standard and conventions

Thank you for tracking this, @trask!

@jsuereth
Copy link
Contributor

As an FYI - TC met and I'll be putting together a proposal (on this bug) on an "Option C".

The strawman is that all attributes are required to have namespaces. When an attribute namespace is the same as a "metric" namesapce, the attribute namespace will be dropped on the metric.

The idea here is we get the "best of both worlds" of reserved namespaces, global attribute registry, using the same "attribute bundle" for instrumentation across signals, but we also can produce metrics that look like the rest of the worlds metrics in addition to allowing tooling to understand a consistent set of rules when namespaces disappear.

I'll work on fleshing this out with concerns and details as a future comment. Once "Option C" is fleshed out, we'll be voting on a path forward.

@trask
Copy link
Member Author

trask commented May 26, 2023

I'm curious if "Option C" might influence some of our metric naming choices.

For example, system.processes.count has attribute state, which would implicitly become system.processes.state.

It could be better(?) in this case to name the metric system.process.count, and then the attribute state would implicitly become system.process.state.

@bertysentry
Copy link
Contributor

Having to choose for each attribute whether it should be namespaced or not will generate tons of discussions very difficult to arbitrate.

I propose an option D: Allow aliases for metric attributes.

Each attribute has a "principal" name, plus an array of aliases. When querying the metric, one could use the main name or any of the aliases.

First (naive) implementations could be to simply store each alias as a separate attribute.

Pros:

  • Otel metric attributes remain aligned with the rest of the world (with simple attribute names)
  • Allows correlation with logs and traces (with namespaced attributes).

Cons:

  • Significant change in metrics data format, I'm not competent enough to understand the implications of such a change.

@joaopgrassi
Copy link
Member

joaopgrassi commented Jun 5, 2023

Might be worth mentioning, I'm trying to move the existing metrics from markdown to yaml #73 and I bumped into what I think is a limitation in the tooling and/or violation of our recommendations

All names that are part of OpenTelemetry semantic conventions SHOULD be part of a namespace

Identical namespaces or names in all these areas MUST have identical meanings. For example the http.request.method span attribute name denotes exactly the same concept as the http.request.method metric attribute, has the same data type and the same set of possible values (in both cases it records the value of the HTTP protocol's request method as a string).

There's several attributes in the system metrics conventions today (e.g., state) that are used in several metrics and their values are not the same between metrics.

I started moving things into a YAML file using attribute_groups as done for the JVM metrics, but then the attributes get namespaced, such as system.cpu.state or system.memory.state, meaning it's not the same output as our current conventions.

Here's what I mean by using attribute_groups

- id: system.cpu
    prefix: system.cpu
    type: attribute_group
    brief: Describes System CPU metric attributes
    attributes:
      - id: state
        type:
          allow_custom_values: true
          members:
            - id: idle
              value: 'idle'
            - id: user
              value: 'user'
            - id: system
              value: 'system'
            - id: interrupt
              value: 'interrupt'
        brief: The state of the CPU
        examples: ["idle", "interrupt"]
        
 - id: system.memory
    prefix: system.memory
    type: attribute_group
    brief: Describes System Memory metric attributes
    attributes:
      - id: state
        type:
          allow_custom_values: true
          members:
            - id: used
              value: 'used'
            - id: free
              value: 'free'
            - id: cached
              value: 'cached'
        brief: The memory state
        examples: ["free", "cached"]

One might be able to get around the limitation by duplicating the attribute on each metric, changing it's enum values but that does not seem like a good idea to me. Essentially I can't share the attribute in different metrics without having it namespaced.

@Oberon00
Copy link
Member

Oberon00 commented Jun 5, 2023

The question is also: How should code generators name the constants for these? Usually I think they are based on the full name. And some metric attributes (like the ones shared with tracing) are indeed namespaced and unique. So would the code generators need to apply a heuristic like "does not contain dots" and then prefix with the group ID (which is otherwise only used for internal cross referencing and not usually exposed in generated code/markdon AFAIK)?

So it seems less a limitation of the tooling, but an ad-hoc concept accidentally introduced in current metric conventions for which it is unclear how it should be supported by tooling (if at all).

@jsuereth
Copy link
Contributor

jsuereth commented Jun 6, 2023

Option C: Metrics have implicit namespaces

This proposal requires all attributes in OpenTelemetry Semantic conventions to be namespaced, and available in a global registry. This provides many benefits, including:

  • Avoiding overlap between signals
  • Simplified code generation and tooling automation
  • Allowing sharing of attribute definitions between any signal in OpenTelemetry without the need for breaking changes.

Additionally, there is a desire from existing metric users for "small" or "short" attributes, as is the convention for metrics-only systems today. For example, rather than jvm.memory.state, metrics users would prefer an attribute in their backend to be just state

Metrics Implicit Namespace

To achieve this, Metrics will implicitly create a namespace based on their name. For example, a metric called process.cpu.utilization would implicitly have a namespace of process.cpu. Additionally we add the following rules:

  • When transmitting OTLP, any attribute not already namespaces (does not contain a .) is implicitly assumed to be in the metric's namespace.
  • When recording metric attributes, SDKs may drop the implicit namespace from attributes if it exists, e.g. process.cpu.state => state.
  • When exporting metrics to Prometheus, dropping the implicit namespace is required.
  • Backends MAY drop implicit namespaces from metric attributes.

Expected use cases

  • Instrumentation authors are able to generate one set of attributes where it makes sense (e.g. extracting HTTP attributes) and using them for all signals.
  • Metrics users are able to get short attribute names for key attribute types.
  • Metric attributes that become shared with traces/logs no longer need to be namespaced, so this would no longer be a breaking change.

@jack-berg
Copy link
Member

For the reasons mentioned here I still prefer requiring namespaces for all attributes.

I think the "option c" proposal above is a decent compromise, but that the benefits of attribute short names do not justify the complexity of the extra rules.

@jsuereth
Copy link
Contributor

jsuereth commented Jun 6, 2023

@jack-berg While I proposed Option C as a compromise, my own opinion aligns with yours.

@joaopgrassi
Copy link
Member

joaopgrassi commented Jun 9, 2023

To achieve this, Metrics will implicitly create a namespace based on their name. For example, a metric called process.cpu.utilization would implicitly have a namespace of process.cpu

Just if anyone wants to see, I have a draft PR to move the system metrics to YAML and before option C was proposed, it was the output I got with our current semconv tooling

https://github.com/dynatrace-oss-contrib/semantic-conventions/blob/feat/system-metrics-yaml/specification/metrics/semantic_conventions/system-metrics.md#metric-systemcputime

For ex: the system.cpu.* metrics have a state attribute. But system.memory.* and system.paging.* metrics also have that attribute.

The outcome is that the state attribute is created under those three namespaces (system.cpu.state, system.memory.state, system.paging.state)

Now a technicality I mentioned above: If we want Option C but also want the markdown rendering for the above attribute to be simply state, then I think we need changes in the tooling, because I couldn't find a way to do it. Maybe there is and I just don't know (which screams we need better docs for the semconv tool :D)

@trask
Copy link
Member Author

trask commented Jun 14, 2023

as discussed in the specification meeting, I believe this issue should be decided by the @open-telemetry/technical-committee since there are strong pros and cons to these approaches (including Option C outlined by @jsuereth), and we need a decision (one way or the other) before we can feature-freeze the JVM runtime metrics

@yurishkuro
Copy link
Member

yurishkuro commented Jun 15, 2023

I realize the following is not immediately actionable given the current state of OTEL. The fundamental issue we're dealing with here is that OTEL's semantic conventions bundle semantics with naming. Here's a contrive example of why this is bad: assume we want to capture events containing CPU measurements into an SQL table. Which table schema would you choose?

CREATE TABLE PROCESS_CPU (
  utilization INT NOT NULL,
  state       VARCHAR(50) NOT NULL,
  idle_time   INT NOT NULL
);

or

CREATE TABLE PROCESS_CPU (
  process_cpu_utilization INT NOT NULL,
  process_cpu_state       VARCHAR(50) NOT NULL,
  process_cpu_idle_time   INT NOT NULL
);

I think it's safe to say most engineers would pick the first option (the basic DRY principle; Go linter even has a similar anti-stutter rule). But then how do we distinguish if the state columns in this table and in a different table refer to the same thing or not? OTEL semantic conventions cannot help any more at this point, but the solution does exist - by capturing additional metadata about the columns in the table schema. I went into details of such approach in my talk last year https://www.shkuro.com/talks/2022-10-27-schema-first-application-telemetry/

@bertysentry
Copy link
Contributor

I'm with @yurishkuro on this. I was suggesting to allow aliases for attributes names, and such aliases could be part of such attributes metadata.

In terms of implementation though, it's a bit tricky. Developers who are implementing OpenTelemetry to instrument their application and emit metrics would need to follow strictly the semantic conventions that specify such aliases.

For example, suppose the semantic conventions for system.cpu commend to include system.cpu.state AND its alias state. When emitting the system.cpu.utilization metric, the developer would need to make sure to include not just the "state" attribute, but the ["state", "system.cpu.state"] attribute.

@tigrannajaryan
Copy link
Member

A TC vote was requested on this, however it is not clear to me what the option "require namespace" is. What namespace? Any namespace? Is it OK to place all attributes in namespace all.? What really is the requirement? That all attributes are spread into more than one namespace? How many? Should namespaces in some way match the metric name namespace?

Are we instead perhaps trying to say that a metric attribute considered in isolation from the metric should have a meaningful name, such that the interpretation of that meaning does not require additional knowledge of a context in which it is used (i.e. the metric name)? So, state does not match this requirement (state of what?). cpu.state probably matches the requirement since we know what a CPU is and can define what a state of a CPU is.

Let's look at another example. We currently have a cpu attribute in system.cpu.time metric. What does cpu attribute describe? It is supposed to be the integer number of the cpu. Arguably, probably a clearer name would be cpu_number. Is this good enough? Seems like a namespace is superfluous in this case. Should we still place this in system.cpu. namespace so that it becomes system.cpu.cpu_number?

My suggested definition is of course still ambiguous. Let's look at another example: a port. We know what a port is right? It's that number that is associated with a network socket (to be clearer, we would probably want to call this port_number instead of port). So, no other context needed? No, it turns out it can be a client port or a server port. So, server.port is namespaced enough? But what if we decide we want for all http connections to record 4 port numbers: the client, the server, and the incoming port on the proxy and outgoing port on the proxy. Should we use client.src.port, proxy.dest.port, proxy.src.port, server.dest.port? What's the framework for making a decision about what namespace (how deeply nested) to use?

I find that the definition of "require namespace" rule is lacking the necessary precision to apply it. It would be great if you can come up with a proper definition (or maybe there is one that I do not see, please point me to it). Without such precise definition I don't think that option is ready for the TC to vote on.

@jack-berg
Copy link
Member

@tigrannajaryan from my perspective the question is merely whether metric attributes are exempt from the namespacing requirements we already impose on attributes elsewhere.

Technically, the attribute naming spec is stable and is quite clear on this already. It starts with:

This section applies to Resource, Span, Log, and Metric attribute names (also known as the "attribute keys"). For brevity within this section when we use the term "name" without an adjective it is implied to mean "attribute name".

Then follows with:

Names SHOULD follow these rules:
Use namespacing to avoid name clashes.

Put simply, option A just reinforces what the spec already says. The other options aim to carve out some sort of exception for metrics. I don't think we need more precision with the attribute namespacing rule (at this moment in time) since trace and log semantic convention conversations have been able to proceed without it.

@tigrannajaryan
Copy link
Member

Thanks @jack-berg

That's very useful.

I think we need clarify what we consider a clash, i.e. a clash with what? It appears that another way to think about "no exception for metrics" approach is to say that we want globally unique attribute names across all signals, such that no metric attribute can be the same as a span attribute or a log attribute unless the attribute has the exact same meaning. I would expect that we make this clarification in the spec should we decide that yes, we would like metric attributes to "require a namespace".

@jack-berg
Copy link
Member

The TC has voted on this issue with results as follows:

  • 4 votes for "Option A: Metric attributes must have namespaces"
  • 0 votes for "Option B: Metric attributes don't need namespaces since attributes can be considered scoped to the metric name"
  • 3 votes for "Option C: Metrics have implicit namespaces"
  • 2 members abstained

Therefore, we should proceed with: "Option A: Metric attributes must have namespaces"

Closing this issue and we can separately proceed with the changes required to adopt (e.g. #3431, semantic-conventions#61)

@bertysentry
Copy link
Contributor

Thank you for organizing the vote on this issue! Can we keep the discussion going on other options? @yurishkuro mentioned using metadata to convey this information. I suggested an option D, allowing aliases for attributes. But these were discarded in the discussion 😥

@tigrannajaryan
Copy link
Member

Can we keep the discussion going on other options?

TC votes are for decision making. The decision is made to go with "Option A: Metric attributes must have namespaces".

@jsuereth
Copy link
Contributor

@bertysentry I don't think selecting Option A negates the possibility of something like an Option C or D appearing in the future. However, the details of Option D are too open ended to consider voting. As you saw, Option C vs. Option A was a close call, but the fundamental agreement was that from a Semantic Conventions standpoint, metric attributes will have namespaces and unified descriptions/meaning between signals.

On a metric-backend-specific case, allowing aliases for well-known metric attributes is totally fine, but outside the scope of OpenTelmetry. Users are also welcome to shorten metric names via processors if it is their desire.

@bertysentry
Copy link
Contributor

Thank you, @jsuereth, for the reply. I'm glad there is still room for discussion on improving this. IMO the required namespace in metric attributes is doomed to failure (for various reasons already detailed before). I'm concerned that too few implementations will follow these strict conventions, and that's why I truly hope enhancements will be considered, to ensure a wide adoption of these conventions. 🙏

@Oberon00
Copy link
Member

I think the main decision here was that they are conceptually namespaced and that's how we will describe them in semantic conventions. On this basis, any transmission optimizations can still be built.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants