Not clear to which messages a schema url applies to #4180

Open
lukasmalkmus opened this issue Aug 6, 2024 · 2 comments
Labels
spec:miscellaneous (For issues that don't match any other spec label), triage:accepted:ready

Comments

lukasmalkmus commented Aug 6, 2024

Inside a single ResourceSpans message, there are multiple occurrences of schema_url. As a consumer of OpenTelemetry data, I'm trying to understand which attributes these schemas apply to. While this question applies to logs, traces, and metrics alike, I'm going to reference the tracing-related bits, as this is where I initially stumbled across my problem.

  1. The ResourceSpans message has its own schema_url field: https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/trace/v1/trace.proto#L58-L63.

> This schema_url applies to the data in the "resource" field. It does not apply to the data in the "scope_spans" field which have their own schema_url field.

This checks out.

  2. Following the comment into the scope_spans field, the ScopeSpans message also has its own schema_url field: https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/trace/v1/trace.proto#L76-L80.

> This schema_url applies to all spans and span events in the "spans" field.

This kinda checks out, but I'm missing two things here:

  • It doesn't mention span links, which I assume it must apply to as well?
  • It apparently doesn't apply to the scope, which (a) seems inconsistent with how the ResourceSpans schema_url behaves (applying to the resource) and (b) leaves me wondering which schema the scope is recorded in.
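To make the two gaps concrete, here is a minimal sketch that maps each part of a ResourceSpans message to the schema_url the proto comments assign to it. The dataclasses are hypothetical, simplified stand-ins (not the generated opentelemetry-proto classes), and parts the comments don't cover (the scope itself and span links) are mapped to None.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical, simplified stand-ins for the OTLP trace messages;
# only the fields relevant to schema_url are modeled.
@dataclass
class Link:
    trace_id: str
    span_id: str


@dataclass
class Span:
    name: str
    events: list[str] = field(default_factory=list)
    links: list[Link] = field(default_factory=list)


@dataclass
class ScopeSpans:
    scope_name: str
    spans: list[Span]
    schema_url: str = ""


@dataclass
class ResourceSpans:
    resource_attrs: dict[str, str]
    scope_spans: list[ScopeSpans]
    schema_url: str = ""


def schema_per_part(rs: ResourceSpans) -> list[tuple[str, Optional[str]]]:
    """Map each part to the schema_url the proto comments assign to it.

    None marks parts that neither comment mentions (the scope and links).
    """
    out: list[tuple[str, Optional[str]]] = [("resource", rs.schema_url)]
    for ss in rs.scope_spans:
        # Not covered by the ResourceSpans or the ScopeSpans comment.
        out.append((f"scope:{ss.scope_name}", None))
        for span in ss.spans:
            out.append((f"span:{span.name}", ss.schema_url))
            for event in span.events:
                out.append((f"event:{event}", ss.schema_url))
            for link in span.links:
                # Links live inside spans but are not mentioned explicitly.
                out.append((f"link:{link.span_id}", None))
    return out
```

The two None entries are exactly the open questions above: which schema the scope is recorded in, and whether the ScopeSpans schema_url also covers links.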

When I discovered this uncertainty, I followed the link in the source code comments, which led me to this section. The rules established there don't seem to match the comments in the proto file:

> The schema_url field in the ResourceSpans [...] messages applies to the contained Resource, Span, SpanEvent [...] messages.

This clashes with the description given in the proto file.

> The schema_url field in the InstrumentationLibrarySpans message applies to the contained Span and SpanEvent messages.

This checks out, but with the same caveats I pointed out for the description in the proto file.

So without further explanation, I can't know for sure which field takes precedence, since per this description both apply to Span and SpanEvent.

Fortunately, there is an explanation there:

> If schema_url field is non-empty both in Resource* message and in the contained InstrumentationLibrary* message then the value in InstrumentationLibrary* message takes the precedence.
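Read literally, the quoted precedence rule is a fallback: the scope-level value wins when it is set, otherwise the resource-level value applies. A minimal sketch of what that text says, with a hypothetical function name; it only illustrates the quoted rule, whose correctness is what this issue questions.

```python
def effective_span_schema_url(resource_schema_url: str, scope_schema_url: str) -> str:
    """Schema URL for spans/span events under the quoted precedence rule.

    proto3 decodes an unset string field as "", so "non-empty" means truthy.
    """
    # The InstrumentationLibrary*/Scope* value takes precedence when non-empty...
    if scope_schema_url:
        return scope_schema_url
    # ...otherwise fall back to the Resource* value (which may itself be "").
    return resource_schema_url
```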

But as @pellared already pointed out here:

> this is most likely not correct

To add to the confusion, the page I linked has another section right at the top that reads as follows:

> OpenTelemetry instrumentation libraries include the OpenTelemetry Schema URL in all emitted telemetry. This is currently work-in-progress, here is an example of how it is done in Go SDK’s Resource detectors.

As per @pellared:

> this should have a different example from a instrumentation library and not a resource detector


To sum this issue up: as a consumer, it's not obvious to me how to correctly process OTLP messages with regard to the given schema URLs. The docs as well as the spec seem to be outdated and/or wrong.

I'm happy to help rephrase some of the comments/docs I've linked but I need some guidance on what the correct way actually is before I get going.

Or maybe I've just navigated myself into a corner and it's totally obvious how to process OTLP messages correctly. Either way, I think there are some takeaways from this issue.

Cheers

lukasmalkmus added the spec:miscellaneous label Aug 6, 2024
lukasmalkmus changed the title from "Not clear to which attributes a schema_url applies to" to "Not clear to which messages a schema url applies to" Aug 6, 2024
trask (Member) commented Aug 13, 2024

> I'm happy to help rephrase some of the comments/docs I've linked but I need some guidance on what the correct way actually is before I get going.

Yeah, it looks like there's some drift, and the spec needs to be updated a bit. If you can send a PR, that would be a great way to get the ball rolling and get others to check it out!

dyladan (Member) commented Aug 13, 2024

To the best of my knowledge, there are the following errors:

  1. Instrumentation library was renamed to scope. Anything named InstrumentationLibraryX (e.g. InstrumentationLibrarySpans) should be ScopeX (e.g. ScopeSpans).

  2. The schema_url field in the ResourceSpans message DOES NOT apply to the contained Span and SpanEvent messages. The schema_url in ScopeSpans DOES.

  3. "If schema_url field is non-empty both in Resource* message and in the contained InstrumentationLibrary* message then the value in InstrumentationLibrary* message takes the precedence."

    I think this is just outdated.
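Under the corrections above, a consumer takes the schema URL for resource attributes only from the ResourceSpans schema_url, and for spans and span events only from the ScopeSpans schema_url, with no fallback between the two. A minimal sketch under that reading, assuming OTLP/JSON-style dicts (lowerCamelCase keys such as scopeSpans and schemaUrl); the function name and output labels are hypothetical.

```python
from typing import Any, Iterator


def iter_schema_assignments(resource_spans: dict[str, Any]) -> Iterator[tuple[str, str]]:
    """Yield (part, schema_url) for one OTLP/JSON resourceSpans entry.

    Resource attributes use the ResourceSpans-level schemaUrl; spans and
    span events use their enclosing ScopeSpans-level schemaUrl. An empty
    scope URL is NOT replaced by the resource URL.
    """
    # A missing field is treated as "" (the proto3 default for strings).
    yield ("resource", resource_spans.get("schemaUrl", ""))
    for scope_spans in resource_spans.get("scopeSpans", []):
        scope_url = scope_spans.get("schemaUrl", "")
        for span in scope_spans.get("spans", []):
            yield (f"span:{span.get('name', '')}", scope_url)
            for event in span.get("events", []):
                yield (f"event:{event.get('name', '')}", scope_url)
```

Note the difference from the outdated precedence rule: a span whose ScopeSpans has no schemaUrl gets "" here rather than inheriting the resource's URL.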

Additionally:

> It doesn't talk about links which I guess it must apply to, too?

A link just carries the span context of the linked span, to help you find that span in your backend. The linked span itself should have a resource from whatever ScopeSpans message is associated with it. A link is just a pointer: the linked span may be in the same ScopeSpans message, but it may also not be.
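Because a link is only a pointer (trace ID plus span ID), a consumer that wants the linked span's own schema has to look the span up, and it may not be in the same batch at all. A minimal sketch, assuming OTLP/JSON-style dicts with traceId/spanId strings; the function names are hypothetical.

```python
from typing import Any, Optional


def index_span_schemas(resource_spans_list: list[dict[str, Any]]) -> dict[tuple[str, str], str]:
    """Map (traceId, spanId) to the schemaUrl of the ScopeSpans containing that span."""
    index: dict[tuple[str, str], str] = {}
    for rs in resource_spans_list:
        for ss in rs.get("scopeSpans", []):
            for span in ss.get("spans", []):
                index[(span["traceId"], span["spanId"])] = ss.get("schemaUrl", "")
    return index


def linked_span_schema(link: dict[str, Any], index: dict[tuple[str, str], str]) -> Optional[str]:
    """Return the linked span's scope schemaUrl, or None if it is not in this batch."""
    return index.get((link["traceId"], link["spanId"]))
```

The None case matters: the linked span may have been exported in a different request entirely, so its schema can't be inferred from the linking span's message.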

> It apparently doesn't apply to the scope which (a) seems inconsistent with how the ResourceSpans schema_url behaves (applying to resource) and (b) leaves me wondering in which schema the scope is recorded in

The layout is a bit inconsistent, but it avoids duplication, because all spans in a scope should have the same schema URL. The resource has only one set of attributes, so the resource and its schema_url can sit at the same level of the tree. Each scope, by contrast, can contain many spans, each with its own attributes and events, which may in turn have attributes. Because all spans and span attributes within a scope should share the same schema_url, the URL lives at the scope level of the tree.
