Add resource to Recordable, make room for InstrumentationLibrary #580

jsuereth · 2021-02-12T21:48:06Z

This PR Is the result of discussion/design from #575.

Adds a SetResourceRef to the Recordable interface
Removes Tracer methods not in the specification.
Update Tracer to get Processor/Exporter solely from TracerProvider
Update Span to keep a reference to generating Tracer and pull information from it rather than duplicating
Ensure SetResourceRef is called on Recordable when creating spans
Update OTLP exporter to also export Resource
Update OTLP exporter tests to check contents of exported protos
Fix windows exporters/builds (ETW)

For a separate PR?:

Update SpanProcessor to be unique_ptr on TracerProvider.
Update Exporter to be unique_ptr on SpanProcessors.
Add InstrumentationLirbary class
Update TracerProvider to construct multiple Tracers, each remembering their InstrumentationLibrary

- Tracer now pulls its processor from TracerProvider (as per spec) - Remove `GetProcessor()` and friends from Tracer - Update all tests to reflect the pipeline is now on TracerProvider

- Remove passing of Processor/Resource to span - Spans now lookup processor from associated Tracer This should allow us to add InstrumenationLibrary to `Tracer` and expose lookups for `Resource`/`InstrumentationLibrary` from `Span`.

…instrumentation library.

codecov · 2021-02-12T21:56:39Z

Codecov Report

Merging #580 (a8be1ad) into main (3dfebc7) will increase coverage by 0.09%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #580      +/-   ##
==========================================
+ Coverage   94.39%   94.49%   +0.09%     
==========================================
  Files         200      200              
  Lines        9068     9066       -2     
==========================================
+ Hits         8560     8567       +7     
+ Misses        508      499       -9

Impacted Files	Coverage Δ
sdk/include/opentelemetry/sdk/trace/recordable.h	`100.00% <ø> (ø)`
.../include/opentelemetry/sdk/trace/tracer_provider.h	`100.00% <ø> (ø)`
sdk/src/trace/span.h	`100.00% <ø> (ø)`
sdk/test/trace/tracer_provider_test.cc	`100.00% <ø> (ø)`
...de/opentelemetry/ext/zpages/threadsafe_span_data.h	`97.05% <100.00%> (+0.04%)`	⬆️
ext/test/zpages/tracez_data_aggregator_test.cc	`97.36% <100.00%> (+0.01%)`	⬆️
ext/test/zpages/tracez_processor_test.cc	`98.72% <100.00%> (+<0.01%)`	⬆️
sdk/include/opentelemetry/sdk/trace/span_data.h	`100.00% <100.00%> (ø)`
sdk/include/opentelemetry/sdk/trace/tracer.h	`100.00% <100.00%> (ø)`
sdk/src/trace/span.cc	`89.24% <100.00%> (+0.11%)`	⬆️
... and 6 more

- OTLP exporter now correctly constructs a `Resource` for spans. - Expand OTLP tests to do a modicum of content validation.

lalitb

Thanks for implementing the approach. LGTM in general apart from comment about using reference instead of pointer for resources.

lalitb · 2021-02-16T08:19:57Z

exporters/otlp/src/otlp_exporter.cc

+    {
+      // *resource_span->mutable_resource() = std::move(rec->resource());
+      *resource_span->mutable_resource() = rec->resource();
+      has_resource                       = true;


nit: we can break from loop once resource is found.

Or we could assert the later found resource equals the first one.

We can't. I'm using the loop that adds the spans to add the resource. IF we break than we'll be dropping spans.

lalitb · 2021-02-16T09:45:28Z

sdk/include/opentelemetry/sdk/trace/recordable.h

+   * @param resource pointer to the Resource for this span.
+   */
+  virtual void SetResourceRef(
+      const opentelemetry::sdk::resource::Resource *const resource) noexcept = 0;


No strong preference, but can we pass const reference to resource here ?

So I can do this, it's a bit "odd". TL;DR: I didn't want to convert from a reference to a pointer as that causes me to cringe in the notion of "likely bug". I found a way to not do that, but it's still a bit cringey, PTAL.

sdk/include/opentelemetry/sdk/trace/tracer.h

pyohannes

Thanks, I generally like this approach. I'm just trying to think through its implications (some first thoughts in inline comments).

pyohannes · 2021-02-16T20:58:09Z

sdk/include/opentelemetry/sdk/trace/tracer.h

-  opentelemetry::sdk::AtomicSharedPtr<SpanProcessor> processor_;
-  const std::shared_ptr<Sampler> sampler_;
-  const opentelemetry::sdk::resource::Resource &resource_;
+  TracerProvider *provider_;


Why do we need this? We have the processor, the sampler, and the resource on the tracer itself.

I vaguely remember a use case for using a Tracer without a TracerProvider (I think it was related to envoy). I can try to dig that up. I think this design effectively makes that impossible.

(a) We can't enforce the lifetime of Processor at all if we share it across Tracer. One goal here is to give each component of the SDK an explicit owner who controls when memory is terminated
(b) TracerProvider owns resource AND processor/sampler. So the idea is that Tracers are tied to the lifetime of TracerProvider and can leverage both of those freely without sacrificing the "known lifetime" issue as you stated.

I'd love to know the Envoy use case. AFAIK you shouldn't be able to get a Tracer without a TracerProvider, but I could be wrong. I think Tracer need some work for InstrumentationLibrary, and likely that may push things this direction.

One advantage of not having dependency between the two is that the TracerProvider and Tracer can be implemented independently by developers, without worrying about the undocumented interfaces provided/to-be-provided by TracerProvider.

That was one of the discussions mentioning the Envoy use case: #35 (comment)

Currently we can use tracers without tracer providers, I think that's not violating the spec. I don't know how many use cases are out there for that (unfortunately I'm myself not that familiar with Envoy). However, if we remove that ability we'd need to be deliberate and careful about it.

Configuration (i.e., SpanProcessors, IdGenerator, SpanLimits and Sampler) MUST be managed solely by the TracerProvider and it MUST provide some way to configure all of them that are implemented in the SDK, at least when creating or initializing it.

See: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#tracer-creation

So the envoy concern in question is the following:

I could imagine users wanting to use a different ownership model than the TracerProvider that creates one tracer per named component and ties lifetime to a global.

I think having Tracer tie its lifetime shorter than a TracerProvider would still be possible here. Resource would be global, but tracer could have shorter lifetime. That's actually a significant API change we'd have to do though, and has tons of implicaitons around INstrumentationLirbary implementation.

pyohannes · 2021-02-16T21:02:06Z

sdk/include/opentelemetry/sdk/trace/recordable.h

+  /**
+   * Sets the resource associated with a span.
+   *
+   * <p> The resource is guaranteed to have a lifetime longer than this recordable.  It is


The resource is guaranteed to have a lifetime longer than this recordable.

I'm not sure if we can make this statement as is. I'm thinking about a scenario where the user frees the TracerProvider, but the exporter is still working on exporting spans (recordables), maybe in another thread.

If we go with plain pointers for resources (instead of shared_ptr), we'd need to be very careful, and either explicitly document all implicit assumptions we make, or take additional precautions to avoid problematic scenarios (e. g. making sure that in the TracerProvider destructor Shutdown is called for all exporters).

I'm planning to change TracerProvider to (a) ensure shutdown and (b) require resource cleanup after shutdown.

Would you be amenable to that change in correlation with this? I think it's needed to make this safe, and I'm happy to go update the docs.

I'm definitely amenable to you making those changes.

Some other question: did you for performance reasons decide not to use a managed pointer (std::shared_ptr) for holding on to the resource in the recordable? And rather have implicit ownership via trace provider/recordable lifetimes?

So far we tried to make all ownership explicit to be on the (memory) safe side.

lalitb · 2021-02-17T14:29:57Z

Thanks for implementing the approach. LGTM in general apart from comment about using reference instead of pointer for resources.

Ok thinking about the lifetime of resources, and scenarios where tracer provider is shutdown while the export is still happening, references mayn't be the right approach. So please ignore that suggestion.

ThomsonTan · 2021-02-17T17:02:59Z

exporters/otlp/src/otlp_exporter.cc

+    // We assume all the spans are for the same resource.
+    if (!has_resource)
+    {
+      // *resource_span->mutable_resource() = std::move(rec->resource());


nit: replace this comment with TODO support for moving?

Actually, theoretically it is moved, I should have removed it before :)

jsuereth · 2021-02-17T19:38:45Z

@lalitb I did implement references prior to seeing this.....

lalitb · 2021-02-18T13:17:35Z

@lalitb I did implement references prior to seeing this.....
Yes, I can see issues with reference, and agree not being the right approach.

jsuereth · 2021-02-18T14:28:34Z

@lalitb I can remove that commit. However two questions for the group:

If make the other proposed changes (Exporter/Processor being unique_ptrs on TracerProvider) does that help the reference concern?
How do we want to proceed with Tracer going forward? (@pyohannes raises good concerns around per-thread Tracer but I think this are orthogonal to whether or not we have a central owner for processing tracers)

pyohannes · 2021-02-18T19:00:59Z

If make the other proposed changes (Exporter/Processor being unique_ptrs on TracerProvider) does that help the reference concern?

My main concern about the approach (which I generally like):

The recordable has a pointer/reference to the resource which is owned by the tracer provider.
We pass ownership of the recordable to the exporter.
From that point on we don't have control over what happens with the recordable.

After that, the recordable might be batched by the exporter or another component it is forwarded to, at the same time the global tracer provider could be replaced (there's that possibility at least). To take care of that, we would need to come up with lots of implicit lifetime dependencies (like "shutting down an exporter invalidates all recordables created from it"), and we would need to make it a hard requirement for every exporter to implement Shutdown properly (currently it's a no-op for the OTLP exporter).

Maybe I'm too paranoid. However, what are your thoughts about this approach:

Storing the resource as a shared_ptr in the tracer provider.
Changing the signature of SetResourceRef to void SetResouce(const std::shared_ptr<Resource>& resource).
Then each exporter/recordable can itself decide whether to keep a shared pointer (100% safe), or go the optimized route and keep a pointer/reference. So we keep all options open for exporter/recordable implementations, and don't make any assumptions in the API about how resource ownership should be implemented.

lalitb · 2021-02-18T20:05:07Z

2. How do we want to proceed with Tracer going forward? (@pyohannes raises good concerns around per-thread Tracer but I think this are orthogonal to whether or not we have a central owner for processing tracers)

Probably I am may be biased with existing design and would like to be corrected if so - but how about keeping the current approach of keeping sampler, processor ( and now also including resource ) as shared_ptr within TracerProvider, and let their life-time governed by references to those shared_ptr. If this approach provides the loose-coupling between the TracerProvider and Tracer, of-course at the cost of performance overhead of using shared_ptr, and not so well-defined life-time for processor/exporter.

jsuereth · 2021-02-19T18:15:03Z

@pyohannes / @lalitb Two points:

I'm afraid that things like Add ForceFlush to Tracing SDK opentelemetry-specification#1452 will force us to define clear ownership + control of exporters/tracers. I'm ok updating this to use shared_ptr for flexibility now, but it has my hackles up that we may be begging peter to pay paul.
I'm starting to debate whether passing Resource to a `Recorable is the right thing after all. If you look at OTLP exporter, we basically have two requirements:

Split traces by TracerProviderand pull Resource once
Split traces by Tracer and pull InstrumentationLibrary once for the batch.
See the java implementation

We're not actually using Recordable in its intended way here for Resource or InstrumentationLibrary. I think it'd be more accurate to the other languages if we store Resource/InstrumentationLibrary on sdk::Span and update our Exporter API to take that.

a. We could make sure however we store it leads to efficient sorting/ordering in Exporters for the OTLP case (i.e. an efficient hash on Resource + InstrumentationLibrary
b. We wouldn't be forcing Exporters to all implement a way of "carrying around the Resource". They just need to use the reference we give them via Span.
c. We're not really leveraging the Recordable benefits for Resource today anyway. The idea was to avoid constructing
intermediate objects (which we are) and directly instantiate the runtime-output (e.g. protobuf).
d. I think this might be more uniform w/ other language implementations.

jsuereth · 2021-02-19T19:55:39Z

Regarding the concerns around using shared pointer, I think we have to change the existing implementation no matter what.

See the specificaiton line:

Configuration (i.e., SpanProcessors, IdGenerator, SpanLimits and Sampler) MUST be managed solely by the TracerProvider and it MUST provide some way to configure all of them that are implemented in the SDK, at least when creating or initializing it.
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#tracer-creation

ThomsonTan · 2021-02-20T00:17:21Z

sdk/src/trace/span.cc

    : tracer_{std::move(tracer)},
-      processor_{processor},
-      recordable_{processor_->MakeRecordable()},
+      recordable_{nullptr},


This initialization is unnecessary?

ThomsonTan · 2021-02-20T00:21:53Z

sdk/include/opentelemetry/sdk/trace/tracer.h

   */
-  std::shared_ptr<Sampler> GetSampler() const noexcept;
+  explicit Tracer(TracerProvider *provider) noexcept;


Could we make the ctor of Tracer private and only accessible only in TracerProvider?

pyohannes · 2021-02-22T22:51:52Z

Just another idea I had. What about putting all TracerProvider related state in a separate type, like TracerContext:

struct TracerContext {
  std::unique_ptr<SpanProcessor> processor;
  std::unique_ptr<Sampler> sampler;
  std::shared_ptr<Resource> resource;
};

The TracerProvider would hold a std::shared_ptr<TracerContext>, with setters to change processor and sampler.

The Tracers would also hold a std::shared_ptr<TracerContext>, which points to the one on the related TracerProvider. The InstrumentationLibrary would separately live on each tracer.

In this way, TracerProvider and Tracer would be decoupled, while still sharing state. Tracer could be used without TracerProvicer (by using another way to manage TracerContext).

I think it cleans up component dependencies in the SDK, however it doesn't give us an efficient way to pass Resource and InstrumentationLibrary along to the exporter. I think that's a separate problem, also related to the fact that the spec is poorly written in that specific regard.

lalitb · 2021-02-23T15:19:14Z

Just another idea I had. What about putting all TracerProvider related state in a separate type, like TracerContext:
struct TracerContext {
  std::unique_ptr<SpanProcessor> processor;
  std::unique_ptr<Sampler> sampler;
  std::shared_ptr<Resource> resource;
};
The TracerProvider would hold a std::shared_ptr<TracerContext>, with setters to change processor and sampler.

The Tracers would also hold a std::shared_ptr<TracerContext>, which points to the one on the related TracerProvider. The InstrumentationLibrary would separately live on each tracer.

In this way, TracerProvider and Tracer would be decoupled, while still sharing state. Tracer could be used without TracerProvicer (by using another way to manage TracerContext).

I think it cleans up component dependencies in the SDK, however it doesn't give us an efficient way to pass Resource and InstrumentationLibrary along to the exporter. I think that's a separate problem, also related to the fact that the spec is poorly written in that specific regard.

Thanks @pyohannes for the proposal - Just to be clear to me, the proposal is to have cleaner approach for removing the dependency between TracerProvider and Tracer and still sharing the state. It is not meant to ensure that configuration (processor, sampler) is solely owned by TracerProvider ?

lalitb · 2021-02-23T17:52:02Z

We're not actually using Recordable in its intended way here for Resource or InstrumentationLibrary. I think it'd be more accurate to the other languages if we store Resource/InstrumentationLibrary on sdk::Span and update our Exporter API to take that.

@jsuereth - I like this approach of using sdk::span to store the reference to Resource to have consistent implementation with other languages, just wondering how it would be different and efficient from current approach of storing reference to Resource in otlp Recordable. We would still need to iterate over spans/recordables to group by Resources/InstrumentationLib in otlp exporter? And not all exporters may want to process resources ( ? ), so in that case we would store it's reference in Span, but won't be used anywhere. Probably, I am not fully able to visualize this approach.

jsuereth · 2021-02-23T22:56:59Z

@pyohannes With the caveat that we do multithreading primitives correctly around updates, I think that could work for decoupling Tracer from TracerProvider, while still meeting the spec requirement around the processor.

Should I take a crack at this now as part of this CL?

jsuereth · 2021-02-23T22:59:55Z

@lalitb

We would still need to iterate over spans/recordables to group by Resources/InstrumentationLib in otlp exporter? And not all exporters may want to process resources ( ? ), so in that case we would store it's reference in Span, but won't be used anywhere. Probably, I am not fully able to visualize this approach

yeah, OTLP requires grouping by Resource/InstrumentationLibrary by its design. Alternatively we can store a map from IL => ILSpans and Resource => ResourceSpans to do that lookup/grouping as we iterate, depends on which ends up being faster.

I won't be able to get to showing this code until thursday, but I'll try to have something then for you to see what I'm suggesting. We can discuss in the SiG :)

pyohannes · 2021-02-24T01:18:55Z

Just to be clear to me, the proposal is to have cleaner approach for removing the dependency between TracerProvider and Tracer and still sharing the state. It is not meant to ensure that configuration (processor, sampler) is solely owned by TracerProvider ?

It ensures that the configuration is owned by a single TracerContext. If one uses a TracerProvider, the TracerProvider and the related Tracers will have shared ownership of the TraceContext (and the configuration in it), but it's only writable via the TracerProvider. If one doesn't use a TracerProvider, one can let any other (custom) component give ownership of the TraceContext and take care of managing updates to it.

pyohannes · 2021-02-24T01:20:40Z

@pyohannes With the caveat that we do multithreading primitives correctly around updates, I think that could work for decoupling Tracer from TracerProvider, while still meeting the spec requirement around the processor.

Should I take a crack at this now as part of this CL?

Maybe let's get other people's opinion first in tomorrow's SIG.

jsuereth added 4 commits February 12, 2021 14:35

Add SetResourceRef to trace Recordable interface.

ea6a0b8

Remove Tracer methods not in spec.

c37058b

- Tracer now pulls its processor from TracerProvider (as per spec) - Remove `GetProcessor()` and friends from Tracer - Update all tests to reflect the pipeline is now on TracerProvider

Ensure Resource references are passed into Recordables. Add TODO for …

9d9246c

…instrumentation library.

jsuereth mentioned this pull request Feb 12, 2021

[DO NOT MERGE - PLEASE REVIEW DESIGN] Feed Resource to Exporter #575

Closed

jsuereth added 3 commits February 12, 2021 17:38

Add helper method to otlp Recordable to create Resource proto.

ea996eb

Wire Resource into OTLP exporter.

474a0a5

- OTLP exporter now correctly constructs a `Resource` for spans. - Expand OTLP tests to do a modicum of content validation.

Run style formatter.

8528f64

jsuereth changed the title ~~[WIP] Add resource to Recordable, make room for InstrumentationLibrary~~ Add resource to Recordable, make room for InstrumentationLibrary Feb 12, 2021

jsuereth marked this pull request as ready for review February 12, 2021 23:39

jsuereth requested a review from a team February 12, 2021 23:39

lalitb reviewed Feb 16, 2021

View reviewed changes

pyohannes reviewed Feb 16, 2021

View reviewed changes

ThomsonTan reviewed Feb 17, 2021

View reviewed changes

Review comments. SetResourceRef now takes a ref.

34a7bf0

ThomsonTan reviewed Feb 20, 2021

View reviewed changes

lalitb mentioned this pull request Mar 1, 2021

Add Resources to Recordable interface. #570

Closed

lalitb mentioned this pull request Mar 19, 2021

Getting undefined reference to `opentelemetry::v0::sdk::resource::Resource::Create while all libraries are there #618

Closed

lalitb mentioned this pull request Apr 7, 2021

Create shared context for updating span pipeline from TracerProvider and affecting Tracer. #650

Merged

jsuereth mentioned this pull request Apr 7, 2021

Adding resource to TracerProvider #334

Closed

jsuereth closed this Apr 7, 2021

Add resource to Recordable, make room for InstrumentationLibrary #580

Add resource to Recordable, make room for InstrumentationLibrary #580

Conversation

jsuereth commented Feb 12, 2021 • edited Loading

codecov bot commented Feb 12, 2021 • edited Loading

Codecov Report

lalitb left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pyohannes left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lalitb Feb 17, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lalitb commented Feb 17, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jsuereth commented Feb 17, 2021

lalitb commented Feb 18, 2021

jsuereth commented Feb 18, 2021

pyohannes commented Feb 18, 2021

lalitb commented Feb 18, 2021

jsuereth commented Feb 19, 2021

jsuereth commented Feb 19, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pyohannes commented Feb 22, 2021

lalitb commented Feb 23, 2021 • edited Loading

lalitb commented Feb 23, 2021

jsuereth commented Feb 23, 2021

jsuereth commented Feb 23, 2021

pyohannes commented Feb 24, 2021

pyohannes commented Feb 24, 2021

jsuereth commented Feb 12, 2021 •

edited

Loading

codecov bot commented Feb 12, 2021 •

edited

Loading

lalitb left a comment •

edited

Loading

lalitb Feb 17, 2021 •

edited

Loading

lalitb commented Feb 23, 2021 •

edited

Loading