Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add resource to Recordable, make room for InstrumentationLibrary #580

Closed

Conversation

jsuereth
Copy link
Contributor

@jsuereth jsuereth commented Feb 12, 2021

This PR Is the result of discussion/design from #575.

  • Adds a SetResourceRef to the Recordable interface
  • Removes Tracer methods not in the specification.
  • Update Tracer to get Processor/Exporter solely from TracerProvider
  • Update Span to keep a reference to generating Tracer and pull information from it rather than duplicating
  • Ensure SetResourceRef is called on Recordable when creating spans
  • Update OTLP exporter to also export Resource
  • Update OTLP exporter tests to check contents of exported protos
  • Fix windows exporters/builds (ETW)

For a separate PR?:

  • Update SpanProcessor to be unique_ptr on TracerProvider.
  • Update Exporter to be unique_ptr on SpanProcessors.
  • Add InstrumentationLirbary class
  • Update TracerProvider to construct multiple Tracers, each remembering their InstrumentationLibrary

- Tracer now pulls its processor from TracerProvider (as per spec)
- Remove `GetProcessor()` and friends from Tracer
- Update all tests to reflect the pipeline is now on TracerProvider
- Remove passing of Processor/Resource to span
- Spans now lookup processor from associated Tracer

This should allow us to add InstrumenationLibrary to `Tracer` and
expose lookups for `Resource`/`InstrumentationLibrary` from `Span`.
@codecov
Copy link

codecov bot commented Feb 12, 2021

Codecov Report

Merging #580 (a8be1ad) into main (3dfebc7) will increase coverage by 0.09%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #580      +/-   ##
==========================================
+ Coverage   94.39%   94.49%   +0.09%     
==========================================
  Files         200      200              
  Lines        9068     9066       -2     
==========================================
+ Hits         8560     8567       +7     
+ Misses        508      499       -9     
Impacted Files Coverage Δ
sdk/include/opentelemetry/sdk/trace/recordable.h 100.00% <ø> (ø)
.../include/opentelemetry/sdk/trace/tracer_provider.h 100.00% <ø> (ø)
sdk/src/trace/span.h 100.00% <ø> (ø)
sdk/test/trace/tracer_provider_test.cc 100.00% <ø> (ø)
...de/opentelemetry/ext/zpages/threadsafe_span_data.h 97.05% <100.00%> (+0.04%) ⬆️
ext/test/zpages/tracez_data_aggregator_test.cc 97.36% <100.00%> (+0.01%) ⬆️
ext/test/zpages/tracez_processor_test.cc 98.72% <100.00%> (+<0.01%) ⬆️
sdk/include/opentelemetry/sdk/trace/span_data.h 100.00% <100.00%> (ø)
sdk/include/opentelemetry/sdk/trace/tracer.h 100.00% <100.00%> (ø)
sdk/src/trace/span.cc 89.24% <100.00%> (+0.11%) ⬆️
... and 6 more

- OTLP exporter now correctly constructs a `Resource` for spans.
- Expand OTLP tests to do a modicum of content validation.
@jsuereth jsuereth changed the title [WIP] Add resource to Recordable, make room for InstrumentationLibrary Add resource to Recordable, make room for InstrumentationLibrary Feb 12, 2021
@jsuereth jsuereth marked this pull request as ready for review February 12, 2021 23:39
@jsuereth jsuereth requested a review from a team February 12, 2021 23:39
Copy link
Member

@lalitb lalitb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing the approach. LGTM in general apart from comment about using reference instead of pointer for resources.

{
// *resource_span->mutable_resource() = std::move(rec->resource());
*resource_span->mutable_resource() = rec->resource();
has_resource = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can break from loop once resource is found.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we could assert the later found resource equals the first one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't. I'm using the loop that adds the spans to add the resource. IF we break than we'll be dropping spans.

* @param resource pointer to the Resource for this span.
*/
virtual void SetResourceRef(
const opentelemetry::sdk::resource::Resource *const resource) noexcept = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong preference, but can we pass const reference to resource here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I can do this, it's a bit "odd". TL;DR: I didn't want to convert from a reference to a pointer as that causes me to cringe in the notion of "likely bug". I found a way to not do that, but it's still a bit cringey, PTAL.

sdk/include/opentelemetry/sdk/trace/tracer.h Show resolved Hide resolved
Copy link
Contributor

@pyohannes pyohannes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I generally like this approach. I'm just trying to think through its implications (some first thoughts in inline comments).

opentelemetry::sdk::AtomicSharedPtr<SpanProcessor> processor_;
const std::shared_ptr<Sampler> sampler_;
const opentelemetry::sdk::resource::Resource &resource_;
TracerProvider *provider_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this? We have the processor, the sampler, and the resource on the tracer itself.

I vaguely remember a use case for using a Tracer without a TracerProvider (I think it was related to envoy). I can try to dig that up. I think this design effectively makes that impossible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(a) We can't enforce the lifetime of Processor at all if we share it across Tracer. One goal here is to give each component of the SDK an explicit owner who controls when memory is terminated
(b) TracerProvider owns resource AND processor/sampler. So the idea is that Tracers are tied to the lifetime of TracerProvider and can leverage both of those freely without sacrificing the "known lifetime" issue as you stated.

I'd love to know the Envoy use case. AFAIK you shouldn't be able to get a Tracer without a TracerProvider, but I could be wrong. I think Tracer need some work for InstrumentationLibrary, and likely that may push things this direction.

Copy link
Member

@lalitb lalitb Feb 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One advantage of not having dependency between the two is that the TracerProvider and Tracer can be implemented independently by developers, without worrying about the undocumented interfaces provided/to-be-provided by TracerProvider.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was one of the discussions mentioning the Envoy use case: #35 (comment)

Currently we can use tracers without tracer providers, I think that's not violating the spec. I don't know how many use cases are out there for that (unfortunately I'm myself not that familiar with Envoy). However, if we remove that ability we'd need to be deliberate and careful about it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Configuration (i.e., SpanProcessors, IdGenerator, SpanLimits and Sampler) MUST be managed solely by the TracerProvider and it MUST provide some way to configure all of them that are implemented in the SDK, at least when creating or initializing it.

See: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#tracer-creation

So the envoy concern in question is the following:

I could imagine users wanting to use a different ownership model than the TracerProvider that creates one tracer per named component and ties lifetime to a global.

I think having Tracer tie its lifetime shorter than a TracerProvider would still be possible here. Resource would be global, but tracer could have shorter lifetime. That's actually a significant API change we'd have to do though, and has tons of implicaitons around INstrumentationLirbary implementation.

/**
* Sets the resource associated with a span.
*
* <p> The resource is guaranteed to have a lifetime longer than this recordable. It is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resource is guaranteed to have a lifetime longer than this recordable.

I'm not sure if we can make this statement as is. I'm thinking about a scenario where the user frees the TracerProvider, but the exporter is still working on exporting spans (recordables), maybe in another thread.

If we go with plain pointers for resources (instead of shared_ptr), we'd need to be very careful, and either explicitly document all implicit assumptions we make, or take additional precautions to avoid problematic scenarios (e. g. making sure that in the TracerProvider destructor Shutdown is called for all exporters).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm planning to change TracerProvider to (a) ensure shutdown and (b) require resource cleanup after shutdown.

Would you be amenable to that change in correlation with this? I think it's needed to make this safe, and I'm happy to go update the docs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm definitely amenable to you making those changes.

Some other question: did you for performance reasons decide not to use a managed pointer (std::shared_ptr) for holding on to the resource in the recordable? And rather have implicit ownership via trace provider/recordable lifetimes?

So far we tried to make all ownership explicit to be on the (memory) safe side.

@lalitb
Copy link
Member

lalitb commented Feb 17, 2021

Thanks for implementing the approach. LGTM in general apart from comment about using reference instead of pointer for resources.

Ok thinking about the lifetime of resources, and scenarios where tracer provider is shutdown while the export is still happening, references mayn't be the right approach. So please ignore that suggestion.

// We assume all the spans are for the same resource.
if (!has_resource)
{
// *resource_span->mutable_resource() = std::move(rec->resource());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: replace this comment with TODO support for moving?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, theoretically it is moved, I should have removed it before :)

@jsuereth
Copy link
Contributor Author

@lalitb I did implement references prior to seeing this.....

@lalitb
Copy link
Member

lalitb commented Feb 18, 2021

@lalitb I did implement references prior to seeing this.....
Yes, I can see issues with reference, and agree not being the right approach.

@jsuereth
Copy link
Contributor Author

@lalitb I can remove that commit. However two questions for the group:

  1. If make the other proposed changes (Exporter/Processor being unique_ptrs on TracerProvider) does that help the reference concern?
  2. How do we want to proceed with Tracer going forward? (@pyohannes raises good concerns around per-thread Tracer but I think this are orthogonal to whether or not we have a central owner for processing tracers)

@pyohannes
Copy link
Contributor

If make the other proposed changes (Exporter/Processor being unique_ptrs on TracerProvider) does that help the reference concern?

My main concern about the approach (which I generally like):

  1. The recordable has a pointer/reference to the resource which is owned by the tracer provider.
  2. We pass ownership of the recordable to the exporter.
  3. From that point on we don't have control over what happens with the recordable.

After that, the recordable might be batched by the exporter or another component it is forwarded to, at the same time the global tracer provider could be replaced (there's that possibility at least). To take care of that, we would need to come up with lots of implicit lifetime dependencies (like "shutting down an exporter invalidates all recordables created from it"), and we would need to make it a hard requirement for every exporter to implement Shutdown properly (currently it's a no-op for the OTLP exporter).

Maybe I'm too paranoid. However, what are your thoughts about this approach:

  1. Storing the resource as a shared_ptr in the tracer provider.
  2. Changing the signature of SetResourceRef to void SetResouce(const std::shared_ptr<Resource>& resource).
  3. Then each exporter/recordable can itself decide whether to keep a shared pointer (100% safe), or go the optimized route and keep a pointer/reference. So we keep all options open for exporter/recordable implementations, and don't make any assumptions in the API about how resource ownership should be implemented.

@lalitb
Copy link
Member

lalitb commented Feb 18, 2021

2. How do we want to proceed with Tracer going forward? (@pyohannes raises good concerns around per-thread Tracer but I think this are orthogonal to whether or not we have a central owner for processing tracers)

Probably I am may be biased with existing design and would like to be corrected if so - but how about keeping the current approach of keeping sampler, processor ( and now also including resource ) as shared_ptr within TracerProvider, and let their life-time governed by references to those shared_ptr. If this approach provides the loose-coupling between the TracerProvider and Tracer, of-course at the cost of performance overhead of using shared_ptr, and not so well-defined life-time for processor/exporter.

@jsuereth
Copy link
Contributor Author

@pyohannes / @lalitb Two points:

  1. I'm afraid that things like Add ForceFlush to Tracing SDK opentelemetry-specification#1452 will force us to define clear ownership + control of exporters/tracers. I'm ok updating this to use shared_ptr for flexibility now, but it has my hackles up that we may be begging peter to pay paul.
  2. I'm starting to debate whether passing Resource to a `Recorable is the right thing after all. If you look at OTLP exporter, we basically have two requirements:
  • Split traces by TracerProviderand pull Resource once
  • Split traces by Tracer and pull InstrumentationLibrary once for the batch.
  • See the java implementation

We're not actually using Recordable in its intended way here for Resource or InstrumentationLibrary. I think it'd be more accurate to the other languages if we store Resource/InstrumentationLibrary on sdk::Span and update our Exporter API to take that.

a. We could make sure however we store it leads to efficient sorting/ordering in Exporters for the OTLP case (i.e. an efficient hash on Resource + InstrumentationLibrary
b. We wouldn't be forcing Exporters to all implement a way of "carrying around the Resource". They just need to use the reference we give them via Span.
c. We're not really leveraging the Recordable benefits for Resource today anyway. The idea was to avoid constructing
intermediate objects (which we are) and directly instantiate the runtime-output (e.g. protobuf).
d. I think this might be more uniform w/ other language implementations.

@jsuereth
Copy link
Contributor Author

Regarding the concerns around using shared pointer, I think we have to change the existing implementation no matter what.

See the specificaiton line:

Configuration (i.e., SpanProcessors, IdGenerator, SpanLimits and Sampler) MUST be managed solely by the TracerProvider and it MUST provide some way to configure all of them that are implemented in the SDK, at least when creating or initializing it.
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#tracer-creation

: tracer_{std::move(tracer)},
processor_{processor},
recordable_{processor_->MakeRecordable()},
recordable_{nullptr},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This initialization is unnecessary?

*/
std::shared_ptr<Sampler> GetSampler() const noexcept;
explicit Tracer(TracerProvider *provider) noexcept;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make the ctor of Tracer private and only accessible only in TracerProvider?

@pyohannes
Copy link
Contributor

Just another idea I had. What about putting all TracerProvider related state in a separate type, like TracerContext:

struct TracerContext {
  std::unique_ptr<SpanProcessor> processor;
  std::unique_ptr<Sampler> sampler;
  std::shared_ptr<Resource> resource;
};

The TracerProvider would hold a std::shared_ptr<TracerContext>, with setters to change processor and sampler.

The Tracers would also hold a std::shared_ptr<TracerContext>, which points to the one on the related TracerProvider. The InstrumentationLibrary would separately live on each tracer.

In this way, TracerProvider and Tracer would be decoupled, while still sharing state. Tracer could be used without TracerProvicer (by using another way to manage TracerContext).

I think it cleans up component dependencies in the SDK, however it doesn't give us an efficient way to pass Resource and InstrumentationLibrary along to the exporter. I think that's a separate problem, also related to the fact that the spec is poorly written in that specific regard.

@lalitb
Copy link
Member

lalitb commented Feb 23, 2021

Just another idea I had. What about putting all TracerProvider related state in a separate type, like TracerContext:

struct TracerContext {
  std::unique_ptr<SpanProcessor> processor;
  std::unique_ptr<Sampler> sampler;
  std::shared_ptr<Resource> resource;
};

The TracerProvider would hold a std::shared_ptr<TracerContext>, with setters to change processor and sampler.

The Tracers would also hold a std::shared_ptr<TracerContext>, which points to the one on the related TracerProvider. The InstrumentationLibrary would separately live on each tracer.

In this way, TracerProvider and Tracer would be decoupled, while still sharing state. Tracer could be used without TracerProvicer (by using another way to manage TracerContext).

I think it cleans up component dependencies in the SDK, however it doesn't give us an efficient way to pass Resource and InstrumentationLibrary along to the exporter. I think that's a separate problem, also related to the fact that the spec is poorly written in that specific regard.

Thanks @pyohannes for the proposal - Just to be clear to me, the proposal is to have cleaner approach for removing the dependency between TracerProvider and Tracer and still sharing the state. It is not meant to ensure that configuration (processor, sampler) is solely owned by TracerProvider ?

@lalitb
Copy link
Member

lalitb commented Feb 23, 2021

We're not actually using Recordable in its intended way here for Resource or InstrumentationLibrary. I think it'd be more accurate to the other languages if we store Resource/InstrumentationLibrary on sdk::Span and update our Exporter API to take that.

@jsuereth - I like this approach of using sdk::span to store the reference to Resource to have consistent implementation with other languages, just wondering how it would be different and efficient from current approach of storing reference to Resource in otlp Recordable. We would still need to iterate over spans/recordables to group by Resources/InstrumentationLib in otlp exporter? And not all exporters may want to process resources ( ? ), so in that case we would store it's reference in Span, but won't be used anywhere. Probably, I am not fully able to visualize this approach.

@jsuereth
Copy link
Contributor Author

@pyohannes With the caveat that we do multithreading primitives correctly around updates, I think that could work for decoupling Tracer from TracerProvider, while still meeting the spec requirement around the processor.

Should I take a crack at this now as part of this CL?

@jsuereth
Copy link
Contributor Author

@lalitb

We would still need to iterate over spans/recordables to group by Resources/InstrumentationLib in otlp exporter? And not all exporters may want to process resources ( ? ), so in that case we would store it's reference in Span, but won't be used anywhere. Probably, I am not fully able to visualize this approach

yeah, OTLP requires grouping by Resource/InstrumentationLibrary by its design. Alternatively we can store a map from IL => ILSpans and Resource => ResourceSpans to do that lookup/grouping as we iterate, depends on which ends up being faster.

I won't be able to get to showing this code until thursday, but I'll try to have something then for you to see what I'm suggesting. We can discuss in the SiG :)

@pyohannes
Copy link
Contributor

Just to be clear to me, the proposal is to have cleaner approach for removing the dependency between TracerProvider and Tracer and still sharing the state. It is not meant to ensure that configuration (processor, sampler) is solely owned by TracerProvider ?

It ensures that the configuration is owned by a single TracerContext. If one uses a TracerProvider, the TracerProvider and the related Tracers will have shared ownership of the TraceContext (and the configuration in it), but it's only writable via the TracerProvider. If one doesn't use a TracerProvider, one can let any other (custom) component give ownership of the TraceContext and take care of managing updates to it.

@pyohannes
Copy link
Contributor

@pyohannes With the caveat that we do multithreading primitives correctly around updates, I think that could work for decoupling Tracer from TracerProvider, while still meeting the spec requirement around the processor.

Should I take a crack at this now as part of this CL?

Maybe let's get other people's opinion first in tomorrow's SIG.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants