Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a Shutdown() method to *Provider #1073

Closed
XSAM opened this issue Oct 8, 2020 · 16 comments · Fixed by #1074
Closed

Add a Shutdown() method to *Provider #1073

XSAM opened this issue Oct 8, 2020 · 16 comments · Fixed by #1074
Assignees
Labels
area:api Cross language API specification issue priority:p1 Highest priority level release:required-for-ga Must be resolved before GA release, or nice to have before GA spec:metrics Related to the specification/metrics directory spec:trace Related to the specification/trace directory

Comments

@XSAM
Copy link
Member

XSAM commented Oct 8, 2020

What are you trying to achieve?

Add a Shutdown() method to TracerProvider and MeterProvider, which would invoke Shutdown() on all processors.

Additional context.

There is no convenient way to let our users shut the processors down if their programs want to exit. It is the users' response to keep Shutdown functions from processors. Telemetry API should handle this common scenario, which every program does, to make sure spans/metrics are written before exit.

Related open-telemetry/opentelemetry-go#1206

@XSAM XSAM added the spec:miscellaneous For issues that don't match any other spec label label Oct 8, 2020
@anuraaga
Copy link
Contributor

anuraaga commented Oct 8, 2020

Hi @XSAM - we currently define shutdown on the OpenTelemetry SDK itself

https://github.com/open-telemetry/opentelemetry-specification/blob/59fe1e74872d9159a2ec16249420ad4f044f7ee8/specification/trace/sdk.md#shutdown

So a call like OpenTelemetrySdk.shutdown() can be used to cleanly shutdown, exporting pending metrics. Does this work for you or do you need to shutdown the tracer and metric separately? If so, could you describe your use case more for shutting them down separately? Thanks!

@XSAM
Copy link
Member Author

XSAM commented Oct 8, 2020

@anuraaga From what I understand, the Shutdown function in the link is just for processors.

And I want to propose a Shutdown function for *Provider. Take Go interface as an example:

type TracerProvider interface {
	Tracer(instrumentationName string, opts ...TracerOption) Tracer
	Shutdown(ctx context.Context) error
}

type MeterProvider interface {
	Meter(instrumentationName string, opts ...MeterOption) Meter
	Shutdown(ctx context.Context) error
}

*Provider holds all the processors, and it will pass processors to the tracer internally. And only the processor exposes the Shutdown function. Therefore, users have to save instances of the processor if they want to close it before exiting.

Since *Provider already holds processors, I believe add a Shutdown function to *Provider is an easier way to close all processors.

@Oberon00 Oberon00 added area:api Cross language API specification issue spec:metrics Related to the specification/metrics directory spec:trace Related to the specification/trace directory and removed spec:miscellaneous For issues that don't match any other spec label labels Oct 8, 2020
@anuraaga
Copy link
Contributor

anuraaga commented Oct 8, 2020

@XSAM Ah sorry I misread the spec. Actually we already have it implemented as such in Java for tracing but didn't notice we never added it to the spec. Agree, we should

https://github.com/open-telemetry/opentelemetry-java/blob/master/sdk/tracing/src/main/java/io/opentelemetry/sdk/trace/TracerSdkManagement.java#L58

@Oberon00
Copy link
Member

Oberon00 commented Oct 9, 2020

Same with ForceFlush IMHO.

@andrewhsu andrewhsu added priority:p1 Highest priority level release:required-for-ga Must be resolved before GA release, or nice to have before GA labels Oct 9, 2020
@bogdandrutu
Copy link
Member

From the triage meeting, we need to ensure that we should or not have in the API.

@bogdandrutu
Copy link
Member

My proposal on this: We don't need this in the API package, this is an implementation detail, here are some arguments:

  1. Shutdown must be called once only by the same person that configures the implementation as well.
  2. Instrumentation devs don't need to call Shutdown.

Adding this later is backwards compatible, so for the moment I am encouraging to not have it in the API for the moment.

@seh
Copy link

seh commented Oct 9, 2020

Adding this later is backwards compatible

Not for Go implementations, if you intend to add a method to a published interface.

@Oberon00
Copy link
Member

Oberon00 commented Oct 9, 2020

You can always derive a new interface from the old one. It doesn't look very pretty but that's how APIs that have been kept backwards compatible for long tend to look like. Cf. Win32/COM https://docs.microsoft.com/en-us/windows/win32/api/shobjidl_core/nn-shobjidl_core-itaskbarlist4

@bogdandrutu
Copy link
Member

@seh

Not for Go implementations, if you intend to add a method to a published interface.

The same problem is everywhere about any interface (even in Java). But what we assume will be backwards compatible will be the usage of the interface not implementations of the interface. So code that is instrumented with this will continue to work, any implementation (for testing purpose, or production implementation) indeed will have to update.

@seh
Copy link

seh commented Oct 10, 2020

Instrumentation devs don't need to call Shutdown.

Something missed here, @bogdandrutu, wasn't called out explicitly in open-telemetry/opentelemetry-go#1206, but I ran into in the last week myself as an "instrumentation dev," just looking to use the library, and not to implement it.

The otel/skd/trace package offers the WithBatcher function, to ease adding batching to a TraceProvider. However, WithBatcher calls on NewBatchSpanProcessor, burying a value that later needs be cleaned up by calling its Shutdown method.

Resolving that problem today requires that callers eschew WithBatcher, and instead create the BatchSpanProcessor first, and stuff it into the TraceProvider via WithSpanProcessor. Please see open-telemetry/opentelemetry-go#1199 for this remediation. Now even the so-called "basic" example of how to use the library as a client can't use the convenience functions offered by the library.

The question, then, is whether offering WithBatcher is a mistake, and we should remove it, or whether we should embrace what it's trying to offer and allow a TraceProvider to clean up such components of which it takes ownership.

I understand that this issue concerns the specification, and not its realization in Go, but we came here by way of this problem in the Go library.

@bogdandrutu
Copy link
Member

@seh you are mentioning a SDK (otel/sdk/trace) functionality, so based on my understanding this is not a requirement for the otel/api, but instead a requirement for the SDK to provide this, which is fine.

@bogdandrutu
Copy link
Member

sorry, pressed close by mistake.

@Aneurysm9
Copy link
Member

I think the TracerProvider straddles the boundary between API and SDK and has behavior specified in both sections of the specification. It is probably the case that this behavior relates mostly to the SDK portion of its behavior as it is functionality that would be used by operators and is not likely to be used by instrumentations. Since the trace SDK specification discusses requirements for creation and initialization of TracerProviders and SpanProcessors, should it also include requirements for the other end of the lifecycle? I think this may be as a simple as stating that a TracerProvider implementation should (or must) provide a Shutdown() method to invoke the already-defined Shutdown() method on all of its configured SpanProcessors.

@seh
Copy link

seh commented Oct 12, 2020

As is often the case with resource cleanup, that also brings up the subject of resource ownership. Does handing a SpanProcessor to a TraceProvider transfer ownership? Is it possible to opt out of that transfer if need be, such that shutting down a TraceProvider would not shut down a particular SpanProcessor it had been using?

Mitigating that, I can imagine some wrapper/decorator types that swallow such a call to Shutdown, as used to be common in Java libraries. Go's ioutil.NopCloser also comes to mind.

@Oberon00
Copy link
Member

The TracerProvider in the SDK can provide a broader API than the one in the SDK (which may be implemented as a derived interface). I think Java did this rather elegantly: The (more or less) private TracerProviderSdk class implements both an TracerSdkManagement interface (which has AddSpanProcessor, etc) and the API's TracerProvider interface which has just GetTracer.

@XSAM
Copy link
Member Author

XSAM commented Oct 13, 2020

I agree the TracerProvider API should not have that Shutdown function since it is not related to the instrumentation devs, only to end-users.

Therefore, the *Provider API remains unchanged, and the *Provider is the SDK MUST provide Shutdown function.

Since initial functions like NewTracerProvider return the SDK, end-users can still benefit from these changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:api Cross language API specification issue priority:p1 Highest priority level release:required-for-ga Must be resolved before GA release, or nice to have before GA spec:metrics Related to the specification/metrics directory spec:trace Related to the specification/trace directory
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants