Skip to content
This repository has been archived by the owner on May 23, 2023. It is now read-only.

Deprecate or remove ScopeManager.active() #267

Open
codefromthecrypt opened this issue Apr 11, 2018 · 38 comments
Open

Deprecate or remove ScopeManager.active() #267

codefromthecrypt opened this issue Apr 11, 2018 · 38 comments
Labels

Comments

@codefromthecrypt
Copy link
Contributor

ScopeManager.active is documented as a feature used to get the current span. Perhaps it is a relic from the previous design. It tempts people to call close on it eventhough ScopeManager is pretty clear that scopes should be managed directly.

I would remove this api, possibly for ScopeManager.activeSpan if that's the intended use case to serve. Otherwise, please clarify the documentation that calls to close are undefined.

If folks are doing things like subclassing, you can deprecate the api, and they can still offer "advanced" things as they exist today as their subtype would still be able to offer that functionality regardless of it it exists on a supertype.

openzipkin-contrib/brave-opentracing#74 (comment)

@carlosalberto
Copy link
Collaborator

carlosalberto commented Apr 17, 2018

Hey @adriancole thanks for the feedback.

I think that's a very valid point, and the important question here is whether there is a pattern that requires the Scope to be available in ScopeManager (or a pattern that cannot keep the Scope around it close it itself).

Will think some though on it and try to cook some examples (both a) the need for the current design, if any, and b) the problem of exposing it in ScopeManager)

@yurishkuro @pavolloffay @tylerbenson may have some thoughts on it too ;)

@cwe1ss
Copy link
Member

cwe1ss commented Apr 17, 2018

I need ScopeManager.active in C# to access the current Scope in scenarios where the library/framework I'd like to instrument only has OnStart/OnStop hooks without any chance to store the scope somewhere.

E.g. .NET has its own OpenTracing "span"-like construct called Activity and I'm translating them to "real" spans with this code: https://github.com/opentracing-contrib/csharp-netcore/blob/master/src/OpenTracing.Contrib.NetCore/Internal/GenericEventProcessor.cs#L43-L67

If I only had access to Tracer.activeSpan I could not close the scope.

@carlosalberto
Copy link
Collaborator

Hey @cwe1ss thanks for the feedback, definitely appreaciated ;)

Sounds like we should cook such example for the testbed to show that we would need ScopeManager.active() in such scenario, to test things out.

please clarify the documentation that calls to close are undefined.

If we decide to keep this API, then we need to stress this point, definitely, so we avoid unintended problems for the users ;)

@tylerbenson
Copy link

I'm probably missing something, but I don't understand the problem with that method... Why wouldn't you want a way to get the current scope from the ScopeManager? How would you get it otherwise except from where it was originally created? In fact, we use it in a fair bit of our instrumentation.

When writing instrumentation, it is often useful to know if there is a current active span. We also have additional logic on our scope that wouldn't be as nice to have on the span. I suggest leaving it as is.

@pavolloffay
Copy link
Member

I understand @adriancole concern's: Scopes were designed to be explicitly passed around...

The current API is useful when there are no "attributes" where the Scope could be stored.

brave-opentracing does noop on tracer.active().close() which means some instrumentations do not work properly with brave. I would like to understand why it's noop, is there something specific?

I have submitted fix for it openzipkin-contrib/brave-opentracing#83 (if there is an agreement to fix it)

@codefromthecrypt
Copy link
Contributor Author

I completely disagree especially in Java about having to retain a lookup table for scope references. this adds debt which turns to permanent debt once you go v1.

can you please show any prior art which does this and why it should be done here? for example knowing there is a current span is different than having a reference to close a scope (especially from another thread!)

can you please engage some language experts if you desire to do this anyway?

@tylerbenson
Copy link

After discussing with Adrian in a side channel, I am ok with his proposal to have the return type change.

I still think there should be a way to tell if there is an active trace currently happening, so if that is a method that returns the span or the span context and that result being null means no trace, then I guess that will work.

@felixbarny
Copy link
Contributor

My problem with the ScopeManager.active() active API is that it implies that the same instance is returned as when the Scope has been created and that the finishSpanOnClose flag is remembered.

This makes it impossible to integrate it into the internal "scope manager" of a tracer, which is agnostic of OpenTracing. The only way to make this work in a bridge is to have a separate scope manager for the bridge. But then you can't get the active span when the span was created using the internal API and not the bridge.

In it's current form it's impossible to write a correctly working bridge for tracer.active().close(). So for the Elastic APM OT bridge, this is a noop.

So my suggestion would be to remove the finishSpanOnClose flag or to deprecate ScopeManager.active in favor of ScopeManager.activeSpan.

@codefromthecrypt
Copy link
Contributor Author

folks @pavolloffay seems reluctant to fix the instrumentation he made and wants to force brave to implement this dodgy api instead. Can we please get this removed?

@codefromthecrypt
Copy link
Contributor Author

@CodingFabian @raphw are either of you in love with this api that requires implementations to track and try to return scope instances? If you are can you please comment why? If not can you please help get this removed to de-burden implementations?

It is worth mentioning that thinking about this problem from a library pov is important. Agents can fix all sorts of misbehavior, but libaries cannot. So, for example, I'm aware that folks using agents might "shrug off" less than ideal apis as they can wrangle them in mysterious ways. Think about this problem in a way where you can't just fix it via bytecode. I think you'll agree we are better off without it.

@pavolloffay
Copy link
Member

folks @pavolloffay seems reluctant to fix the instrumentation he made and wants to force brave to implement this dodgy api instead. Can we please get this removed?

I am not sure where in the world you found that I am reluctant to fix this issue in instrumentations opentracing-contrib/java-jaxrs#88. What is sure that you are reluctant to fix this in brave openzipkin-contrib/brave-opentracing#83 (comment)

@codefromthecrypt
Copy link
Contributor Author

@pavolloffay your comment was that existing instrumentation are broken. you wrote the instrumentation leading to that issue. fixing the version <1 instrumentation or fixing the <1 api both sort the problem in ways more sustainable than cementing in the malpractice that exists in the tracer. stop pushing design problems on other repos

@CodingFabian
Copy link

Well there is this spec proposal to give access to span id for example, so implementations need to provide a way to access the instance anyway.
We currently have no problem with this API.

@carlosalberto
Copy link
Collaborator

Hey everybody,

So, trying to move on with this topic, after reading the entire thread again, these are my impressions:

  • Having a Scope instance around can definitely be helpful, for operations other than the usual deactivation/finish of the contained Span (as Tyler had mentioned initially, DataDog's Scope implementation has extra members/operations). Another example is the initial Akka instrumentation, where the Scope also has extra members for keeping track of fire-and-forget operations count (https://github.com/opentracing-contrib/scala-concurrent/blob/master/src/main/scala/io/opentracing/contrib/concurrent/TracedAutoFinishExecutionContext.scala)

  • Having ScopeManager.active() around can be very helpful when instrumenting libraries that have no context or similar to store the Scope (as @cwe1ss pointed out already). The alternative for those cases is to keep a thread-safe, weak map to store the resulting Scopes (ironically, leading to the very case Adrian mentioned for Brave ;) ).

  • As Adrian pointed out, having ScopeManager.active().close() easily reachable can be dangerous, as users may indeed end up closing Scope instances they don't own. However, I think this could be handled by improving the documentation and making clear the consequences of bad usage here.

  • So, the main issue I really see here, is the case of writing bridges to other tracing systems, and make it work just fine with their internal logic (as both @felixbarny and @adriancole pointed out), as keeping the Scope object around on their side can be problematic. I wonder if there is a way to somehow fine a proper, correct, not so complicated way to face this.

I'd say the last point is the important one, which needs to be addressed first, in order to in go in either direction - keep ScopeManager.active() around or drop it.

cc @yurishkuro @tedsuo

@carlosalberto
Copy link
Collaborator

Hey everybody,

We would like to have a Cross-Language group call on July 13th (next week) and discuss this item as part of it. Please let me know if you guys can make it, specially @adriancole @felixbarny @cwe1ss

@codefromthecrypt
Copy link
Contributor Author

What was the outcome of your call?

ps on your point about "Having ScopeManager.active() around can be very helpful when instrumenting libraries that have no context or similar to store the Scope" The main concern is creating an api less likely to lead to problems. The call-sites which have no other means to pass a scope are the "infected" call sites, and take the risk without exposing it (risky stuff) by default to all call sites and consumers. In other words, yes call sites like these using thread locals or a utility like brave has, have a chance of leaking. Just the surface area is much smaller and you can easily tell where that problematic code is, because it isn't.. well everywhere.

@sjoerdtalsma
Copy link
Contributor

sjoerdtalsma commented Jul 21, 2018

The more I think about this, the more I tend to agree with Adrian.

Scope is basically a container for span() that can be close()-ed.
Access to the current scope (over the current span) is therefore only useful if you want to close() it or have some implementation specific extension attached to it.

In my humble opinion:

  1. The close from anywhere is problematic. It should be controlled by the one activating the scope.
  2. Implementation specific could be a sign that something is missing from the core API?
  3. Perhaps the extension should have been applied to Span instead of scope?
  4. If still going implementation specific; why not have a specific way to obtain active scope as well?

What would absolutely break if active scope was replaced by active span and couldn't be solved otherwise?

Background: I made an adapter between my 'general purpose' context propagation mechanism and the OpenTracing ScopeManager and it was non-trivial when to close and not close the scope.

@felixbarny
Copy link
Contributor

@adriancole no real conclusions. I addressed the problems with ScopeManager.active() again and also proposed to remove the finishSpanOnClose flag. I've just created an issue about that here: #291.

@felixbarny
Copy link
Contributor

Are there any use cases which require to get the Scope from the ScopeManager as opposed to the active Span? If so, could we collect some code samples for that?

@carlosalberto
Copy link
Collaborator

@felixbarny I think the case @cwe1ss pointed out some time ago still stands:

I need ScopeManager.active in C# to access the current Scope in scenarios where the library/framework I'd like to instrument only has OnStart/OnStop hooks without any chance to store the scope somewhere.
...
If I only had access to Tracer.activeSpan I could not close the scope.

https://github.com/opentracing-contrib/csharp-netcore/blob/master/src/OpenTracing.Contrib.NetCore/Internal/GenericEventProcessor.cs#L43-L67

@carlosalberto
Copy link
Collaborator

Hey @sjoerdtalsma

The close from anywhere is problematic. It should be controlled by the one activating the scope.

I think this could be easily 'enforced' by indeed not exposing Scope but Span in ScopeManager.

Perhaps the extension should have been applied to Span instead of scope?

So this is something that could be tried with some experimental API. We could need to think how to handle the separation between activation and lifetime in Span - maybe completely separated methods for finishing and deactivating ... In that regard, the opentracing-testbed section could be used to try out such new API.

@sjoerdtalsma
Copy link
Contributor

@carlosalberto

The close from anywhere is problematic. It should be controlled by the one activating the scope.

I think this could be easily 'enforced' by indeed not exposing Scope but Span in ScopeManager.

@adriancole Sounds like the point of your request? Remove active Scope in favour of an active Span ?

@codefromthecrypt
Copy link
Contributor Author

I'm not sure why there is a concept of removing active scope in favor of span.. why this is somehow experimental. This was in use in brave and also census for quite a long time. Retrieving the scope is the thing that causes issues. As mentioned in the initial description ScopeManager.activeSpan or similar is fine by me.

I don't know why there is cause to stall something like this with a testbed notion, but as it is already stalled for a while if you feel you need to test the idea of only returning active span, be my guest!

@carlosalberto
Copy link
Collaborator

Hey @adriancole

I'm not sure why there is a concept of removing active scope in favor of span.. why this is somehow experimental. This was in use in brave and also census for quite a long time. Retrieving the scope is the thing that causes issues

So, if I understood correctly, you would like ScopeManager.active() to go in favor of ScopeManager.activeSpan() and keep the notion of Scope (through startActive()) otherwise? That sounds fine (by me, at least ;) )

So the testbed notion is precisely for testing how API changes could impact different use cases. The change for active() -> activeSpan() is not that complex, so it wouldn't require as many iterations in that regard (and I can do a PR myself for that); but if we decide to try to change the Scope notion, well, that would require more effort on that front, for good or bad ;)

Nevertheless, I can definitely help with that as needed.

@codefromthecrypt
Copy link
Contributor Author

@carlosalberto so this is still an issue.. I can see you last explained yourself are you waiting for me or someone to say go?

I just recently had to do a lot of work to avoid holding up classloaders due to scenarios such as this issue. If you are planning to address this, can you go ahead and proceed? Thanks.

@carlosalberto
Copy link
Collaborator

Hey @adriancole

Sorry, had been busy doing other work ;) So the plan is to work tomorrow on updating the PR for #291 and have something ready for testing soon.

@jeacott1
Copy link

so far my experience with opentracing in Java has been somewhat less than stellar particularly around this api and in particular the decision to track scopes with threadlocals.

Due to heavy use of executor pools and async things I've found out of the box scope management causes many mem leaks in my case. So in some cases where I know I'm in the same thread but just want to hand over context without polluting my code, I've been trialing creating an activeSpan and then immediately closing in the callee but leaving the span open - using the threadlocal in managed short bursts. this however requires great care and is still likely to come unstuck, but at least the option exists. without scopeManager().active() I couldnt do that.

@yurishkuro
Copy link
Member

the decision to track scopes with threadlocals

Do you have a suggestion for a better alternative?

@jeacott1
Copy link

jeacott1 commented Sep 21, 2018

@yurishkuro I'm not sure tbh, and certainly don't have any good answer. I've tracked my own tracing contexts in a singleton trie model, using keys that make sense available in my application to find the active span or parent in any given circumstance. I think I might try to instrument the returned futures from the various executors and services and attempt to track the start and end of each future to maintain the tracing that way instead of code scattered everywhere. carrying the context across method calls etc if there is nothing useful to hook is very problematic though and so far the only options I can think of are all very invasive except for the threadlocal in short bursts to carry the context where I'm sure its not jumping threads.

@yurishkuro
Copy link
Member

Not sure I follow. A central cache (your trie) is of no help unless you explicitly propagate those IDs. The point of threadlocals is that you don't pass anything between functions yet preserve the context. Take spring boot - by adding a few jars to the classpath you can get your application traced with zero code changes. I'm not aware of any technique that can do that without threadlocals or relying on some bespoke, already explicitly propagated IDs.

@jeacott1
Copy link

jeacott1 commented Sep 23, 2018

I'm using spring boot, but adding a few jars to my context does not get my application traced. it gets it broken with metaspace memory leaks, and closing spans too soon. the issue is that there are executors starting services and the implementations of tracingexecutorservices assume that when the Runnable/Callable ends is when the tracing ends which just isnt always the case. the executing task might hand off to other threads several times, and invoke many other async tasks before it should terminate a span. They need to track the returned Future (if submit is called) and be very careful about the likely reuse of the same thread in a different context.
so yes - I have to manage the propagation myself except in some cases where I am sure its safe and I momentarily use the threadlocal - mostly just so I don't have to create new implementations of all the existing things that look for a threadlocal context.

@codefromthecrypt
Copy link
Contributor Author

As far as I can tell, the recent discussion is different than the "let's remove scope tracking from scope manager".

Basically, @jeacott1 discusses the usual concerns about tracking futures or async work, that some work spans multiple runnables etc. This is important to understand, and why for example in zipkin we started an advanced instrumentation wiki. OTOH, it seems to me if anything an argument to remove the scope tracking requirement as it still doesn't serve the use case discussed!

IOTW, can we please proceed in removing scope tracking as it has known problems, complicates scope manager and introduces error cases such as stacking and/or what happens when it is redundantly executed etc.

@codefromthecrypt
Copy link
Contributor Author

If people are looking for alternatives, composition is the better answer as call sites where you need to share a thread scope are possible to coordinate within the framework that is scheduling work. For example, a ThreadLocal can be used to stash a reference between callbacks, which is more explicit and also doesn't move the responsibility to something that might not even use thread locals (scope manager!). In brave, we introduced a utility ThreadLocalSpan which eliminated most cases where you need to do repetitive work https://github.com/openzipkin/brave/blob/3be55b5cccf881104bdd80c93e97d2575b83952d/brave/README.md#working-with-callbacks

That isn't to say special utilities for stream composition or executors don't need other approaches, it is to say that composition, especially bound to the utility in question, is the better answer vs forcing this concern on all scope implementations, especially knowing how many problems this causes.

TLDR please let's delete this!

@jeacott1
Copy link

jeacott1 commented Jan 3, 2019

fwiw anything that leans on the fork-join pool will be broken by any method that's trying to use threadlocals, and more and more I find myself using Quasar or Kotlin coroutines neither of which work out of the box (or at all) with stock threadlocals.
an external store for me was a viable solution because I was already propagating a context object and it was fairly painless to use existing id's as span reference, from where I could reliably hop threads and manage the closing of same in callbacks.
I ended up with something like this - invasive, not pretty, but functional.
Jaeger worked fine with this, but the Datadog Agent still leaked even with this. - they have different approaches to managing things which I guess highlights the fragility of the notion.

So in the end I'm not using the ScopeManager.active() at all as it just wasn't necessary or useful
given I already had a reference.
I'd be all for abandoning trying to use threadlocals too.

//start a span.
String spanKey = eContext.getId(); //could be anything that makes sense.
Span span = tracingSpanManager.createSpan("spanName", Tags.SPAN_KIND_CLIENT, spanKey);
...
in a callback
Futures.addCallback(futureResponse, new FutureCallback<String>() {
				@Override
				public void onFailure(Throwable e) {
              Span activeSpan = tracingSpanManager.get(eContext.getId());
              Scope scope = tracingSpanManager.getTracer().scopeManager().activate(activeSpan, false);
              scope.close();
...
///create a child span
	String childId = tracingSpanManager.putChild(eContext.getId(), "seq", Tags.SPAN_KIND_CLIENT);
			Callable<Data> callable = new Callable<Data>(){

				@Override
				public Data call() throws Exception {
					final Span activeSpan = tracingSpanManager.get(childId);
					Scope scope = tracingSpanManager.getTracer().scopeManager().activate(activeSpan, false);
...
}
...
	Futures.addCallback(callableFuture, new FutureCallback<Data>() {

				@Override
				public void onSuccess(Data data) {
                                       //finish the span.
					tracingSpanManager.finish(childId);
                                   }
...

}

@yurishkuro
Copy link
Member

I am convinced by the arguments that scopeManager.active() should go away (starting with deprecating it), especially @sjoerdtalsma #267 (comment) that nails it imo. Would be interesting to see if anything at all in the testbed would break if we removed it - most code should just work by switching to scopeManager.activeSpan().

I am not sure about @cwe1ss 's #267 (comment), as I am not that familiar with C#. The code example he provided certainly looks like an unintended API usage: creating a scope and throwing it away, then relying on active() to retrieve it. The Activity looks very much like a Span, with its own unique IDs, so it feels odd that spans are created in parallel with that. In any case, we don't need to block a decision about Java API based on the needs of C# API.

Also agree with @adriancole that @jeacott1 's issue is unrelated to this ticket - if you propagate context manually/explicitly, you have no need for "active span". Creating a scope via activate() and immediately closing it is effectively no-op (+ gc noise), unless you're using some funky scope manager impl.

I +1-ed the initial description.

@jeacott1
Copy link

jeacott1 commented Jan 3, 2019

@yurishkuro "Creating a scope via activate() and immediately closing it is effectively no-op" - agreed, except that in the current setup if you dont do that you arent closing the right thing. close only closes the right thing if its also active - somewhere in the docs it says that and it tripped me up initially.

@carlosalberto
Copy link
Collaborator

Hey all,

Would be interesting to see if anything at all in the testbed would break if we removed it

I think we forgot to add a test case to showcase Christian's case (comment). But the rest of the testbed section will run just fine after removing active(), I think.

Given the feedback, I'm wondering how often the aforementioned case shows up in real life (this case has been, historically, the reason to keep active() around). But if there's enough agreement, even with this specific (C#) scenario, we could definitely consider removing it.

Opinion on this @opentracing/opentracing-java-maintainers @CodingFabian @tylerbenson ?

@tylerbenson
Copy link

I am ok deprecating->removing ScopeManager.active(). It significantly increases the risk of messing up the scope stack by allowing scopes to be closed where it shouldn't. In cases where the scope needs to be accessible between two different method calls (OnStart/OnStop), it might be best to use a separate thread local for storing the scope.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

10 participants