Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GlobalOpenTelemetry trigger of autoconfiguration is opt-in #5010

Merged

Conversation

jack-berg
Copy link
Member

@jack-berg jack-berg commented Dec 1, 2022

GlobalOpenTelemetry.get has a side affect where if GlobalOpenTelemetry.set has not been called and opentelemetry-sdk-extension-autoconfigure is on the classpath, the OpenTelemetrySdk will automatically be configured and set via GlobalOpenTelemetry.set.

This is problematic for the reasons described in this comment:

The more generalized problem is how do SDK extension components (exporters, processors, etc) get access to OpenTelemetry so they can instrument themselves? GlobalOpenTelemetry has been our answer to tricky ordering and access situations, and seems to be the right tool for the job here. The fact that it has a side affect where GlobalOpenTelemetry#get wants to initialize AutoConfiguredOpenTelemetrySdk really throws a wrench into the scenario where someone wants to use AutoConfiguredOpenTelemetrySdkBuilder.setResultAsGlobal(false).

Essentially there is no way to include opentelemetry-sdk-extension-autoconfigure and have AutoConfiguredOpenTelemetrySdkBuilder.setResultAsGlobal(false) since any code that calls GlobalOpenTelemetry.get will trigger autoconfiguration undermine the attempt to not set GlobalOpenTelemetry.

Then there's the question of who / what instrumentation wants this side affect? As @mateuszrzeszutek says:

We're discouraging everybody from using the GlobalOpenTelemetry in manually instrumented applications anyway, so why do we want it to be "smart" and auto-load the SDK? You don't need the global instance auto-configuring the SDK when using the agent, and when you're instrumenting the app manually you shouldn't be using the global in the first place.

This PR requires users to opt-in to the side affect by setting otel.java.global-autoconfigure.enabled=true.

@codecov
Copy link

codecov bot commented Dec 1, 2022

Codecov Report

Base: 91.27% // Head: 91.27% // Increases project coverage by +0.00% 🎉

Coverage data is based on head (1a2a776) compared to base (f490ba8).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #5010   +/-   ##
=========================================
  Coverage     91.27%   91.27%           
- Complexity     4885     4892    +7     
=========================================
  Files           552      552           
  Lines         14433    14454   +21     
  Branches       1373     1375    +2     
=========================================
+ Hits          13173    13193   +20     
  Misses          874      874           
- Partials        386      387    +1     
Impacted Files Coverage Δ
...emetry/sdk/autoconfigure/spi/ConfigProperties.java 100.00% <ø> (ø)
...java/io/opentelemetry/api/GlobalOpenTelemetry.java 93.33% <100.00%> (+0.74%) ⬆️
...java/io/opentelemetry/api/internal/ConfigUtil.java 100.00% <100.00%> (ø)
...onfigure/spi/internal/DefaultConfigProperties.java 94.11% <100.00%> (ø)
...metry/sdk/logs/export/BatchLogRecordProcessor.java 89.36% <0.00%> (-0.71%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@jkwatson
Copy link
Contributor

jkwatson commented Dec 1, 2022

I think this is a good change. How can we best ensure that this won't be a badly breaking change for users, or at least that they will be able to proactively adjust for it?

@jack-berg jack-berg force-pushed the global-opentelemetry-autoconfigure branch from 2051d96 to 1a2a776 Compare December 1, 2022 21:28
@jack-berg
Copy link
Member Author

I've added a commit that allows GlobalOpenTelemetry.get to trigger autoconfiguration if the env var / system property otel.java.global-autoconfigure.enabled=true. "Global autoconfiguration" defaults to false.

@jack-berg jack-berg changed the title Do not initialize AutoConfiguredOpenTelemetrySdk in OpenTelemetry.get GlobalOpenTelemetry trigger of autoconfiguration is opt-in Dec 1, 2022
@jack-berg jack-berg marked this pull request as ready for review December 1, 2022 21:31
@jack-berg jack-berg requested a review from a team December 1, 2022 21:31
+ " To enable, run your JVM with -D"
+ GLOBAL_AUTOCONFIGURE_ENABLED_PROPERTY
+ "=true");
return null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is null the right response here, rather than the no-op implementation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the caller of this...I think we might be doing the wrong thing right now. If this method returns null, we're calling set with the no-op impl. That will mean (I think?) that we will always end up returning the no-op, because of the check in the set method. I think this is wrong because we do want people to be able to set after calling a uninitialized get.

Copy link
Member Author

@jack-berg jack-berg Dec 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That will mean (I think?) that we will always end up returning the no-op, because of the check in the set method.

That is correct. Here's a summary of the behavior as of right now on main:

  • If GlobalOpenTelemetry.get() is called before GlobalOpenTelemetry.set(), an OpenTelemetry instance is resolved and set using GlobalOpenTelemetry.set().
  • If autoconfigure is on the classpath and autoconfigure initializes without error, GlobalOpenTelemetry will be set to an autoconfigured instance of OpenTelemetrySdk.
  • If autoconfigure is not on the classpath, or some error occurs during initialization, GlobalOpenTelemetry will be set to OpenTelemetry.noop().

I think this is wrong because we do want people to be able to set after calling a uninitialized get.

I'm not so sure. The current behavior is inconvenient because it forces users to either call GlobalOpenTelemetry.set OR have autoconfiguration do so before any call to GlobalOpenTelemetry.get. Trying to call GlobalOpenTelemetry.set later triggers an exception which can only be resolved by adjusting app initialization ordering. But the benefit of this is that GlobalOpenTelemetry.get always returns the same value, regardless of the order in which its called. I.e. you can't have one caller receive a OpenTelemetry.noop() and another caller receive an autoconfigured OpenTelemetrySdk. This is a nice invariant to be able to rely on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. I can see that side of things as well (wanting in invariant). I guess it boils down to: in the case of incorrect initialization ordering, do you want a) nothing to be emitted from the SDK at all or b) at least some things emitted, from the instruments/tracers/etc that are initialized after the SDK initialization has taken place.

I don't know that there's an easy answer to this one. 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup it's a good question and is kind of a design philosophy question between fail fast or allow partial success.

I'm prefer to the fail fast because I think the absence of all telemetry provides a really strong signal to go and fix the issue.

If we do want to change this behavior should probably do it in a different PR to separate concerns.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I'll be able to make sure no one will call get before I create the OpenTelemetry object.
I will need to doublecheck this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I'll be able to make sure no one will call get before I create the OpenTelemetry object.

You don't need to. If you rely on GlobalOpenTelemetry.get() calling AutoConfiguredOpenTelemetrySdk.initialize(), you will just need to opt in to that behavior with an environment variable or system property.

The current behavior should still be possible, but it shouldn't come at the expense of other scenarios that don't want the side affect. It makes sense to the AutoConfiguredOpenTelemetrySdk.initialize() side affect opt-in rather than opt-out because autoconfigure users should be comfortable adding environment variables / system properties so it shouldn't be a problem be a big lift to add one more.

Copy link
Member

@trask trask left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I very much support this change, I think it makes using GlobalOpenTelemetry less scary by removing this side-effect, which is important for Java agent users.

I'm not sure it's the best solution to this problem:

The more generalized problem is how do SDK extension components (exporters, processors, etc) get access to OpenTelemetry so they can instrument themselves?

because ideally the SDK extension components (e.g. a SpanExporter) would get access to the specific OpenTelemetry instance which they were used to create, instead of the GlobalOpenTelemetry instance which might be a different OpenTelemetry instance (e.g. configured to a different endpoint)

@jack-berg
Copy link
Member Author

because ideally the SDK extension components (e.g. a SpanExporter) would get access to the specific OpenTelemetry instance which they were used to create, instead of the GlobalOpenTelemetry instance which might be a different OpenTelemetry instance (e.g. configured to a different endpoint)

@trask opened #5021 to propose a solution to this. I think both this PR and a solution like #5021 are needed.

@deejgregor
Copy link
Contributor

deejgregor commented Dec 7, 2022

GlobalOpenTelemetry and SDK initialization

Below is what I think is an exhaustive list of ways the SDK can be initialized and set in GlobalOpenTelemetry (or not).

Today, all of these initialization methods exist with the exception of option 3 (which is new with this PR). This PR also modifies the existing option 4.

  1. Auto-configuration with Java agent -- with the default of otel.javaagent.enabled=true for the Java agent, GlobalOpenTelemetry will be set (see this). If otel.javaagent.enabled=false, then we fall through to one of the later options. If the OpenTelemetry Kubernetes Operator is used with instrumentation.opentelemetry.io/inject-java: "true" for auto-instrumentation injection, this is the option that will be used.
  2. GlobalOpenTelemetry.get() without AutoConfiguredOpenTelemetrySdk in the classpath -- noop SDK (see this and this). Note: this will also happen if AutoConfiguredOpenTelemetrySdk gets an InvocationTargetException while initializing.
  3. NEW: GlobalOpenTelemetry.get() with AutoConfiguredOpenTelemetrySdk in the classpath but without otel.java.global-autoconfigure.enabled=true -- noop SDK
  4. CHANGE: GlobalOpenTelemetry.get() with AutoConfiguredOpenTelemetrySdk in the classpath (today). In the future this will require otel.java.global-autoconfigure.enabled=true) -- Auto-configured SDK
  5. Manual configuration of SDK and calling GlobalOpenTelemetry.set(directly or indirectly) -- code it all yourself for run and profit, without or without AutoConfiguredOpenTelemetrySdk but only before any calls with get(). Note: the GlobalOpenTelemetry.set can happen indirectly by doing things like calling AutoConfiguredOpenTelemetrySdk.builder().build() (as long as setResultAsGlobal(false) is not used).
  6. Manual configuration of GlobalOpenTelemetry with with noop SDK -- possible today by calling GlobalOpenTelemetry.set(OpenTelemetry.noop()) but only before any calls with get().
  7. Initialization of the SDK without calling GlobalOpenTelemetry.set -- this can happen by explicitly calling auto-configuration with AutoConfiguredOpenTelemetrySdk.builder().setResultAsGlobal(false).build() or configuring the SDK manually. -- only possible today if GlobalOpenTelemetry.get() is never called.
  8. No initialization of SDK or GlobalOpenTelemetry -- only possible today if GlobalOpenTelemetry.get() is never called.

Note: for all of the auto-configuration options listed above, an SDK will always be created and set in GlobalOpenTelemetry. When SDK auto-configuration is disabled with otel.sdk.disabled=true a minimal SDK will be used (not a noop SDK).

In the case where the OpenTelemetry Kubernetes Operator is used with instrumentation.opentelemetry.io/inject-sdk: "true" for auto-instrumentation injection of OpenTelemetry SDK environment variables only, any of the options above that use AutoConfiguredOpenTelemetrySdk could apply.

@jack-berg
Copy link
Member Author

@deejgregor its all so simple! 😅

Note: for all of the auto-configuration options listed above, an SDK will always be created and set in GlobalOpenTelemetry.

Just one clarification that you can use autoconfigure without GlobalOpenTelemetry by calling AutoConfiguredOpenTelemetrySdk.builder().setResultAsGlobal(false).build().

@deejgregor
Copy link
Contributor

Just one clarification that you can use autoconfigure without GlobalOpenTelemetry by calling AutoConfiguredOpenTelemetrySdk.builder().setResultAsGlobal(false).build().

Thanks. I've added it as 7. I was struggling with how to include this when I made the list last night (and this is important, since it is the case that was referenced in the creation of this issue!). I also tweaked 5 for the case with setResultAsGlobal(true).

I also added notes on the Kubernetes Operator and which cases I think apply.

Next up: the real reason why I made this list: to summarize impacts and pros/cons for a few use cases (and why I overall l am 👍 this change).

@deejgregor
Copy link
Contributor

I am 👍 with this change and having GlobalOpenTelemetry.get() be invariant, particularly to help support more library use of GlobalOpenTelemetry, and slightly cleaner use in OSGi.

In #5010 (comment) I tried to enumerate the cases where the SDK is initialized and/or GlobalOpenTelemetry.set happens. The main impacts from this change are:

  1. NEW: GlobalOpenTelemetry.get() with AutoConfiguredOpenTelemetrySdk in the classpath but without otel.java.global-autoconfigure.enabled=true -- noop SDK
  2. CHANGE: GlobalOpenTelemetry.get() with AutoConfiguredOpenTelemetrySdk in the classpath (today). In the future this will require otel.java.global-autoconfigure.enabled=true) -- Auto-configured SDK

I believe that's it. Only people initializing via a call to GlobalOpenTelemetry.get() with AutoConfiguredOpenTelemetrySdk in the classpath will be impacted. This should be highlighted as a breaking change in the CHANGELOG, I think, since they will end up with a noop SDK after this change. Note: This could also include people using the OpenTelemetry Kubernetes Operator without the agent (instrumentation.opentelemetry.io/inject-sdk: "true" or with instrumentation.opentelemetry.io/inject-java: "true" and the agent disabled with otel.javaagent.enabled=false) with case 4 above (which is probably not many).

The users that I don't believe would be impacted at all are those using the Java agent (as long as it is enabled; case 1), those that don't have AutoConfiguredOpenTelemetrySdk in their class path (case 2), or those who manually initialize the SDK (or manually invoke AutoConfiguredOpenTelemetrySdk) and call GlobalOpenTelemetry.set (cases 5 and 6) and those who manually initialize the SDK or don't initialize the SDK at all and don't use GlobalOpenTelemetry.set (cases 7 and 8).

To continue with @trask's comment:

I very much support this change, I think it makes using GlobalOpenTelemetry less scary by removing this side-effect, which is important for Java agent users.

Personally, I'd like to see more libraries support OpenTelemetry without requiring explicit OpenTelemetry injection and GlobalOpenTelemetry seems like it would be the way to do that. I agree that this change makes using GlobalOpenTelemetry less scary, both for the Java agent use case and with libraries (even with the caveat for library use mentioned at the end of https://opentelemetry.io/docs/instrumentation/java/manual/#setup). This particularly affects users in cases 7 and 8.

Now, one effect of more libraries using GlobalOpenTelemetry.get over time, is that they might call get before the application does manual initializing of the SDK. I think this reinforces the goal behind this PR and having "Global autoconfiguration" default to false. Also, I think this supports the invariant method and fast-failure discussed around #5010 (comment) -- this way, when you upgrade to a newer library that adds OpenTelemetry support but things are now initialized out of order, you get a more clear failure of no telemetry at all. This particularly affects users in cases 5 and 6.

Lastly, for OSGi, we want to avoid any uses of Class.forName (see #4627 (comment)), and with global autoconfiguration defaulting to false, it will prevent the sole case of calling Class.forName form happening in the API.

@jack-berg
Copy link
Member Author

@jkwatson pending any more comments from you, I'd like to merge this after another day or two. There's quite a bit of work which is dependent on this.

Copy link
Contributor

@jkwatson jkwatson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@jack-berg jack-berg merged commit 7c6f1bd into open-telemetry:main Dec 14, 2022
@jack-berg jack-berg mentioned this pull request Dec 14, 2022
dmarkwat pushed a commit to dmarkwat/opentelemetry-java that referenced this pull request Dec 30, 2022
…metry#5010)

* Do not initialize AutoConfiguredOpenTelemetrySdk in OpenTelemetry.get

* GlobalOpenTelemetry triggers autoconfigure based on env var / system property
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants