Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[configuration] Add exception for 'false default' when there is a double negation #4351

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

mx-psi
Copy link
Member

@mx-psi mx-psi commented Jan 8, 2025

Relates to #4344.

Changes

Adds exception to naming and default value rule for environment variables.

This is a backwards compatible change since the original statement is phrased as a 'SHOULD'. The OTEL_SDK_DISABLED deviates from this new norm (this is not a spec violation but a lack of consistency).

The wording tries to be similar to the one in footnote [1] here: https://github.com/open-telemetry/opentelemetry-specification/blob/v1.40.0/specification/protocol/exporter.md

There are several OpenTelemetry artifacts that are either not bound by the specification or have decided, for usability reasons, to not follow this rule in this case. This change would, I argue, bring more consistency and less confusion to end users.

I plan to submit a follow-up PR to change MeterConfig.disabled to MeterConfig.enabled if this PR is accepted.
I don't plan to submit a PR to migrate OTEL_SDK_DISABLED, but this could be done if there is interest.

Thanks to @pellared for guidance on this :)

For non-trivial changes, follow the change proposal process.

CHANGELOG.md Outdated Show resolved Hide resolved
Co-authored-by: Robert Pająk <pellared@hotmail.com>
Copy link
Member

@trask trask left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I support but would recommend that it be left open through at least the Tue, Jan 21 spec meeting given the contentious discussion around the topic previously.

Copy link
Contributor

@MrAlias MrAlias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is introducing ambiguity and inconsistency to a clear and well litigated part of the specification. It is only substantiated with a biased subjective opinion about an undefined user-base.

The prior decision for this part of the specification was made on logical reasoning. There are neither objective data nor facts being presented here that add to the prior discussion. Do these exist? Can you provide them to motivate this change?

The added ambiguity and inconsistency deteriorate the specification and should not be accepted without better motivation.

@mx-psi mx-psi marked this pull request as ready for review January 9, 2025 09:11
@mx-psi mx-psi requested review from a team as code owners January 9, 2025 09:11
@mx-psi
Copy link
Member Author

mx-psi commented Jan 9, 2025

with a biased subjective opinion

Could you clarify what is biased about my or others opinion here?

about an undefined user-base.

I think I have been very clear about the user base that motivated this decision: Collector users. The discussion that followed also showed other artifacts within OpenTelemetry that do not follow this approach. What is undefined about this?

There are neither objective data nor facts being presented here that add to the prior discussion.

#4344 (comment), #4344 (comment) and #4344 (comment) are objective data showing how this convention is not followed elsewhere in OpenTelemetry.

I am happy to dig up more, but before I spend any further effort on this, I would like to step back and clarify what evidence would you consider as 'objective data'. What kind of evidence would make you change your mind?

The added ambiguity and inconsistency deteriorate the specification

The sentence this PR modifies in the spec is inconsistent with several software artifacts end users use that are under the OpenTelemetry umbrella. I disagree that this adds any inconsistency. Furthermore, I think the sentence has around the same level of ambiguity as the existing text. Could you give examples where this is more ambiguous than before?

and should not be accepted without better motivation.

Again, I think we need to step back and think: what motivation would we need? I don't want to spend effort on this if there is no motivation that would justify this change for you.

@MrAlias
Copy link
Contributor

MrAlias commented Jan 9, 2025

Could you clarify what is biased about my or others opinion here?

You represent a specific demographic with inherent bias.

@MrAlias
Copy link
Contributor

MrAlias commented Jan 9, 2025

about an undefined user-base.

I think I have been very clear about the user base that motivated this decision: Collector users. The discussion that followed also showed other artifacts within OpenTelemetry that do not follow this approach. What is undefined about this?

This is too narrow of a scope to claim it is representative of OpenTelemetry users. What about all the other languages being used, do you have data representing their expectations?

@MrAlias
Copy link
Contributor

MrAlias commented Jan 9, 2025

There are neither objective data nor facts being presented here that add to the prior discussion.

#4344 (comment), #4344 (comment) and #4344 (comment) are objective data showing how this convention is not followed elsewhere in OpenTelemetry.

These are all instances where different choices were made. These choice, again, are framed with biased subjective opinions. They do not address any reasoning from #2755 or add anything new to that discussion.

@MrAlias
Copy link
Contributor

MrAlias commented Jan 9, 2025

I am happy to dig up more, but before I spend any further effort on this, I would like to step back and clarify what evidence would you consider as 'objective data'. What kind of evidence would make you change your mind?

I would appreciate that. I'm looking for user feedback, not OTel developer opinions, that represent this view. I would like the data to show not only those that are motivated to complain in an issue about the current state of things, but also shows how many users appreciate the current form of our environment variables.

@MrAlias
Copy link
Contributor

MrAlias commented Jan 9, 2025

I am happy to dig up more, but before I spend any further effort on this, I would like to step back and clarify what evidence would you consider as 'objective data'. What kind of evidence would make you change your mind?

I would appreciate that. I'm looking for user feedback, not OTel developer opinions, that represent this view. I would like the data to show not only those that are motivated to complain in an issue about the current state of things, but also shows how many users appreciate the current form of our environment variables.

An OTel user-survey would help in this regard.

@MrAlias
Copy link
Contributor

MrAlias commented Jan 9, 2025

The added ambiguity and inconsistency deteriorate the specification

The sentence this PR modifies in the spec is inconsistent with several software artifacts end users use that are under the OpenTelemetry umbrella. I disagree that this adds any inconsistency. Furthermore, I think the sentence has around the same level of ambiguity as the existing text. Could you give examples where this is more ambiguous than before?

Adding an exception that directly contradicts and existing recommendation is ambiguous. It leaves the reader with an unclear understanding about what they need to do. Especially since we already have prior art where the non-normative "exception" has been ignored.

@MrAlias
Copy link
Contributor

MrAlias commented Jan 9, 2025

Again, I think we need to step back and think: what motivation would we need? I don't want to spend effort on this if there is no motivation that would justify this change for you.

I appreciate that fact. However, I would ask that it also be understood that much effort has already gone into the decision that resulted in our current state. This will reverse that decision. It seems reasonable that we require some level of effort to provide objective facts and data to motivate the change. Without such we will not be making an effective decision and likely just find ourselves back here in a few months.

@yurishkuro
Copy link
Member

I tend to agree with @MrAlias here - the existing wording is very clear and simple, the added caveat makes it worse. The "double negation" argument tries to fix poor data modeling / naming by introducing a caveat to the spec that breaks the fundamental principle - unset values are defaulted to "zero" value, which for Booleans is false. I would rather fix the naming. I agree that x_disabled=false is harder to grok, but the solution is to change the name, not the default treatment, e.g. if x is some action (like "sampling" or "reload") then replace it with skip_x, which has no double negative readability issues.

@austinlparker
Copy link
Member

I think @yurishkuro brings up a good point -- the existing guidance is clear and definitive, and loosening this requirement adds a lot of potential confusion. Is there some motivating examples for why we couldn't change names in order to allow for false defaults?

@trask
Copy link
Member

trask commented Jan 9, 2025

Is there some motivating examples for why we couldn't change names in order to allow for false defaults?

from the Java agent perspective (#4344 (comment)):

just sharing a data point, I suspect we won't change OTEL_INSTRUMENTATION_[NAME]_ENABLED in the Java agent, since some of the instrumentations are enabled by default and some are disabled by default, and so it would feel weird (and inconsistent in a different way) to have some specific instrumentations using OTEL_INSTRUMENTATION_[NAME]_ENABLED and other specific instrumentations using OTEL_INSTRUMENTATION_[NAME]_DISABLED

(and an additional wrinkle is that we have occasionally changed the default for specific instrumentations from enabled to disabled in major version bumps)

@austinlparker
Copy link
Member

Is there some motivating examples for why we couldn't change names in order to allow for false defaults?

from the Java agent perspective (#4344 (comment)):

just sharing a data point, I suspect we won't change OTEL_INSTRUMENTATION_[NAME]_ENABLED in the Java agent, since some of the instrumentations are enabled by default and some are disabled by default, and so it would feel weird (and inconsistent in a different way) to have some specific instrumentations using OTEL_INSTRUMENTATION_[NAME]_ENABLED and other specific instrumentations using OTEL_INSTRUMENTATION_[NAME]_DISABLED
(and an additional wrinkle is that we have occasionally changed the default for specific instrumentations from enabled to disabled in major version bumps)

Yeah, that is kinda annoying... is this something we can ensure that's normalized in the new config file stuff and just preserve the existing implementations?

@trask
Copy link
Member

trask commented Jan 9, 2025

Yeah, that is kinda annoying... is this something we can ensure that's normalized in the new config file stuff and just preserve the existing implementations?

I think that's the goal of this PR, but maybe it's not expressed well (or in the right place in the spec?), would appreciate any concrete suggestions on how to achieve this goal in specification language @MrAlias @yurishkuro

@yurishkuro
Copy link
Member

if we're talking specifically about the situation where multiple instrumentations can be enabled by user while at the same time some are automatically enabled, a good approach that is often used in linters is to have a named "preset" (rather than implicit defaults) that controls which parts are enabled by default via that preset. In other words, the enabled defaults (which contradict the existing guidance for default being false) are not enabled because individual settings are defaulted to true, but because another umbrella mechanism is applied, and then individual settings can override it.

@trask
Copy link
Member

trask commented Jan 9, 2025

if we're talking specifically about the situation where multiple instrumentations can be enabled by user while at the same time some are automatically enabled, a good approach that is often used in linters is to have a named "preset" (rather than implicit defaults) that controls which parts are enabled by default via that preset. In other words, the enabled defaults (which contradict the existing guidance for default being false) are not enabled because individual settings are defaulted to true, but because another umbrella mechanism is applied, and then individual settings can override it.

yeah, this sounds like a nice approach when configuration properties are limited to key/value pairs, but I think the Declarative Configuration SIG has ruled out any kind of built-in merge/overlay support for declarative (yaml) configuration

@yurishkuro
Copy link
Member

the Declarative Configuration SIG has ruled out any kind of built-in merge/overlay support for declarative (yaml) configuration

the presets approach does not contradict declarative configuration (e.g. golangci yaml config uses it)

@pellared
Copy link
Member

pellared commented Jan 10, 2025

As of today .NET Automatic Instrumentation has a lot of _ENABLED env vars defauling to true: https://opentelemetry.io/docs/zero-code/net/configuration. We failed to find a better name that would not be confusing. Our users and ourselves did not like the _DISABLED with default false proposal (it was discussed several times during SIG meetings).

Related issue: open-telemetry/opentelemetry-dotnet-instrumentation#1633

CC @open-telemetry/dotnet-instrumentation-maintainers (please double-check)

@reyang
Copy link
Member

reyang commented Jan 10, 2025

I support where we're going in this PR, however, I have a different view.

Instead of treating it as "Add exception for 'false default' when there is a double negation", I don't think it should be an exception. I think it should be the No. 1 principle - we want the configuration to be simple, the configuration should be easy to understand by the users, we should do our best to avoid the potential confusion.

If we agree with the "keeping things simple" principle, this is the order of priorities I would imagine:

  1. We SHOULD avoid double negation or similar things that would confuse the user.
  2. anything else ...

Comment on lines +75 to +79
named and defined such that false is the expected safe default behavior, unless
the name and false value would produce a double-negation (e.g. 'disabled'). In
the latter case Boolean environment variables MAY be named and defined such that
true is the safe default behavior. Renaming or changing the default value MUST
NOT happen without a major version upgrade.
Copy link
Member

@reyang reyang Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... MUST be named to minimize user confusion, such as double-negation (e.g. "xyz_disabled = false"). Renaming or changing the default value MUST NOT happen without a major version upgrade.

@MrAlias
Copy link
Contributor

MrAlias commented Jan 10, 2025

if we're talking specifically about the situation where multiple instrumentations can be enabled by user while at the same time some are automatically enabled, a good approach that is often used in linters is to have a named "preset" (rather than implicit defaults) that controls which parts are enabled by default via that preset. In other words, the enabled defaults (which contradict the existing guidance for default being false) are not enabled because individual settings are defaulted to true, but because another umbrella mechanism is applied, and then individual settings can override it.

I like this suggestion. It is new to the discussion and deserves further investigation.

@MrAlias
Copy link
Contributor

MrAlias commented Jan 10, 2025

I support where we're going in this PR, however, I have a different view.

Instead of treating it as "Add exception for 'false default' when there is a double negation", I don't think it should be an exception. I think it should be the No. 1 principle - we want the configuration to be simple, the configuration should be easy to understand by the users, we should do our best to avoid the potential confusion.

If we agree with the "keeping things simple" principle, this is the order of priorities I would imagine:

  1. We SHOULD avoid double negation or similar things that would confuse the user.
  2. anything else ...

This is just restating the contented issue. There has already been agreement in the specification here, namely #2755. If we would like to reverse that decision, let us do it based on facts and finds, not opinions and best guesses.

@MrAlias
Copy link
Contributor

MrAlias commented Jan 10, 2025

As of today .NET Automatic Instrumentation has a lot of _ENABLED env vars defauling to true: https://opentelemetry.io/docs/zero-code/net/configuration. We failed to find a better name that would not be confusing. Our users and ourselves did not like the _DISABLED with default false proposal (it was discussed several times during SIG meetings).

Related issue: open-telemetry/opentelemetry-dotnet-instrumentation#1633

CC @open-telemetry/dotnet-instrumentation-maintainers (please double-check)

I might be missing something, but I don't see any user feed back in open-telemetry/opentelemetry-dotnet-instrumentation#1633. Is this correct?

@lmolkova
Copy link
Contributor

offering my 2p here:

  • we are not adding new environment variables to the spec anymore
  • we're not going to change names of existing env vars because of this change if it's merged
  • the reality is that this particular part of the spec is not followed in practice (Java, .NET and collector are at least a few examples) and has usability concerns

What I think we could do:

  • do not follow this guidance in declarative config - use *.enabled pattern regardless of assumed default, document it there
  • add a clause here saying that this guidance is limited to cross-language env vars defined in this doc. The new guidance is in development and add a link to in-development declarative config section that documents new guidance

@pellared
Copy link
Member

I might be missing something, but I don't see any user feed back in open-telemetry/opentelemetry-dotnet-instrumentation#1633. Is this correct?

You are correct. We got feedback from a few users during SIG meetings. It was nothing formal (no surveys). I think that your proposal with making a survey e.g. via OTel End-User SIG is a great idea.

Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jan 22, 2025
@mx-psi
Copy link
Member Author

mx-psi commented Jan 22, 2025

What I think we could do:

  • do not follow this guidance in declarative config - use *.enabled pattern regardless of assumed default, document it there
  • add a clause here saying that this guidance is limited to cross-language env vars defined in this doc. The new guidance is in development and add a link to in-development declarative config section that documents new guidance

Thanks for the comment Liudmilla!

I personally find this to be the least disruptive option for end users and think it is reasonable, but I am not sure if this would address the concerns from @MrAlias or if it would fly with @open-telemetry/configuration-approvers.

I have not put more work on this because I do not see a way forward, so I would really appreciate more opinions about this.

@pellared
Copy link
Member

I do not see a way forward

I think you can ask @open-telemetry/sig-end-user-maintainers to help making a user-survey.

@github-actions github-actions bot removed the Stale label Jan 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants