Clarify backward compatibility requirements when introducing new capabilities #3954

Open
tigrannajaryan opened this issue Mar 22, 2024 · 12 comments
Labels: spec:miscellaneous (for issues that don't match any other spec label), triage:accepted:ready-with-sponsor (ready to be implemented and has a specification sponsor assigned)

Comments

@tigrannajaryan
Member

tigrannajaryan commented Mar 22, 2024

The configuration file proposal introduces new capabilities. Opinions vary on whether this is a breaking change or not.

We need to clarify what constitutes and what does not constitute a breaking change when new capabilities are introduced.

I am assigning this to myself to look into it.

Scenario of Interest

Version N: Capability A exists and behaves in a certain way.

Version N+1: Capability B is introduced (that did not exist in version N). B is optional and is opt-in, i.e. the user must perform a certain action to start using B after upgrading from version N to N+1.

When capability B is not used (the user did not perform the action), capability A in version N+1 behaves the same as in version N; there are no changes in the behavior of A. When capability B is used, it changes the behavior of capability A, including but not limited to rendering capability A completely ineffective.

A sub-scenario with an additional nuance here is that capability A may typically be used by a different party (e.g. the cloud provider) than capability B (e.g. the end user).

Does this change from version N to N+1 constitute a breaking change?

@tigrannajaryan added the spec:miscellaneous label Mar 22, 2024
@tigrannajaryan self-assigned this Mar 22, 2024
@MrAlias
Contributor

MrAlias commented Mar 22, 2024

What definition other than "upgrading a dependency for existing working code does not require changes to the code for it to continue working" is being evaluated?

@lmolkova
Contributor

lmolkova commented Mar 22, 2024

> What definition other than "upgrading a dependency for existing working code does not require changes to the code for it to continue working" is being evaluated?

I think we need to clarify what happens if there are more than two parties: e.g. OTel, the user, and a cloud provider integration/instrumentation.

  • when a user opts into something and the cloud provider integration breaks (cloud provider code needs to change to support this user), is it a breaking change?
  • when a cloud provider integration opts into something that may break user code, does it constitute a breaking change?

Also, it's not only about dependency version updates, but also feature flags and so on.

@MrAlias
Contributor

MrAlias commented Mar 22, 2024

> > What definition other than "upgrading a dependency for existing working code does not require changes to the code for it to continue working" is being evaluated?
>
> I think we need to clarify what happens if there are more than two parties: e.g. OTel, the user, and a cloud provider integration/instrumentation.
>
>   • when a user opts into something and the cloud provider integration breaks (cloud provider code needs to change to support this user), is it a breaking change?
>   • when a cloud provider integration opts into something that may break user code, does it constitute a breaking change?
>
> Also, it's not only about dependency version updates, but also feature flags and so on.

It sounds like feature compatibility is being confused with backwards compatibility here.

@lmolkova
Contributor

lmolkova commented Mar 22, 2024

Ideally, features are not mutually exclusive and new ones don't break existing ones. If they do, it means that the old one is going to be deprecated. It'd be great to clarify this in the scope of this work.

@tigrannajaryan
Member Author

tigrannajaryan commented Mar 25, 2024

I believe one of the scenarios that people have different opinions about is the following:

Version N: Capability A exists and behaves in a certain way.

Version N+1: Capability B is introduced (that did not exist in version N). B is optional and is opt-in, i.e. the user must perform a certain action to start using B after upgrading from version N to N+1.

When capability B is not used (the user did not perform the action), capability A in version N+1 behaves the same as in version N; there are no changes in the behavior of A. When capability B is used, it changes the behavior of capability A, including but not limited to rendering capability A completely ineffective.

[UPDATE] A sub-scenario with an additional nuance here is that capability A may typically be used by a different party (e.g. the cloud provider) than capability B (e.g. the end user).

Does this change from version N to N+1 constitute a breaking change?

@lmolkova
Contributor

Thanks @tigrannajaryan for the summary! It looks great.

One more important detail I'd like to be made explicit:

A user who opts into capability B may not know that capability A is used by something in their environment. The party that uses capability A does not opt into anything.

@tigrannajaryan
Member Author

> One more important detail I'd like to be made explicit:
>
> A user who opts into capability B may not know that capability A is used by something in their environment. The party that uses capability A does not opt into anything.

I updated my description to mention this nuance.

@trask
Member

trask commented Mar 25, 2024

Another nuance I think is important is when Capability B is the "new better version of" Capability A.

E.g. I've heard @austinlparker (not unreasonably) referring to the new configuration proposal as "Configuration v2".

It doesn't mean we can't introduce "Configuration v2", but I think we should have a transition / deprecation plan around "Configuration v1" in order to avoid user confusion (e.g. similar to the plan to deprecate SpanEvents in favor of Log-based Events).

@lmolkova
Contributor

lmolkova commented Mar 26, 2024

Agreed, I just hope configuration v2 would allow cloud providers to set defaults ;) (i.e. capability v1 can be deprecated in favor of v2 if v2 is a superset of v1)
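
The superset condition can be written down as a small check. This is only an illustrative sketch: `can_deprecate` and the use-case labels are hypothetical names invented for the example, not anything defined by the specification or the configuration proposal.

```python
def can_deprecate(v1_use_cases, v2_use_cases):
    """Return (ok, missing).

    ok is True only when every use-case supported by v1 is also
    supported by v2, i.e. v2 is a superset of v1.
    """
    missing = set(v1_use_cases) - set(v2_use_cases)
    return (not missing, sorted(missing))


# Hypothetical use-case labels, chosen only for illustration.
v1 = {"user sets config", "cloud provider sets defaults"}
v2 = {"user sets config"}

print(can_deprecate(v1, v2))  # (False, ['cloud provider sets defaults'])
```

With these made-up inputs, v1 could not yet be deprecated, because one v1 use-case has no v2 counterpart.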

@tigrannajaryan
Member Author

tigrannajaryan commented Mar 26, 2024

Here is my personal opinion on this topic.

I am going to consider backward compatibility and a couple of other relevant aspects. I will use the configuration proposal as an example in the text below, but I think my reasoning also applies generally to the scenario of capabilities A vs B defined above.

Backward Compatibility

My litmus test for backward compatibility is the following:

If I upgrade from version N to version N+1 while keeping everything else unchanged, does the documented functional system behavior change? If the answer is yes, then it is a breaking, non-backward-compatible change; otherwise it is a non-breaking, backward-compatible change.

[UPDATE] I added the qualifier "documented" in the previous sentence to clarify that only documented behavior is important from a compatibility perspective (and that obviously includes API definitions because they are documented). Undocumented, unspecified behavior is not part of compatibility considerations.

(Note the emphasis on functional behavior, as opposed to e.g. performance behavior, which I don't consider to be part of the regular compatibility guarantees.)

Evaluated using this litmus test, I believe the scenario we are considering is not a breaking change. I reach this conclusion because capability B is introduced in version N+1, is opt-in, and, unless the user opts in, does not change the behavior of capability A.

Using our example, the new configuration proposal that adds an opt-in OTEL_CONFIG_FILE is not a breaking change. It is backward compatible. Unless the user opts in to the new OTEL_CONFIG_FILE the behavior of version N+1 is the same as in version N.
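
The litmus test applied to this example can be sketched in a few lines of Python. This is a toy model under stated assumptions, not SDK code: `config_v_n`, `config_v_n_plus_1`, and `is_breaking` are hypothetical names, file contents are passed in as a dict instead of being read from disk, and the rule that an opted-in file makes the env vars ineffective is simply the scenario described above, not a statement about what the proposal actually specifies.

```python
# Env vars that the toy "version N" documents as configuration inputs.
KNOWN_ENV_VARS = ("OTEL_SERVICE_NAME", "OTEL_TRACES_SAMPLER")


def config_v_n(env):
    """Version N: capability A only (configuration from env vars)."""
    return {k: env[k] for k in KNOWN_ENV_VARS if k in env}


def config_v_n_plus_1(env, files):
    """Version N+1: capability B (OTEL_CONFIG_FILE) is opt-in."""
    path = env.get("OTEL_CONFIG_FILE")
    if path is None:
        # B not used: A behaves exactly as in version N.
        return config_v_n(env)
    # B used: assumed here to render A completely ineffective.
    return dict(files[path])


def is_breaking(existing_setups, files):
    """Litmus test: upgrade N -> N+1 with everything else unchanged."""
    return any(
        config_v_n(env) != config_v_n_plus_1(env, files)
        for env in existing_setups
    )


# Setups that worked on version N cannot have opted in to B.
existing = [{"OTEL_SERVICE_NAME": "checkout"}, {}]
print(is_breaking(existing, files={}))  # False
```

In this model the behavior only diverges once a setup adds `OTEL_CONFIG_FILE`, which is a change made by the user and not by the upgrade itself.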

However, I think there is another important aspect that we need to consider when talking about capability changes, an aspect which I think is important in particular for the new configuration proposal.

Degradation of Capabilities

The configuration proposal suggests replacing a particular capability A (the configuration of the SDK by env vars) with a new capability B (configuration from a file). As far as I can tell, the new capability is set to become the recommended way of configuring the SDK going forward, and we will likely declare the old way of configuration deprecated sometime after the new configuration file is declared stable. In other words, it is eventually going to become a replacement capability.

When a change like this - a replacement capability - is proposed, in addition to backward compatibility I think it is important to consider the following: does the replacement capability allow all use-cases that were previously allowed using the old capability?

I believe that unless we explicitly decide that certain use-cases are not necessary we should make a significant effort to continue supporting all use-cases that were supported previously. This is not a backward compatibility requirement. It is a requirement to avoid degradation of capabilities.

Evaluated from this perspective the new config proposal appears to be a degradation. It makes a previously possible use-case impossible or more cumbersome: the ability for the cloud provider and for the end user to supply portions of the configuration without explicit coordination.
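
A rough sketch of the use-case in question, under loudly stated assumptions: the function names are hypothetical, the user's value is assumed to win on a conflicting env var, the file is modeled as a dict keyed by env-var names purely for comparison, and the file is assumed to fully replace env-var configuration once the user opts in. Whether the real proposal behaves this way is exactly what is being discussed here, so treat this as an illustration of the concern and not as a description of the proposal.

```python
def env_based_config(provider_env, user_env):
    """Capability A: both parties export env vars into the same process.

    Assumption: on a conflicting variable the user's value wins.
    """
    return {**provider_env, **user_env}


def file_based_config(provider_env, user_file):
    """Capability B, assuming the user's file replaces env-var config."""
    # provider_env is deliberately unused: under this assumption it is ignored.
    return dict(user_file)


provider_env = {"OTEL_RESOURCE_ATTRIBUTES": "cloud.region=us-east-1"}
user_settings = {"OTEL_SERVICE_NAME": "checkout"}

with_env = env_based_config(provider_env, user_settings)
with_file = file_based_config(provider_env, user_settings)

# Settings the provider supplied that no longer take effect.
lost = sorted(set(provider_env) - set(with_file))
print(lost)  # ['OTEL_RESOURCE_ATTRIBUTES']
```

With env vars, neither party needs to know what the other one set. With the file, under the assumption above, the provider's contribution survives only if the user copies it into their file, which requires explicit coordination.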

(I do not yet have sufficient information to fully understand whether the cloud provider's use case is impossible or merely more cumbersome. I guess in some cases it may be impossible, depending on which process has what sort of permissions to write to the filesystem. If it is merely more cumbersome, then I would call it a user experience degradation, which is still important but less critical than the complete impossibility that I call degradation of capabilities above.)

So, I think this is what we have in this particular case of the configuration proposal: it is not a breaking change, but it is a capability degradation.

In my opinion we should find a way to avoid the degradation. I have been thinking about how to do it and I think there is a possibility to do it by extending the new configuration proposal without throwing away any of the valuable work that the Configuration SIG did. I will post my thoughts about this on the relevant issue later.

@yurishkuro
Member

> As far as I can tell the new capability is set to become the recommended way of performing the configuration of the SDK going forward

It seems to me there is no consensus on this, and that's part of the problem. Does file configuration support end user vs. cloud provider separation of config ownership (#3954 (comment))? If it does, but in a different way, I would not consider it a degradation.

@trask added the triage:deciding:community-feedback, triage:deciding:tc-inbox, and triage:accepted:ready-with-sponsor labels and removed the triage:deciding:community-feedback and triage:deciding:tc-inbox labels Mar 26, 2024
@tigrannajaryan
Member Author

tigrannajaryan commented Mar 27, 2024

UPDATE: I added the qualifier "documented" in my definition of a breaking change to clarify that only documented behavior is important from a compatibility perspective (and that obviously includes API definitions because they are documented). Undocumented, unspecified behavior is not part of compatibility considerations.

@austinlparker moved this to Spec - Accepted in 🔭 Main Backlog Jul 16, 2024
@austinlparker moved this from Spec - Accepted to Spec - In Progress in 🔭 Main Backlog Jul 16, 2024
Projects
Status: Spec - In Progress
6 participants