Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify Span status according to specification #224

Merged

Conversation

tigrannajaryan
Copy link
Member

@tigrannajaryan tigrannajaryan commented Sep 25, 2020

  1. Renamed Status.code field to Status.deprecated_code.
  2. Renamed StatusCode enum to Status.DeprecatedStatusCode.
  3. Introduce new Status.code with corresponding StatusCode enum.
  4. Added guidelines about how to use the fields to ensure backward compatibility.

opentelemetry/proto/trace/v1/trace.proto Outdated Show resolved Hide resolved
opentelemetry/proto/trace/v1/trace.proto Outdated Show resolved Hide resolved
jkwatson
jkwatson previously approved these changes Sep 25, 2020
Copy link
Contributor

@jkwatson jkwatson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

Copy link

@bryannaegele bryannaegele left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this avoid being a breaking change of a stable API when the name of the field and its values are being changed?

@Oberon00
Copy link
Member

Also, since JSON is allowed, renaming is not even wire-compatible.

@bogdandrutu
Copy link
Member

JSON is not stable

anuraaga
anuraaga previously approved these changes Sep 28, 2020
//
// This field is deprecated and is replaced by `code` field below. See backward
// compatibility notes below. According to our stability guarantees this field
// will be removed in 12 months, on Sep 26, 2021. All usage of old senders and

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR but just noticed since it's defined very clearly here with a date - 12 months seems like a pretty short time for API stability post-GA.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a new stability called "GA", that has stronger requirements.

arminru
arminru previously approved these changes Sep 28, 2020
opentelemetry/proto/trace/v1/trace.proto Outdated Show resolved Hide resolved
opentelemetry/proto/trace/v1/trace.proto Outdated Show resolved Hide resolved
opentelemetry/proto/trace/v1/trace.proto Outdated Show resolved Hide resolved
opentelemetry/proto/trace/v1/trace.proto Outdated Show resolved Hide resolved
opentelemetry/proto/trace/v1/trace.proto Outdated Show resolved Hide resolved
opentelemetry/proto/trace/v1/trace.proto Outdated Show resolved Hide resolved
opentelemetry/proto/trace/v1/trace.proto Outdated Show resolved Hide resolved
opentelemetry/proto/trace/v1/trace.proto Outdated Show resolved Hide resolved
opentelemetry/proto/trace/v1/trace.proto Outdated Show resolved Hide resolved
opentelemetry/proto/trace/v1/trace.proto Outdated Show resolved Hide resolved
@tigrannajaryan
Copy link
Member Author

tigrannajaryan commented Sep 28, 2020

How does this avoid being a breaking change of a stable API when the name of the field and its values are being changed?

Names of the fields are not exposed in OpenTelemetry API. Compatibility from protocol perspective means wire compatibility. Names of the fields do not exist on the wire level.

We could keep the name of the field and of the enum values if there is a big desire to keep compatibility on the build level. I am not sure it is worth it and it was not part of the compatibility promise in the past (Collector codebase hides these details on purpose so that usage of OTLP protobuf messages in the codebase is not affected).

I am open to re-considering this and it does not affect the logic.

@Oberon00
Copy link
Member

Names of the fields are not exposed in OpenTelemetry API. Compatibility from protocol perspective means wire compatibility. Names of the fields do not exist on the wire level.

What about OTLP/JSON?

@tigrannajaryan
Copy link
Member Author

Names of the fields are not exposed in OpenTelemetry API. Compatibility from protocol perspective means wire compatibility. Names of the fields do not exist on the wire level.

What about OTLP/JSON?

Yes, in OTLP/JSON the Protobuf field names are directly exposed on the wire. However, OTLP/JSON is currently in "Alpha", we are free to change it:
image

@bryannaegele
Copy link

bryannaegele commented Sep 28, 2020

@tigrannajaryan tooling does rely on these names, so it is a breaking change for every library that relies on the proto names to have to update. When these files (Common and Trace) were officially marked stable in August, I assumed that included stable for tooling, as well.

What I'm not clear on is why this enum is now being deprecated at all, nor why it's being marked as deprecated by prefixing the word to everything. The comment directly above this code states The Status type defines a logical error model that is suitable for different programming environments, including REST APIs and RPC APIs. but now those codes are marked deprecated and not replaced by anything equivalent.

It's unclear to me in https://github.com/open-telemetry/oteps/blob/master/text/trace/0136-error_flagging.md#status-mapping-schema if these codes are actually intended to be removed at this time.

As part of the specification, OpenTelemetry provides a mapping of semantic conventions to status codes. This removes any ambiguity as to what OpenTelemetry ships with out of the box.

What I did get from the OTEP was the addition of a code field Status.Code which has been named Status.StatusCanonicalCode in here and the addition of a Status.Source` field which seems to not be in the PR.

I could be missing something as it's difficult to keep up. If so, I'd love to have more context on this PR. I greatly welcome the additions but I'm unclear on the motivation of the deprecation.

@tigrannajaryan
Copy link
Member Author

It's unclear to me in https://github.com/open-telemetry/oteps/blob/master/text/trace/0136-error_flagging.md#status-mapping-schema if these codes are actually intended to be removed at this time.

@bryannaegele the OTEP's language may not be clear but the specification is clear: https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/api.md#status

There is only one code - the StatusCanonicalCode, there is no other code in the spec that we need to also support in the protocol.

There is certainly no intent to have 2 different codes in the API or in the protocol. The OTEP's motivation was that the old enum did not make sense. This has been discussed in multiple meetings and in the OTEP.

tooling does rely on these names, so it is a breaking change for every library that relies on the proto names to have to update. When these files (Common and Trace) were officially marked stable in August, I assumed that included stable for tooling, as well.

I am not strongly opposed to keeping the old field name and old enum name and value names. The reason I used "deprecated_" as a prefix was to discourage its usage.

If there is no strong oposition I am happy to modify this PR to restore the names of the deprecated field and use new names for the new field and enum.

@bryannaegele
Copy link

bryannaegele commented Sep 28, 2020

@tigrannajaryan That doc changed 7 days ago where the API spec clearly stated that these status codes were to be used and thus available to other uses, such as messaging https://github.com/open-telemetry/opentelemetry-specification/blame/3e66e6fcc3bb8070db7ccbdd30005ae9f35ac29c/specification/trace/semantic_conventions/http.md#L40-L63. So the ability to set the status with these semantic conventions were in the API spec and does introduce a breaking change.

These status codes being set are also still in the referenced document for HTTP clients https://github.com/open-telemetry/opentelemetry-specification/blame/master/specification/trace/semantic_conventions/http.md#L109-L116.

Pardon my confusion, but this does seem more than a change to the protos and in fact represents a breaking API change, as well. The Commons were marked stable, so I hope you can understand why I'm questioning these changes.

@bryannaegele
Copy link

If there is no strong oposition I am happy to modify this PR to restore the names of the deprecated field and use new names for the new field and enum.

If this change must be done for some reason, I'm simply asking what is the replacement for marking statuses on a span beyond error? Are users meant to devise their own schemes going forward?

@bryannaegele
Copy link

I just caught up and found https://github.com/open-telemetry/opentelemetry-specification/pull/966/files from 4 days ago, so this conversation is apparently moot and we'll just have to deal with the breaking changes.

@tigrannajaryan
Copy link
Member Author

tigrannajaryan commented Sep 29, 2020

I'm simply asking what is the replacement for marking statuses on a span beyond error?

The updated spec does not define anything for this. These has been discussed extensively in the spec SIG and error SIG. This PR is a consequence, it is not something new. This simply makes the corresponding change in the protocol.

we'll just have to deal with the breaking changes.

Again, I am happy to keep the old field names in the protocol, but language SDK's have to deal with the spec change anyway. OTLP exporters in SDK will also have to comply with Compatibility requirements for senders that I added in this PR before GA. I don't think we want to release an SDK that has outdated understanding of Span Status.

@bryannaegele
Copy link

We can't all be in every SIG meeting :)

We'll deal with it. As you noted, we're going to have to remove all this from the SDK anyhow, so it's better to just get the pain over with.

I'm just looking for guidance at this point on how to guide users on how to substitute the functionality. For users that are already leveraging this functionality, what is the upgrade path for them? Are we just saying "Hey, error status codes aren't a thing anymore so you'll have to figure something out" or is there guidance/patterns we can provide? Asking as a user and a maintainer 😆

@Oberon00
Copy link
Member

Maybe we should add a compatibility semantic convention to ease transition at least?

@tigrannajaryan
Copy link
Member Author

I'm just looking for guidance at this point on how to guide users on how to substitute the functionality. For users that are already leveraging this functionality, what is the upgrade path for them? Are we just saying "Hey, error status codes aren't a thing anymore so you'll have to figure something out" or is there guidance/patterns we can provide? Asking as a user and a maintainer 😆

@bryannaegele best to ask this in spec repo or gitter where it will get more eyeballs.

@tigrannajaryan
Copy link
Member Author

Maybe we should add a compatibility semantic convention to ease transition at least?

@Oberon00 Can you please clarify what do you mean by this?

(I was about to merge this PR, just noticed you comment, so will hold for now).

@tigrannajaryan
Copy link
Member Author

I have put a bit more thought in to this and I believe the transition in codebases will be easier to do if we keep the names of the deprecated field and enum unchanged. I am going to revert that part of the change. @bryannaegele FYI.

I am going to also wait for open-telemetry/opentelemetry-specification#1069 resolution and do it in one PR. There is no point in introducing the field with wrong names and risk someone start using it and then immediately break it.

@tigrannajaryan tigrannajaryan changed the title Modify Span status according to specification [WIP] [DON'T MERGE] Modify Span status according to specification Oct 7, 2020
@tigrannajaryan
Copy link
Member Author

tigrannajaryan commented Oct 9, 2020

I updated this PR and removed "canonical" adjective to be inline with open-telemetry/opentelemetry-specification#1081

I am using "code value" since "code" is unfortunately already used by a deprecated field which have to keep around for now.

@tigrannajaryan
Copy link
Member Author

I would like to know what others think about the field names. The new name looks awkward to me. I would rather prefer to rename the old fields and use StatusCode for the new field/enum.

I am going to open an unrelated PR which legitimizes field renaming as soon as it does not break wire compatibility (which is fine up until we say OTLP/JSON is stable).

@tigrannajaryan
Copy link
Member Author

FYI, regarding field name guarantees: #225

@tigrannajaryan
Copy link
Member Author

Given that #225 got several approvals and is about to be merged I am inclined more and more to rename the fields since we no longer are limited by the compatibility guarantees. I am going to that unless I hear objections.

Tigran Najaryan added 7 commits October 22, 2020 11:58
1. Renamed Status.code field to Status.deprecated_code.
2. Renamed Status.code.StatusCode enum to Status.DeprecatedStatusCode.
3. Introduce new Status.code with corresponding StatusCanonicalCode enum.
4. Added guidelines about how to use the fields to ensure backward compatibility.
We want to keep the old field and enum names.
opentelemetry/proto/trace/v1/trace.proto Outdated Show resolved Hide resolved
opentelemetry/proto/trace/v1/trace.proto Outdated Show resolved Hide resolved
@tigrannajaryan
Copy link
Member Author

@open-telemetry/specs-approvers @open-telemetry/specs-trace-approvers please review. I made a few changes back and forth in this PR. The PR description now reflects the latest state.

@tigrannajaryan
Copy link
Member Author

@open-telemetry/specs-approvers @open-telemetry/specs-trace-approvers please review. Implementation in the Collector cannot proceed until this is merged. The Collector is currently non-compliant with the spec because of this.

@Oberon00
Copy link
Member

I think there are already enough approvals, aren't there?

@tigrannajaryan
Copy link
Member Author

I think there are already enough approvals, aren't there?

@Oberon00 I re-requested reviews. Approvers previously saw and approved a different version of this PR and I think differences are pretty significant, so it is important for them to confirm they are still OK.

@Oberon00
Copy link
Member

@tigrannajaryan Then you should dismiss the reviews in question to make this clearer.

@tigrannajaryan tigrannajaryan dismissed stale reviews from arminru, anuraaga, and jkwatson October 26, 2020 16:47

PR is significantly changed, please re-review.

@tigrannajaryan
Copy link
Member Author

@arminru @anuraaga @jkwatson PTAL, I need to move forward, this is blocking Collector work. If I don't see any objections I will merge this today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants