Define telemetry stability guarantees #1301
Comments
What constitutes a breaking change
For metrics, here is an incomplete list of changes that may be breaking for readers of the telemetry (e.g. for dashboards in the backends):
For spans:
For logs:
[EDIT] We will also need to define relevant changes for Resources. For future discussions I suggest we call the shape and structure of all of the data referenced above the schema of telemetry.
How we handle the changes
Unlike API changes, I believe the changes described above are more likely to occur during the lifetime of an instrumentation library. I do not think we should aim to lock the telemetry schema and disallow the changes listed above. Such locking would place a huge limitation on how instrumentation can evolve and would make it nearly impossible to fix mistakes in the semantic conventions, in the schema or in the implementation of the instrumentation (which will inevitably happen sooner or later).
[EDIT] I extracted the full proposal to #1324 to avoid derailing this issue. In the context of this issue here is a shorter summary:
Thanks @tigrannajaryan. After more conversations I agree. We should issue a v1.0 without these data guarantees. We can't even begin to address this until everything else in tracing and metrics is complete. I've removed the
Should we give this priority for the next GA release? It might have slipped through so far because it has the
Assigning this to myself since I plan to work on a very related issue.
Related reference from k8s about metric stability: https://kubernetes.io/docs/concepts/cluster-administration/system-metrics/#metric-lifecycle IMO, k8s is not fully solving the problem, but rather makes it manageable, creates clarity in the deprecation process and slows it down appropriately to give affected parties time to react. While certainly an improvement over just making breaking changes, I think we need to go in a different, more promising direction: #1324
@tigrannajaryan Is there an issue to add schemas to the specification now that open-telemetry/oteps#152 is merged? Otherwise, what is the next step for it? I see we already have this folder: https://github.com/open-telemetry/opentelemetry-specification/tree/main/schemas so does that mean the OTEP is adopted?
@tigrannajaryan in that case, can this issue be closed?
Yes, I think we can close this. We will create new issues for any follow up tasks/problems as needed.
It is critical that OpenTelemetry produces telemetry which remains stable. Changes to telemetry produced by OpenTelemetry instrumentation should avoid breaking analysis tools, such as dashboards and alerts. However, it is not clear at this time what type of instrumentation changes (for example, adding additional spans and labels) would actually cause a breaking change.
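To make the question concrete, here is a minimal sketch of how one reader of telemetry might react to two kinds of change. Everything in it is an assumption made for illustration: the metric name `app.request.count`, the attribute names, and the `dashboard_query` helper are hypothetical and do not come from any OpenTelemetry API or semantic convention. It only shows that, for this particular group-by query, an added label is harmless while a renamed label silently empties the result; other queries may behave differently.

```python
from collections import defaultdict


def dashboard_query(points, metric_name, group_by):
    """Toy stand-in for a dashboard panel: sum a metric, grouped by one attribute."""
    totals = defaultdict(float)
    for point in points:
        if point["name"] != metric_name:
            continue
        key = point["attributes"].get(group_by)
        if key is None:
            # Points that lack the attribute are dropped without any error.
            continue
        totals[key] += point["value"]
    return dict(totals)


# Hypothetical telemetry as emitted by the current instrumentation version.
baseline = [
    {"name": "app.request.count", "attributes": {"status_code": "200"}, "value": 12.0},
    {"name": "app.request.count", "attributes": {"status_code": "500"}, "value": 3.0},
]

# Change 1: a new label is added alongside the existing one.
with_added_label = [
    dict(p, attributes={**p["attributes"], "region": "eu"}) for p in baseline
]

# Change 2: the existing label is renamed from "status_code" to "status".
with_renamed_label = [
    dict(p, attributes={"status": p["attributes"]["status_code"]}) for p in baseline
]

before = dashboard_query(baseline, "app.request.count", "status_code")
after_add = dashboard_query(with_added_label, "app.request.count", "status_code")
after_rename = dashboard_query(with_renamed_label, "app.request.count", "status_code")

print(before)        # {'200': 12.0, '500': 3.0}
print(after_add)     # same as before: this query ignores the extra label
print(after_rename)  # {}: the panel goes empty with no error raised
```

Whether a given change counts as breaking therefore depends on what the reader does with the data, which is part of why a precise definition is needed.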
Related questions:
Until this issue is resolved, instrumentation packages must not be marked as stable. API and SDK packages may still be marked as stable. The lack of telemetry stability should be clearly communicated in the documentation for every OpenTelemetry client, to avoid confusion.