Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TC Review Request: Go Metric SDK #1663

Closed
MrAlias opened this issue Aug 31, 2023 · 5 comments
Closed

TC Review Request: Go Metric SDK #1663

MrAlias opened this issue Aug 31, 2023 · 5 comments
Assignees

Comments

@MrAlias
Copy link
Contributor

MrAlias commented Aug 31, 2023

The Go SIG is looking to make a stable release of the OpenTelemetry metric SDK in Go. We are looking for a TC member familiar with the metric SDK specification to review our offering and help ensure our compliance with the specification.

Prior work

We have conducted our own internal review of the specification here. That review has identified no outstanding compliance issues, but does include minor documentation work still to be done and the following specification issues that remain:

Other than this documented work in the linked project, the SDK should be ready for stable release and is currently ready for TC review.

Inventory

We would like to evaluate the v1.17.0/v0.40.0/v0.5.0 Release, links should be related to this tag.

@MrAlias
Copy link
Contributor Author

MrAlias commented Aug 31, 2023

cc @open-telemetry/technical-committee

@tigrannajaryan
Copy link
Member

@MrAlias I added this to the next TC meeting agenda. We will post back here once the TC member is selected.

@jmacd
Copy link
Contributor

jmacd commented Sep 6, 2023

I volunteered to do this. I am familiar with the Go metrics SDK and have followed the recent releases. I don't expect any blockers, but will do the review as requested.

@jmacd jmacd self-assigned this Sep 11, 2023
@jmacd
Copy link
Contributor

jmacd commented Sep 11, 2023

Speaking for the TC, I have reviewed the v1.17.0/v0.40.0 OTel-Go metrics SDK.

Thank you @MrAlias, for conducting a thorough internal review. Having witnessed the output of that review, I am not surprised to find the OTel-Go metrics SDK in a healthy state of conformity. Here are some areas of feedback for consideration as the user-base grows.

Backwards/Forwards compatibility protection

I really admire the work that has been put in to protecting the interfaces from breaking changes. Users are explicitly prevented from providing implementations of interfaces that they are not supposed to, to protect the future ability of maintainers to improve the code. This sort of protection goes beyond what OTel requires, but ultimately this rigor will help OTel adoption and prevent the need for future breaking changes.

Straight-forward code is easy to follow

I have constructed at least two Go metrics SDK prototypes by now, so I am familiar with the territory. What I admire most about this code is that it remains simple and straightforward, despite there being plenty of opportunity to do complicated things to appease demanding users. It is a good thing for a community-maintained SDK to stay simple and avoid premature optimization, and if there are specific problems with performance, users will bring those to the maintainers and (because of the protections mentioned above) there will be a path forward for future optimizations.

As an example of this statement, I noticed that for each temporality, each aggregator has a method named cumulative() and delta(). These methods are nearly identical, and it might be tempting to factor the code in such a way that temporality controls were less transparent -- for example, the logic to aggregate could be the same if there were another piece of code responsible for resetting the aggregator state when temporality==delta. Of course, this is a very high-level claim, and the details in the code really matter. It could be that there are more compact or more performance ways to implement this SDK, but I'm glad to have a simple starting point.

Memory limits

As discussed in open-telemetry/opentelemetry-go#3006, the SDK contains some TODOs about what was eventually specified in open-telemetry/opentelemetry-specification#2960. I believe it is OK to call the SDK stable without addressing this issue first (as long as the maintainers agree). On the other hand, I'm familiar with cardinality explosions impacting the OTel Collector already, so this would be a nice safety feature to prioritize.

Performance concerns

I expect there will be users who are not entirely satisfied by the performance of the SDK, but I don't see it as a problem with the SDK itself. A number of memory-allocation optimizations have been applied already (e.g., support for re-use across collections, use of sync.Pool in various places). My one reservation is about the cost of constructing attribute.Set objects which actually happens in the API, not in the SDK. One of the downsides of the compatibility work mentioned above is the appearance of functional options in the instrument methods (e.g., AddOption, RecordOption, ObserveOption). While this pattern is ultimately very flexible, these are performance-sensitive calls and I would prefer to see a more direct path into the SDK.

I am hopeful that OTel will continue to evolve in ways that encourage more efficient APIs to exist, which is not to say what's here has to change. With what we have in the Go API today, the user may construct a []AddOption to precompute the input to an Add operation--however there's no way to mutate and reuse this object for a different attribute set. For users upgrading from a statsd-like API surface (where attributes are provided as a list to every operation), this forces a fairly expensive sequence of allocations into the call path. If the API didn't force this, the SDK could be optimized to avoid allocating an attribute.Set when it already has the same set indexed. This is more-or-less borrowing an idea from the Prometheus client library, which is to say that computing a hash function over attributes can allow searching for an existing attribute set without allocations (i.e., faster, less overhead). All of the options that I see for improving this situation, which includes an old idea inside OTel known as "Bound instruments", involve changing the API. When the users speak up, I expect there will be pressure to extend the API with new code paths--none of this will go against the specification, however.

One last area of concern (also on performance), is the use of sync.Mutex in places where writes are infrequent and updates are frequent. I've noticed users (of this SDK) with serious lock contention problems in such situations, and have replaced most sync.Mutex objects with other approaches.

Summary

This SDK passes review. Well done @MrAlias, @MadVikingGod, @pellared and the @open-telemetry/go-approvers! 🎉

@jmacd
Copy link
Contributor

jmacd commented Sep 11, 2023

@MrAlias I feel we can close this, let me know if you're satisfied. Thanks!

@MrAlias MrAlias closed this as completed Sep 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants