Switch from Beeline to OTel #6361

Closed
aarongable opened this issue Sep 7, 2022 · 1 comment · Fixed by #6750

Comments

@aarongable

We currently use Honeycomb's beeline-go library to collect traces. That library is in maintenance mode, because the observability and telemetry industry has converged on OpenTelemetry (OTel) for collecting and exporting traces. We should switch to the opentelemetry-go library instead.

Alternatively, we should remove our tracing infrastructure entirely until we're ready to actually export our traces.

@mcpherrinm

mcpherrinm commented Sep 19, 2022

I've started working on this in #6750, as I find a bit of time.

aarongable pushed a commit that referenced this issue Mar 14, 2023
Remove tracing using Beeline from Boulder. The only remnant left behind
is the deprecated configuration, to ensure deployability.

We had previously planned to swap in OpenTelemetry in a single PR, but
that adds significant churn in a single change, so we're doing this as
multiple steps that will each be significantly easier to reason about
and review.

Part of #6361

aarongable pushed a commit that referenced this issue Mar 15, 2023
Upgrade grpc to v1.53.0, as preparation for introducing OpenTelemetry,
which depends on that grpc version.

Two changes to our own code were necessitated by upstream changes:

1. Add a stub implementation of GetOrBuildProducer: this was added to
the balancer.SubConn interface by grpc v1.51.0

2. Change use of Endpoint field to Endpoint() method: the field was
removed and replaced by a method in
grpc/grpc-go#5852. This also means that our
tests can't set the .Endpoint field, so the tests are updated to use the
.URL field instead, and a helper has been added to make that easy.
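To illustrate the second change, here is a minimal sketch of a target type whose endpoint is derived from a parsed URL rather than stored in a field. The `target` type and the `mustTarget` helper are hypothetical stand-ins written for this example; they are not grpc's `resolver.Target` or the helper added in the commit, and only the Go standard library is used.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// target is a hypothetical stand-in for a resolver target, reduced to the
// single field this example needs.
type target struct {
	URL url.URL
}

// Endpoint derives the endpoint from the parsed URL instead of reading a
// stored field: the path with its leading "/" stripped, falling back to the
// opaque part for URLs such as "dns:example.com:443".
func (t target) Endpoint() string {
	endpoint := t.URL.Path
	if endpoint == "" {
		endpoint = t.URL.Opaque
	}
	return strings.TrimPrefix(endpoint, "/")
}

// mustTarget is a hypothetical test helper: it builds a target from a string
// so that tests populate .URL rather than a removed .Endpoint field.
func mustTarget(raw string) target {
	u, err := url.Parse(raw)
	if err != nil {
		panic(fmt.Sprintf("parsing %q: %v", raw, err))
	}
	return target{URL: *u}
}

func main() {
	fmt.Println(mustTarget("dns:///example.com:443").Endpoint())
}
```

With a helper like this, a test that used to assign the endpoint directly would instead construct the whole target from a URL string and read the endpoint back through the method.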

Part of #6361

aarongable added a commit that referenced this issue Apr 21, 2023
Add a new shared config stanza which all Boulder components can use to
configure their OpenTelemetry tracing. This allows components to
specify where their traces should be sent, what their sampling ratio
should be, and whether or not they should respect their parent's
sampling decisions (so that web front-ends can ignore sampling info
coming from outside our infrastructure). It's likely we'll need to
evolve this configuration over time, but this is a good starting point.
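As a rough sketch of what such a stanza could look like, the following parses the three settings described above from JSON and shows how the parent-sampling switch might interact with the ratio. The type name, field names, and JSON keys are assumptions made for this example and are not taken from Boulder's actual configuration; the sampling function is a simplified model, not the OpenTelemetry SDK's sampler.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// tracingConfig is a hypothetical shape for the shared stanza. The field
// names and JSON keys are illustrative assumptions.
type tracingConfig struct {
	// Endpoint is where traces are sent.
	Endpoint string `json:"endpoint"`
	// SampleRatio is the fraction of traces to keep, from 0 to 1.
	SampleRatio float64 `json:"sampleRatio"`
	// IgnoreParentSampling tells the component to disregard the sampling
	// decision carried by an incoming request.
	IgnoreParentSampling bool `json:"ignoreParentSampling"`
}

// parseTracingConfig decodes and validates the stanza.
func parseTracingConfig(raw []byte) (tracingConfig, error) {
	var c tracingConfig
	if err := json.Unmarshal(raw, &c); err != nil {
		return tracingConfig{}, fmt.Errorf("parsing tracing config: %w", err)
	}
	if c.Endpoint == "" {
		return tracingConfig{}, errors.New("tracing endpoint must be set")
	}
	if c.SampleRatio < 0 || c.SampleRatio > 1 {
		return tracingConfig{}, errors.New("sample ratio must be between 0 and 1")
	}
	return c, nil
}

// shouldSample models the decision: honor the parent's choice when there is
// one and it is not ignored, otherwise keep the trace when draw (a value in
// [0, 1)) falls below the configured ratio.
func (c tracingConfig) shouldSample(parentSampled *bool, draw float64) bool {
	if parentSampled != nil && !c.IgnoreParentSampling {
		return *parentSampled
	}
	return draw < c.SampleRatio
}

func main() {
	c, err := parseTracingConfig([]byte(`{"endpoint": "collector.example:4317", "sampleRatio": 0.25}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", c)
}
```

Under this model, a web front-end would set the ignore flag so that a sampling hint from an outside client cannot force a trace to be recorded, while internal services would leave it unset and follow their callers.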

Add basic OpenTelemetry setup to our existing cmd.StatsAndLogging
helper, so that it gets initialized at the same time as our other
observability helpers. This sets certain default fields on all
traces/spans generated by the service. Currently these include the
service name, the service version, and information about the telemetry
SDK itself. In the future we'll likely augment this with information
about the host and process.

Finally, add instrumentation for the HTTP servers and grpc
clients/servers. This gives us a starting point of being able to monitor
Boulder, but is fairly minimal as this PR is already somewhat unwieldy:
It's really only enough to understand that everything is wired up
properly in the configuration. In subsequent work we'll enhance those
spans with more data, and add more spans for things not automatically
traced here.

Fixes #6361

---------

Co-authored-by: Aaron Gable <aaron@aarongable.com>