Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Go 1.16 + OTLPProto/Collector upgrades #546

Merged
merged 13 commits into from
Mar 1, 2021

Conversation

mdisibio
Copy link
Contributor

@mdisibio mdisibio commented Feb 22, 2021

What this PR does:
This PR updates several OTEL dependencies and addresses some of the pain points. This is in the same line as another attempt #502 but goes further.

Changes:

  1. Go 1.16
  2. Update opentelemetry-collector from v0.6.1 to v0.16.0.
  3. Update opentelemetry-proto submodule from v0.3.0 to v0.7.0.
  4. Update otel/build-protobuf docker image from 0.1.0 to 0.2.1.
  5. Change the way otel proto is handled: Previously Tempo edited the OpenTelemetryCollector vendored code to mesh with opentelemetry-proto. Now Tempo leaves all the dependencies alone and creates its own derivative types in /pkg/tempopb. Remarshal/unmarshal is used to swap types in shim.go. This is unfortunate for efficiency but a significant win by decoupling things. The reverse swap also occurs in tempo-query.

Additional Changes/Notes:

  1. The goal is that a downstream app (i.e. grafana) could vendor /pkg/tempopb/ to pull in everything needed to communicate with Tempo with no other dependencies, and also importantly: no conflicts with other usages of OTEL proto or collector.
  2. Go 1.16 update includes new go mod tidy -e parameter to continue despite errors. This eliminates another hurdle with dependency updates which was the cmux -> test -> grpc hello world issue as stated in the previous PR. This error still occurs but the use of -e argument allows build process to continue. Note that Tempo still builds and tests successfully because the grpc hello world reference are in cmux tests which Tempo CI does not execute. Note this change also allows cleanup of the vendor-fix/prometheus work-around.
  3. Since there is no longer a link between vendoring and protos, Makefile was simplified to remove vendor-dependencies. Now there are only vendor-check and gen-protos.

Questions / Next Steps:

  1. This PR could be somewhat broken down into smaller PRs if desired.
  2. Testing:
    Local testing was performed to cover most cases, if anything else needed please mention.
    2.1 Unit tests
    2.2. integration tests
    2.3. Jaeger collector (using synthetic-load-generator)
    2.4. Zipkin collector (using curl)
    2.5. OTLP gRPC and HTTP collectors (using my own test app)
    2.6 Grafana/tempo-query lookup
  3. The proto swap in shim.go converts otlp.ExportTraceServiceRequest to tempopb.Trace which are technically wire-compatible. But Tempo does have its own definition of ExportTraceServiceRequest which we could use but needs a few more lines in makefile, etc. It seemed simpler to do it this way.

Which issue(s) this PR fixes:
None but likely introduces a few

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Copy link
Member

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usual smattering of questions and thoughts. Also:

  • does this PR contain native support for OTLP in the distributor? i think we want to go ahead and add that. no idea what the additional load will look like with the bytes marshalling/unmarshalling in the shim.
  • ctrl+shift+f for 1.15. it is referenced in a few places that need to be updated.
  • contributing.md in particular probably needs review b/c it's where instructions related to these changes will be contained.
  • if we do add native support for grpc/http for otlp to this PR we need some documentation that explains why the shims are convenience layers for low volume situations, but if you want a true tracing backend you need a trace pipeline and you should push otlp to tempo only (and not through the shim)

modules/distributor/distributor_test.go Show resolved Hide resolved
go.mod Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
@mdisibio
Copy link
Contributor Author

mdisibio commented Mar 1, 2021

  • ctrl+shift+f for 1.15. it is referenced in a few places that need to be updated.
  • contributing.md in particular probably needs review b/c it's where instructions related to these changes will be contained.

Good catch, addressed.

  • does this PR contain native support for OTLP in the distributor? i think we want to go ahead and add that. no idea what the additional load will look like with the bytes marshalling/unmarshalling in the shim.
  • if we do add native support for grpc/http for otlp to this PR we need some documentation that explains why the shims are convenience layers for low volume situations, but if you want a true tracing backend you need a trace pipeline and you should push otlp to tempo only (and not through the shim)

Yes agree, we definitely want to migrate to native support to reclaim the efficiency (and for other reasons), but think it makes to do in a followup PR: this one is already decently large and risk of drift, and benchmarking locally and in internal environment isn't show any performance regression in cpu or mem usage.

@mdisibio mdisibio changed the title WIP - Go/OTLPProto/OTLPCollector upgrades Go 1.16 + OTLPProto/Collector upgrades Mar 1, 2021
Copy link
Member

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

best PR since the first one.

i've been waiting months to fix all this. excellent work.

@mdisibio mdisibio merged commit 0448492 into grafana:master Mar 1, 2021
@mdisibio mdisibio deleted the otel-collector-upgrade branch May 27, 2021 13:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants