-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add OpenTelemetry interceptors to capture traces from gRPC communications #3502
base: main
Are you sure you want to change the base?
Conversation
917b57a
to
2b9efb7
Compare
as a contributor, I just want to ask are we going to put packages in vendor folder? or just go.mod? |
…ions Signed-off-by: Antoine Toulme <antoine@lunar-ocean.com>
Is anyone available to review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, LGTM.
but I want to know
- is unit test enough or we need test together with any sdk(node/java) which already enabled opentelemetry?
- ref to
grpclogging
andgrpcmetric
do we have any further plan to add more feature or logic forgrpctracing
in future?
|
@SamYuan1990 please review and approve. |
sorry @atoulme , I don't have the permission. As I said, so far, this PR LGTM. |
Or @atoulme , you got my approve for this pr. even if I can't make it as limited by permission, but I suppose I can do something by leaving this message to support add opentelemetry into fabric. |
@SamYuan1990 I appreciate you getting back to me and helping vet the PR. It's deeply appreciated. I believe I have sent an email to the fabric mailing list asking for a committer to take a look, and didn't hear back yet. |
}, | ||
UnaryInterceptors: []grpc.UnaryServerInterceptor{ | ||
grpcmetrics.UnaryServerInterceptor(grpcmetrics.NewUnaryMetrics(metricsProvider)), | ||
grpclogging.UnaryServerInterceptor( | ||
flogging.MustGetLogger("comm.grpc.server").Zap(), | ||
grpclogging.WithLeveler(grpclogging.LevelerFunc(grpcLeveler)), | ||
), | ||
grpctracing.UnaryServerInterceptor(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the point in having this? We only have streams in orderer, we don't have RPC calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This allows reporting external communications by the software, tagging each outgoing request with a trace ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow. what outgoing request? this is a server side interceptor is it not?
@@ -661,13 +663,15 @@ func initializeServerConfig(conf *localconfig.TopLevel, metricsProvider metrics. | |||
StreamInterceptors: []grpc.StreamServerInterceptor{ | |||
grpcmetrics.StreamServerInterceptor(grpcmetrics.NewStreamMetrics(metricsProvider)), | |||
grpclogging.StreamServerInterceptor(flogging.MustGetLogger("comm.grpc.server").Zap()), | |||
grpctracing.StreamServerInterceptor(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what information is reported as telemetry as a result of you adding this interceptor? I am under the impression that only stream opening and termination.
If so, then how is this helpful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This allows us to record interactions with other servers, and tag outgoing and incoming requests with a trace ID so we can recreate the exchange of data between peers by collecting data from the consortium members.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you perhaps give an example? I don't understand what is the type of information that is collected. Did you by chance experiment and can share what data is collected and presented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example does the trace ID correspond to the stream or to a single message that passes through the stream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an example here - not specific to those grpc messages, but tracing calls to contracts https://www.youtube.com/watch?v=NvDE1cnK_3I
Please refer to the RFC as well https://github.com/hyperledger/fabric-rfcs/blob/main/text/0000-opentelemetry-tracing.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I watched your demo, but I don't understand what information is reported by these stream interceptors. The RFC also doesn't mention this.
Hi @atoulme Adding a tracking / tracing / instrumentation / observability foundation to Fabric is an incredible contribution. Thank you for advancing this PR and RFC, it is both timely and incredibly relevant. There have been several discussions and efforts underway to characterize, measure, and improve the overall network throughput and transaction processing rates for Fabric, all of which will require a systematic, high-altitude view of system observability and custom metric aggregation. In addition to gRPC level trace monitoring, there have been some recent discussions around the need / opportunity to inject trace-level or function-level call tracking of core Fabric routines to profile, isolate, and resolve system bottlenecks. To help advance this effort, would you consider presenting the material, pros/cons, benefits, and impacts in a context in a forum that is more suitable for an interactive discussion? This PR represents a "landscape shifting" moment for observability in Fabric networks, which in my opinion, warrants more than a single PR for the contribution. There are several active projects starting to look at Fabric (and overlay / Level 2) networks under a performance lens, to which this addition is directly relevant. In addition to the mechanics of landing this (or related PRs), the teams looking at throughput, i/o, and performance optimization need to be aware of the benefits of system-wide observability made available with an Open Telemetry integration. Two good opportunities for socialization of the PR include:
|
@jkneubuh thank you for the kind words. I did present at a contributor meeting over a year ago. The time of the meeting is far from ideal for me as it takes place at 6am. I am not keen on presenting there again. I do however have a webinar with the Hyperledger Foundation to discuss this effort with Fabric and Besu. |
) | ||
|
||
func UnaryServerInterceptor() grpc.UnaryServerInterceptor { | ||
return otelgrpc.UnaryServerInterceptor() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought some more and I think we should add an option to opt-in in the configuration (core.yaml
and orderer.yaml
).
I am not sure it's a good idea to load the otelgrpc
by default, all the time.
It's true that you configure it with environment variables but let's minimize the potential surface area risk.
https://github.com/cncf/tag-observability/blob/main/whitepaper.md#executive-summary |
See #2954 - was closed for inactivity while #2997 was pending.
Type of change
Description
Adds OpenTelemetry tracing by adding interceptors on all gRPC communications.
Related issues
This is tied to https://github.com/hyperledger/fabric-rfcs/blob/main/text/0000-opentelemetry-tracing.md