-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prometheus Exemplars Support #448
Prometheus Exemplars Support #448
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR handles only HTTP but we need to handle all possible cases here:
- gRPC
- Kafka
- AMQP
- etc
So this PR should include all of them
client/http/http.go
Outdated
|
||
traceID, ok := wct.ExtractTraceID(req.Context()) | ||
if ok { | ||
durationHistogram.(prometheus.ExemplarObserver).ObserveWithExemplar( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a way to type assert safely which is:
v, ok := x.(T) // ok will be either true or false
Hello @EvgeniaMartynova-thebeat and thanks for opening your first PR for Patron! 🎉 We really appreciate you taking the time to improve our shared tooling! I have no idea what Prometheus Exemplars are all about (admittedly, I've got some reading to do 😄) but I wanted to clarify a couple of points.
|
Hi @tpaschalis Exemplars are part of OpenMetrics Spec. They are an optional feature and as specified in the spec, the ingestors may ignore them so they are backwards compatible. |
…pc, sql, redis, etc; put success label as err == nil for all of the clients
Thank you for your quick review, @tpaschalis
It is now part of Open Metrics format: https://grafana.com/blog/2021/03/31/intro-to-exemplars-which-enable-grafana-tempos-distributed-tracing-at-massive-scale/ |
Added exemplars for the following clients: Since we use a counter vector in Kafka client we cannot use exemplars there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Components are also missing (Kafka, AWS SQS, etc.). We have only the clients covered.
client/grpc/grpc.go
Outdated
@@ -4,6 +4,7 @@ package grpc | |||
|
|||
import ( | |||
"context" | |||
"github.com/uber/jaeger-client-go" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that the imports are not following the standard layout. check with go fmt or goimports which can be setup in any IDE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you, @mantzas, installed goimports and ran on the changed files
client/grpc/grpc.go
Outdated
durationHistogram := rpcDurationMetrics.WithLabelValues(unary, target, method, rpcStatus.Code().String()) | ||
if sctx, ok := span.Context().(jaeger.SpanContext); ok { | ||
durationHistogram.(prometheus.ExemplarObserver).ObserveWithExemplar( | ||
invokeDuration.Seconds(), prometheus.Labels{"traceID": sctx.TraceID().String()}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"traceID" seems to be a good candidate as a constant in the trace package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
created a constant as suggested
run goimports added support of Prometheus exemplars to components http middleware and gRPC observability
…t' into feat/prometheus-exemplars-support
introduced Prometheus Exemplars also in For the other components it seems a server side latency is not measured, do you think it make sense to introduce such metric in this PR for the other components? |
We might not measure latency but we measure message count over time which can be augmented with exemplars in order to give us more metadata. Is this not the case or a good use case? WDYT? |
Good point! Just found out that a |
added support of Prometheus Exemplars for request/message CounterVec metric for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should only have an empty line in the imports that separate the system from the other packages.
Looking at this and all the other example I am wondering if we should introduce some abstraction to get rid of this repetitive code. Example on the counter: If we introduce a new counter in the trace package that abstracts the prometheus counter we could implemented methods to increase, decrease, add that also take care of the exemplar business.
We have the following option to move into this direction if it makes sense. Make it part of this PR or create a new ticket in Patron to cleanup all of this in a separate PR.
client/amqp/v2/amqp.go
Outdated
"strconv" | ||
"time" | ||
|
||
"github.com/uber/jaeger-client-go" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
empty line should be removed
client/es/elasticsearch.go
Outdated
@@ -13,12 +12,14 @@ import ( | |||
"strings" | |||
"time" | |||
|
|||
"github.com/uber/jaeger-client-go" | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
empty line
client/grpc/grpc.go
Outdated
"time" | ||
|
||
"github.com/uber/jaeger-client-go" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
client/http/http.go
Outdated
"io" | ||
"net/http" | ||
"strconv" | ||
"time" | ||
|
||
"github.com/uber/jaeger-client-go" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
client/kafka/v2/kafka.go
Outdated
"github.com/beatlabs/patron/trace" | ||
"github.com/opentracing/opentracing-go" | ||
"github.com/uber/jaeger-client-go" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
empty line
client/sns/v2/publisher.go
Outdated
"strconv" | ||
"time" | ||
|
||
"github.com/uber/jaeger-client-go" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
I also don't like this code duplication, I will do what you are suggesting in this PR. |
Used a decorator pattern to move all repetitive Exemplars related code to |
…t' into feat/prometheus-exemplars-support
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @EvgeniaMartynova-thebeat
I just added a minor comment regarding documentation.
docs/observability/Observability.md
Outdated
[Prometheus Exemplars](https://grafana.com/docs/grafana/latest/basics/exemplars/) | ||
it became possible to move seamlessly between metrics, logs, and traces. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be rephrased as any OpenTracing compatible tracing system (Jaeger and Tempo so far only AFAIK) can work with Prometheus Exemplars, not just Tempo.
The second part regarding seamless integration between metrics, logs and traces
is a Grafana specific feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rephrased according to the comment, removed the second part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎖️
trace/metric.go
Outdated
spanFromCtx := opentracing.SpanFromContext(ctx) | ||
if spanFromCtx != nil { | ||
if sctx, ok := spanFromCtx.Context().(jaeger.SpanContext); ok { | ||
c.Counter.(prometheus.ExemplarAdder).AddWithExemplar( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm missing the point about why do we need to make type assertions. If Counter
is a prometheus.ExemplarAdder
, then why can't we make this field a prometheus.ExemplarAdder
,?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A Counter
is a prometheus.Counter
interface.
Theoretically there could be a bespoke counter implementing prometheus.Counter
interface but not prometheus.ExemplarAdder
interface, that is why we need a type assertion here.
Or am I missing your point here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if we talk about evolutions and that the Counter
may be of another type later on, then this code will panic (example). How are we going to handle such a case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks a lot, @teivah, indeed you caught a potential panic, fixed, I applied the same change for Histogram
struct in the same file.
trace/metric_test.go
Outdated
if tt.expectedPanic { | ||
defer func() { | ||
if r := recover(); r == nil { | ||
t.Errorf("Add method did not panic.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
t.Errorf("Add method did not panic.") | |
t.Error("Add method did not panic.") |
trace/metric_test.go
Outdated
if tt.expectedPanic { | ||
defer func() { | ||
if r := recover(); r == nil { | ||
t.Errorf("Add method did not panic.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
t.Errorf("Add method did not panic.") | |
t.Error("Add method did not panic.") |
trace/metric_test.go
Outdated
} | ||
|
||
pb := &dto.Metric{} | ||
m.Write(pb) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error isn't handled. I suggest that we ignore it explicitly at least:
m.Write(pb) | |
_ = m.Write(pb) |
And also, CI is failing. |
a0fad1b
Which problem is this PR solving?
Enabling Prometheus Exemplars. This PR is a copy of the POC made by @ioannissavvaidis.
Short description of the changes
Added support for Prometheus Exemplars
Updated Observarbility.md documentation
Enabled Prometheus Exemplars for request latency and counter metric in
../client/http/http.go
../client/amqp/v2/amqp.go
../client/es/elasticsearch.go
../client/grpc/grpc.go
../client/redis/redis.go
../client/sns/v2/publisher.go
../client/sql/publisher.go
../client/sns/v2/publisher.go
../component/amqp
../component/grpc
../component/kafka
..component/http/middleware
../component/sqs
../client/kafka/v2
Fixed a way how we record "success" metric. Mark metric entry as "success" if
err == nil
before we marked it as success iferr != nil