Retry getting prom instance in Tempo start up #591
Fixes a race condition where the agent fails to boot if the Prometheus instance that the tracing pipeline is configured to remote write metrics to is not ready yet.
pkg/tempo/instance.go
    prom, err := promManager.GetInstance(cfg.SpanMetrics.PromInstance)
    if err == nil {
        ctx = context.WithValue(ctx, contextkeys.Prometheus, prom)
        break
    }
    <-time.After(defaultRetryInterval)
Instead of looping when building the pipeline, what do you think about passing the whole manager to the span metrics processor and performing the instance lookup per span? It should be a pretty fast function call.
(This would have the side effect of some early spans being dropped though)
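A rough sketch of what that suggestion could look like, assuming hypothetical stand-in types (`metricsInstance`, `instanceManager`, `mapManager`, `spanMetricsExporter`) rather than the agent's real ones. The exporter holds the manager, resolves the instance on every span, and drops the span's metrics while the instance is not registered yet.

```go
package main

import (
	"errors"
	"sync"
)

// metricsInstance is a hypothetical stand-in for a Prometheus instance
// that accepts samples.
type metricsInstance struct {
	mu       sync.Mutex
	appended int
}

func (i *metricsInstance) Append() {
	i.mu.Lock()
	defer i.mu.Unlock()
	i.appended++
}

// instanceManager is a hypothetical interface mirroring the GetInstance
// call seen in the diff.
type instanceManager interface {
	GetInstance(name string) (*metricsInstance, error)
}

// mapManager is a toy manager backed by a map guarded by a RWMutex, so
// lookups stay cheap on the per-span path.
type mapManager struct {
	mu        sync.RWMutex
	instances map[string]*metricsInstance
}

func (m *mapManager) Register(name string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.instances == nil {
		m.instances = make(map[string]*metricsInstance)
	}
	m.instances[name] = &metricsInstance{}
}

func (m *mapManager) GetInstance(name string) (*metricsInstance, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	inst, ok := m.instances[name]
	if !ok {
		return nil, errors.New("instance not found")
	}
	return inst, nil
}

// spanMetricsExporter keeps the manager instead of a resolved instance.
type spanMetricsExporter struct {
	manager      instanceManager
	instanceName string
	dropped      int
}

// ConsumeSpan looks up the instance for each span. It reports whether the
// span's metrics were recorded; spans seen before the instance exists are
// counted as dropped rather than failing the pipeline.
func (e *spanMetricsExporter) ConsumeSpan() bool {
	inst, err := e.manager.GetInstance(e.instanceName)
	if err != nil {
		e.dropped++
		return false
	}
	inst.Append()
	return true
}
```

With this shape the pipeline can be built before the instance is ready, at the cost noted above: spans that arrive early lose their metrics.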
Good idea, I made an implementation in 944efc7.
I think losing a few metrics during start up is reasonable.
That commit looks good, but did you mean to push it to this PR? I'm not seeing it here, only via that link.
I pushed it to the main repository, my bad. It should be fixed now!
LGTM, though you might want to let Joe take a second look. Are the docs up to date?
They are from the previous PR. This one doesn't change anything for the user.
* Retry getting prom instance in Tempo start up

  Fixes a race condition where the agent will fail to boot if the prometheus instance that it's set to use to remote write metrics in the tracing pipeline is not ready yet.

* Pass manager to exporter instead of retrying