
Retry getting prom instance in Tempo start up #591

Merged
mapno merged 2 commits into grafana:main from retry-prom-instance · May 18, 2021

Conversation

mapno (Member) commented May 14, 2021

PR Description

Fixes a race condition where the agent fails to boot if the Prometheus
instance that it's configured to use for remote-writing metrics in the
tracing pipeline is not ready yet.
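
The first commit addresses this by retrying the lookup during start-up. A minimal sketch of that retry loop, reusing the identifiers that appear in the diff below (promManager, cfg.SpanMetrics.PromInstance, contextkeys.Prometheus, defaultRetryInterval); the select on ctx.Done() is an illustrative safeguard and not part of the actual change:

```go
// Sketch only: keep retrying GetInstance until the Prometheus instance
// exists, instead of failing the whole agent boot.
for {
	prom, err := promManager.GetInstance(cfg.SpanMetrics.PromInstance)
	if err == nil {
		// Instance is ready: make it available to the tracing pipeline.
		ctx = context.WithValue(ctx, contextkeys.Prometheus, prom)
		break
	}
	select {
	case <-ctx.Done():
		// Give up if start-up is cancelled, rather than looping forever.
		return ctx.Err()
	case <-time.After(defaultRetryInterval):
		// Instance not registered yet; wait and try again.
	}
}
```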

Which issue(s) this PR fixes

Notes to the Reviewer

PR Checklist

  • CHANGELOG updated
  • Documentation added
  • Tests updated

mapno requested a review from joe-elliott as a code owner · May 14, 2021 09:20
pkg/tempo/instance.go (outdated) · Comment on lines 150 to 155:
```go
prom, err := promManager.GetInstance(cfg.SpanMetrics.PromInstance)
if err == nil {
	ctx = context.WithValue(ctx, contextkeys.Prometheus, prom)
	break
}
<-time.After(defaultRetryInterval)
```
rfratto (Member) commented May 14, 2021:

Instead of looping when building the pipeline, what do you think about passing the whole manager to the span metrics processor and performing the lookup of the instance per span? It should be a pretty fast function call.

(This would have the side effect of some early spans being dropped though)
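
For illustration, a rough, self-contained sketch of what that could look like (this is not the actual implementation in 944efc7; every name below except GetInstance is hypothetical):

```go
// promInstance stands in for the agent's Prometheus instance type.
type promInstance struct{}

// instanceManager is the assumed shape of the manager that would be passed in.
type instanceManager interface {
	GetInstance(name string) (*promInstance, error)
}

// span stands in for a single trace span.
type span struct{}

// spanMetricsExporter holds the whole manager instead of a resolved instance.
type spanMetricsExporter struct {
	manager      instanceManager
	instanceName string
}

// export resolves the instance on every batch; GetInstance is a cheap call.
func (e *spanMetricsExporter) export(batch []span) error {
	prom, err := e.manager.GetInstance(e.instanceName)
	if err != nil {
		// During start-up the instance may not exist yet; drop the metrics
		// for these early spans instead of failing the agent.
		return nil
	}
	_ = prom // ... compute span metrics and remote-write them via prom ...
	return nil
}
```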

mapno (Member, Author) replied:

Good idea, I made an implementation in 944efc7.

I think losing a few metrics during start-up is reasonable.

rfratto (Member) replied:

That commit looks good, but did you mean to push it to this PR? I'm not seeing it here, only via that link.

mapno (Member, Author) replied:

I pushed it to the main repository, my bad. It should be fixed now!

rfratto (Member) left a comment:

LGTM, though you might want to let Joe take a second look. Are the docs up to date?

mapno (Member, Author) commented May 18, 2021:

> LGTM, though you might want to let Joe take a second look. Are the docs up to date?

They are from the previous PR. This one doesn't change anything for the user.

mapno merged commit 08716ce into grafana:main May 18, 2021
mapno deleted the retry-prom-instance branch May 18, 2021 08:05
mattdurham mentioned this pull request Sep 7, 2021
mattdurham pushed a commit that referenced this pull request Nov 11, 2021
* Retry getting prom instance in Tempo start up

Fixes a race condition where the agent will fail to boot if the prometheus
instance that it's set to use to remote write metrics in the tracing
pipeline is not ready yet.

* Pass manager to exporter instead of retrying
github-actions bot added the frozen-due-to-age label (Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed.) Apr 13, 2024
github-actions bot locked as resolved and limited conversation to collaborators Apr 13, 2024