
Retry getting prom instance in Tempo start up #591

Merged
mapno merged 2 commits into grafana:main from retry-prom-instance · May 18, 2021

Conversation

mapno (Member) commented May 14, 2021

PR Description

Fixes a race condition where the agent fails to boot if the Prometheus
instance that it's configured to use for remote-writing metrics in the
tracing pipeline is not ready yet.
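
The first commit addresses this by retrying the lookup during start-up. A minimal sketch of that retry loop, reusing the identifiers that appear in the diff below (promManager, cfg.SpanMetrics.PromInstance, contextkeys.Prometheus, defaultRetryInterval); the select on ctx.Done() is an illustrative safeguard and not part of the actual change:

```go
// Sketch only: keep retrying GetInstance until the Prometheus instance
// exists, instead of failing the whole agent boot.
for {
	prom, err := promManager.GetInstance(cfg.SpanMetrics.PromInstance)
	if err == nil {
		// Instance is ready: make it available to the tracing pipeline.
		ctx = context.WithValue(ctx, contextkeys.Prometheus, prom)
		break
	}
	select {
	case <-ctx.Done():
		// Give up if start-up is cancelled, rather than looping forever.
		return ctx.Err()
	case <-time.After(defaultRetryInterval):
		// Instance not registered yet; wait and try again.
	}
}
```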

Which issue(s) this PR fixes

Notes to the Reviewer

PR Checklist

  • CHANGELOG updated
  • Documentation added
  • Tests updated

mapno requested a review from joe-elliott as a code owner · May 14, 2021 09:20
pkg/tempo/instance.go (outdated) · Comment on lines 150 to 155:
```go
prom, err := promManager.GetInstance(cfg.SpanMetrics.PromInstance)
if err == nil {
	ctx = context.WithValue(ctx, contextkeys.Prometheus, prom)
	break
}
<-time.After(defaultRetryInterval)
```
rfratto (Member) commented May 14, 2021:

Instead of looping when building the pipeline, what do you think about passing the whole manager to the span metrics processor and performing the lookup of the instance per span? It should be a pretty fast function call.

(This would have the side effect of some early spans being dropped though)
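
For illustration, a rough, self-contained sketch of what that could look like (this is not the actual implementation in 944efc7; every name below except GetInstance is hypothetical):

```go
// promInstance stands in for the agent's Prometheus instance type.
type promInstance struct{}

// instanceManager is the assumed shape of the manager that would be passed in.
type instanceManager interface {
	GetInstance(name string) (*promInstance, error)
}

// span stands in for a single trace span.
type span struct{}

// spanMetricsExporter holds the whole manager instead of a resolved instance.
type spanMetricsExporter struct {
	manager      instanceManager
	instanceName string
}

// export resolves the instance on every batch; GetInstance is a cheap call.
func (e *spanMetricsExporter) export(batch []span) error {
	prom, err := e.manager.GetInstance(e.instanceName)
	if err != nil {
		// During start-up the instance may not exist yet; drop the metrics
		// for these early spans instead of failing the agent.
		return nil
	}
	_ = prom // ... compute span metrics and remote-write them via prom ...
	return nil
}
```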

mapno (Member, Author) replied:

Good idea, I made an implementation in 944efc7.

I think losing a few metrics during start-up is reasonable.

rfratto (Member) replied:

That commit looks good, but did you mean to push it to this PR? I'm not seeing it here, only via that link.

mapno (Member, Author) replied:

I pushed it to the main repository, my bad. It should be fixed now!

rfratto (Member) left a comment:

LGTM, though you might want to let Joe take a second look. Are the docs up to date?

mapno (Member, Author) commented May 18, 2021:

> LGTM, though you might want to let Joe take a second look. Are the docs up to date?

They are from the previous PR. This one doesn't change anything for the user.

mapno merged commit 08716ce into grafana:main May 18, 2021
mapno deleted the retry-prom-instance branch May 18, 2021 08:05
mattdurham mentioned this pull request Sep 7, 2021
mattdurham pushed a commit that referenced this pull request Nov 11, 2021
* Retry getting prom instance in Tempo start up

Fixes a race condition where the agent will fail to boot if the prometheus
instance that it's set to use to remote write metrics in the tracing
pipeline is not ready yet.

* Pass manager to exporter instead of retrying
github-actions bot added the frozen-due-to-age label (Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed.) Apr 13, 2024
github-actions bot locked as resolved and limited conversation to collaborators Apr 13, 2024