feat: Add backoff to flush op #13140

Merged: 1 commit, Jun 7, 2024

Conversation

grobinson-grafana
Contributor

@grobinson-grafana grobinson-grafana commented Jun 5, 2024

What this PR does / why we need it:

This commit adds a configurable backoff to flush ops in the ingester. The backoff prevents situations where the store put operation fails fast (e.g. with 401 Unauthorized) and the resulting rapid retries cause ingesters to be rate limited.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@grobinson-grafana grobinson-grafana requested a review from a team as a code owner June 5, 2024 11:06
@grobinson-grafana grobinson-grafana changed the title Add backoff to flush op feat: Add backoff to flush op Jun 5, 2024
@grobinson-grafana grobinson-grafana force-pushed the grobinson/add-flush-backoff branch 5 times, most recently from 551891a to 48bbf5c Compare June 5, 2024 11:11
@@ -126,7 +128,10 @@ func (cfg *Config) RegisterFlags(f *flag.FlagSet) {

f.IntVar(&cfg.ConcurrentFlushes, "ingester.concurrent-flushes", 32, "How many flushes can happen concurrently from each stream.")
f.DurationVar(&cfg.FlushCheckPeriod, "ingester.flush-check-period", 30*time.Second, "How often should the ingester see if there are any blocks to flush. The first flush check is delayed by a random time up to 0.8x the flush check period. Additionally, there is +/- 1% jitter added to the interval.")
f.DurationVar(&cfg.FlushOpTimeout, "ingester.flush-op-timeout", 10*time.Minute, "The timeout before a flush is cancelled.")
f.DurationVar(&cfg.FlushOpBackoff.MinBackoff, "ingester.flush-op-backoff-min-period", 10*time.Second, "Minimum backoff period when a flush fails. Each concurrent flush has its own backoff, see `ingester.concurrent-flushes`.")
Contributor Author


I chose to declare these here instead of using cfg.FlushOpBackoff.RegisterFlagsWithPrefix() for two reasons:

  1. I wanted to use the - separator instead of . to be consistent with ingester.flush-op-timeout.
  2. I wanted to write a more specific help message for each of these options that explains how they are used.

@grobinson-grafana grobinson-grafana force-pushed the grobinson/add-flush-backoff branch 2 times, most recently from 5bb49f5 to 837d5c8 Compare June 5, 2024 11:18
@github-actions github-actions bot added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label Jun 5, 2024
@grobinson-grafana grobinson-grafana force-pushed the grobinson/add-flush-backoff branch 2 times, most recently from 1ee50bd to 127099b Compare June 5, 2024 12:51
cyriltovena
cyriltovena previously approved these changes Jun 5, 2024
Contributor

@cyriltovena cyriltovena left a comment


LGTM

@grobinson-grafana grobinson-grafana force-pushed the grobinson/add-flush-backoff branch 5 times, most recently from e661dcf to d26e496 Compare June 6, 2024 14:21
@grobinson-grafana grobinson-grafana marked this pull request as draft June 6, 2024 19:04
@grobinson-grafana grobinson-grafana force-pushed the grobinson/add-flush-backoff branch 2 times, most recently from 6ca1cb2 to 1994e1a Compare June 7, 2024 09:01
@grobinson-grafana grobinson-grafana marked this pull request as ready for review June 7, 2024 09:01
@grobinson-grafana grobinson-grafana dismissed cyriltovena’s stale review June 7, 2024 09:02

I made some changes to how the timeout works (per flush instead of per op) and would love another review.

@grobinson-grafana grobinson-grafana force-pushed the grobinson/add-flush-backoff branch 2 times, most recently from 1c23a0f to c1bf4d6 Compare June 7, 2024 09:06
@grobinson-grafana grobinson-grafana force-pushed the grobinson/add-flush-backoff branch from c1bf4d6 to ae77be1 Compare June 7, 2024 09:42
@@ -135,8 +137,9 @@ func (i *Ingester) sweepStream(instance *instance, stream *stream, immediate boo
}

func (i *Ingester) flushLoop(j int) {
l := log.With(i.logger, "loop", j)
Contributor Author


I added the loop index to the logger so that when a flush fails, or is canceled after the maximum number of retries, we can see which loop it was.

Contributor

@cyriltovena cyriltovena left a comment


LGTM

Well done 👏

@cyriltovena cyriltovena merged commit 9767807 into main Jun 7, 2024
60 checks passed
@cyriltovena cyriltovena deleted the grobinson/add-flush-backoff branch June 7, 2024 15:02
grobinson-grafana added a commit that referenced this pull request Jul 2, 2024
Labels
size/L type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories
2 participants