Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BugFix: 1.0 Ingesters having availability zone value erased by pre-1.0 ingesters during rolling upgrade. #2404

Merged

Conversation

khaines
Copy link
Contributor

@khaines khaines commented Apr 4, 2020

What this PR does: Corrects a bug reported by @pracucci when upgrading ingesters from pre-1.0, which causes the new ingesters to have their availability zone string overwritten by a blank value. This PR addresses the issue by having the ingesters update the ring data with their zone on every heartbeat, similar to what they do with address information.

Which issue(s) this PR fixes:
Fixes #2401

Checklist

  • [ x ] Tests updated
  • [ X ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

khaines added 2 commits April 3, 2020 22:09
… the zone on the updated record

Signed-off-by: Ken Haines <khaines@microsoft.com>
…rnal actor clears it

Signed-off-by: Ken Haines <khaines@microsoft.com>
Signed-off-by: Ken Haines <khaines@microsoft.com>
@pstibrany
Copy link
Contributor

This is a step in the right direction. Unfortunately, it doesn't fix the fact that old ingesters (not supporting the zones) will clear the zone information for all ingesters on each heartbeat.

I think we should also make "no zone" to be empty string (and not ingester ID as it is today), to avoid seeing artifacts like this when watching the /ring page. Also visual check of the /ring would be faster -- it would be easy to see which ingesters have zone set. What do you think?

@khaines
Copy link
Contributor Author

khaines commented Apr 6, 2020

@pstibrany this is only an issue during a rolling upgrade from pre1.0, once the old ingesters are rolled off the zones will be no longer overwritten each heartbeat. I would like to hear more if you are thinking this needs support outside the upgrade process.

Since it was added to ignore zoning for a blank string, I agree we should make that the default "no zone"

… a blank string

Signed-off-by: Ken Haines <khaines@microsoft.com>
@khaines khaines requested review from pracucci, gouthamve and pstibrany and removed request for pracucci April 6, 2020 14:45
Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Ken, looks good!

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @khaines for working on it! LGTM to me too.

@pstibrany pstibrany merged commit d015aad into cortexproject:master Apr 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cortex 1.0.0 ingesters rollout breaks zone set in the ring
3 participants