
memberlist keeps resurrecting deleted store-gateways #4010

Closed
nicolai86 opened this issue Mar 25, 2021 · 1 comment · Fixed by #4269

Comments

nicolai86 (Contributor) commented Mar 25, 2021

Describe the bug
When performing a scale-up of store-gateway pods followed by a scale-down, memberlist entries for deleted store-gateway pods sporadically re-appear in the ring as unhealthy after a few hours.
The system doesn't recover from these ghost entries; they appear and disappear at random.

[screenshot: Screen Shot 2021-03-25 at 9 10 29 AM]

In our case we scaled from 12 to 80 and back to 12, but this happens with smaller scale-ups as well.
We verified that each unhealthy entry reported by the metrics references a store-gateway pod that no longer exists.
This is indicated in the logs by messages like

msg=\"auto-forgetting instance from the ring because it is unhealthy for a long time\" instance=store-gateway-15 

To Reproduce
Steps to reproduce the behavior:

  1. Start Cortex, using memberlist for the store-gateway ring (efd1de4)
  2. Scale up the store-gateway deployment
  3. Scale it back down (see the sketch after this list)
  4. Keep the k8s cluster running for a few hours
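
A minimal sketch of steps 2-4, assuming the store-gateways run as a Kubernetes deployment named store-gateway in a cortex namespace (both names are assumptions, not taken from this cluster):

  # scale up, wait for every new instance to join the ring, then scale back down
  kubectl -n cortex scale deployment/store-gateway --replicas=80
  # ... wait until the store-gateway ring page shows all instances as healthy ...
  kubectl -n cortex scale deployment/store-gateway --replicas=12
  # leave the cluster running for a few hours and watch the ring page / memberlist metrics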

Relevant section of the Cortex configuration:

memberlist:
  bind_port: 7946

  join_members:
    - distributor-memberlist.cortex.svc.cluster.local:7946
    - compactor-memberlist.cortex.svc.cluster.local:7946
  abort_if_cluster_join_fails: false

  rejoin_interval: 10m

  left_ingesters_timeout: 20m

store_gateway:
  sharding_enabled: true
  sharding_strategy: default
  sharding_ring:
    kvstore:
      store: memberlist
      prefix: store-gateway-v1/
    heartbeat_timeout: 10m 
    zone_awareness_enabled: true

Expected behavior
Inspecting the Cortex store-gateway ring status over the lifetime of the cluster, it should not contain unhealthy entries for store-gateways of deleted pods.

Environment:

  • Infrastructure: Kubernetes
  • Deployment tool: helm, custom chart

Storage Engine

  • Blocks
  • Chunks

Additional Context
PR #3603 was meant to fix this, but it seems it doesn't cover some edge cases.

tomwilkie (Contributor) commented

Thoughts on what could be causing this:

  • Perhaps an old gossip message is sitting in a queue somewhere? We could reject "stale" messages, if they exist.
  • This would be expected behaviour if a partitioned node rejoined after a few hours - is that happening here?
  • We don't think this is due to the ring converging too slowly, but just in case it's worth making sure that the gossip interval is much smaller than the tombstone timeout (see the sketch after this list).
  • Is it possible that someone in the ring never learnt about the tombstone? That shouldn't be the case, as we periodically sync the ring state.
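
To illustrate the third point, a rough sketch of the relevant memberlist tuning; the option names gossip_interval and pull_push_interval are assumptions about the Cortex memberlist KV config and should be checked against the reference for your version. The point is only that gossip and full-state syncs happen far more often than the tombstone retention (left_ingesters_timeout) expires:

memberlist:
  # how often ring updates are gossiped to peers (assumed option name;
  # hashicorp/memberlist LAN defaults are in the hundreds of milliseconds)
  gossip_interval: 200ms
  # how often a full state exchange is done with a random peer (assumed option name)
  pull_push_interval: 30s
  # how long tombstones for instances that left are kept before being dropped;
  # this must outlive many gossip/sync rounds
  left_ingesters_timeout: 20m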
