Fix healthchecks that do not persist when priority is changed #35748

Merged
merged 1 commit into from
Sep 7, 2024

CI (Envoy) / Mobile/iOS skipped Sep 4, 2024 in 0s

Check was skipped

This check was not triggered in this CI run

Details

Request (pr/35748/main@51d6ff0)

shulin-sq @shulin-sq 51d6ff0 #35748 merge main@c198038

Commit Message: Use the previously calculated healthcheck when endpoints move between priority levels repeatedly. This is done by changing how mutable_cross_priority_host_map_ tracks hosts.

mutable_cross_priority_host_map_ is used to track whether a host already exists in the previous configuration, in which case its healthcheck should not be recalculated. This worked only some of the time because, for each priority, every host that changed was removed from and then added to a map of string (ip addr:port) to Host instance. That did not account for an endpoint having two Host representatives in different priorities, which is what happens when an endpoint changes priority. In that edge case a host can be removed from mutable_cross_priority_host_map_ before the "should we skip active healthchecks" logic triggers.

This PR fixes that by only removing an endpoint from mutable_cross_priority_host_map_ if that removal is executed from the lowest priority. The fix assumes that memberships in priorities are always calculated starting from the lowest number and moving to the highest.
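
One way to picture the guarded removal is the sketch below. It is an illustration only, not the code from this PR: `CrossPriorityMap`, `addHost`, `removeHost` and `contains` are hypothetical names, and the sketch assumes that each map entry records the priority that currently owns the address and that an add from another priority takes over an existing entry instead of creating a new one.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for mutable_cross_priority_host_map_.
// Maps "ip:port" to the priority that currently owns the endpoint (an assumption of this sketch).
class CrossPriorityMap {
public:
  // Called while processing `priority`. Returns true if the address was already
  // tracked, meaning the caller could reuse the previously calculated healthcheck.
  bool addHost(const std::string& address, uint32_t priority) {
    auto it = owner_.find(address);
    if (it != owner_.end()) {
      it->second = priority; // existing endpoint moves to the priority being processed
      return true;
    }
    owner_.emplace(address, priority);
    return false;
  }

  // Called while processing `priority`. Erases the entry only if that priority
  // still owns it, so a stale removal from a higher priority is ignored.
  bool removeHost(const std::string& address, uint32_t priority) {
    auto it = owner_.find(address);
    if (it == owner_.end() || it->second != priority) {
      return false;
    }
    owner_.erase(it);
    return true;
  }

  bool contains(const std::string& address) const { return owner_.count(address) != 0; }

private:
  std::unordered_map<std::string, uint32_t> owner_;
};
```

With priorities processed in ascending order, a move from 2 to 1 first runs `addHost(address, 1)` and then `removeHost(address, 2)`. In this sketch the second call is a no-op, so the endpoint stays tracked for the next update.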

Additional Description:
Risk Level: Med?
Testing: see note at the bottom
Docs Changes: n/a
Release Notes: added a note to the changelog
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue] #35243
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]

Explanation

This was a difficult bug to spot because it only happens when the priority changes multiple times.

For example, consider this situation:

  • we have an endpoint A that moves between priorities 0, 1, and 2
  • we assume that priorities are always processed starting from 0 and going up (e.g. 0, 1, 2, 3, 4, etc.)
  • mutable_cross_priority_host_map_ is "the list" in this example; it includes all endpoints from all priorities

When the priority number goes up, things are OK:

0 -> 1
processing priority 0: remove A
processing priority 1: add A

1 -> 2
priority 1: remove A
priority 2: add A

But things get weird when the number goes down:

2 -> 1, things are still peaceful here
priority 1: add A (but this gets ignored since A is already in the list)
priority 2: remove A (!!!)

1 -> 0: at this point the list no longer includes A, so any logic that checks whether A exists in the cross priority host map will fail, and A will be treated as a new endpoint.
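
The walkthrough above can be replayed with a small model of the pre-fix bookkeeping. This is a sketch under stated assumptions, not Envoy code: `applyMoveWithoutFix` is a hypothetical helper, the "list" is modelled as a plain set of addresses, and the "does A already exist" check is assumed to look at the list as the previous update left it.

```cpp
#include <algorithm>
#include <cstdint>
#include <set>
#include <string>

// Applies one update that moves `address` from priority `from` to priority `to`,
// processing priorities in ascending order as the walkthrough assumes.
// Returns whether the address was in the list before the update started,
// i.e. whether it would be recognised as an existing endpoint.
bool applyMoveWithoutFix(std::set<std::string>& list, const std::string& address,
                         uint32_t from, uint32_t to) {
  const bool known_before = list.count(address) != 0;
  const uint32_t last = std::max(from, to);
  for (uint32_t priority = 0; priority <= last; ++priority) {
    if (priority == from) {
      list.erase(address); // unconditional removal: this is the problematic step
    }
    if (priority == to) {
      list.insert(address); // ignored when the address is already in the list
    }
  }
  return known_before;
}
```

Starting from a list that contains A, the moves 0 -> 1 and 1 -> 2 leave A in the list. The move 2 -> 1 still recognises A but leaves the list without it, so the following 1 -> 0 call returns false and A would be handled as a new endpoint.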

without the fix: https://gist.github.com/shulin-sq/adfb4268f5f199f054e908e3fd7afae8
with the fix: https://gist.github.com/shulin-sq/7779a341e598d81cfaeca447b0f582d1

Testing

  • functional test described in the issue: #35243
  • unit test, confirmed that it fails before the fix is applied

Environment

Request variables

Key Value
ref fb342f0
sha 51d6ff0
pr 35748
base-sha c198038
actor shulin-sq @shulin-sq
message Fix healthchecks that do not persist when priority is changed...
started 1725469727.805146
target-branch main
trusted false

Build image

Container image/s (as used in this CI run)

Key Value
default envoyproxy/envoy-build-ubuntu:f94a38f62220a2b017878b790b6ea98a0f6c5f9c
mobile envoyproxy/envoy-build-ubuntu:mobile-f94a38f62220a2b017878b790b6ea98a0f6c5f9c

Version

Envoy version (as used in this CI run)

Key Value
major 1
minor 32
patch 0
dev true