OCPBUGS-36301: parallelize member health checks
https://issues.redhat.com/browse/OCPBUGS-36301

Currently, member health is checked serially, with a 30s timeout per member. In addition, 3 of the 4 GetMemberHealth callers set their own default 30s timeout for the entire call. As a result, a slow check on one member could exhaust the timeout for the whole GetMemberHealth call, causing members checked later to be reported as unhealthy even though they were fine.

With this commit, I am dropping the internal 30s timeout from GetMemberHealth and letting the caller set the timeout instead. The code also now checks the health of all members in parallel, which prevents a single slow member from affecting the health reporting of the others.

I also added a timeout to the context used in IsMemberHealthy, which calls GetMemberHealth. Neither Trevor nor I was sure why it lacked a default timeout, since all the other call sites had one.