RunnerScaleSet listener fails on all but one cluster #57

Closed
jeffmccune opened this issue Mar 14, 2024 · 0 comments · Fixed by #62

Comments

@jeffmccune
Contributor

Tracks actions/actions-runner-controller#3351

jeffmccune added a commit that referenced this issue Mar 15, 2024
The effect of this patch is limited to refreshing credentials only for
namespaces that exist in the local cluster.  There is structure in place
in the CUE code to allow for namespaces bound to specific clusters, but
this is used only by the optional Vault component.

This patch was an attempt to work around
actions/actions-runner-controller#3351 by
deploying the runner scale sets into unique namespaces.

This effort was a waste of time: only one listener pod successfully
registered for a given scale set name / group combination.

Because we have only one group, named Default, we can only have one
listener pod globally for a given scale set name.

Because we want our workflows to execute regardless of the availability
of a single cluster, we're going to let this fail for now.  The pod
retries every 3 seconds.  When a cluster is destroyed, another cluster
will quickly register.

A follow-up patch will look at expanding this retry behavior.
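
For illustration, the namespace-to-cluster binding mentioned above might look roughly like the following CUE sketch. The package, definition, and field names (#Namespace, clusters, the example namespace and cluster names) are assumptions made for the example, not the repository's actual schema.

```cue
package example // hypothetical package name

// Each namespace declares which clusters it is rendered on.  Credential
// refresh is then limited to namespaces whose clusters list includes the
// local cluster.
#Namespace: {
	name: string
	// Clusters this namespace is bound to; empty means every cluster.
	clusters: [...string]
}

namespaces: [Name=string]: #Namespace & {name: Name}

// Illustrative per-cluster runner namespace, the kind of layout this patch
// attempted as a workaround for the one-listener-per-scale-set limit.
namespaces: "arc-runners-east": clusters: ["east"]
```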
jeffmccune added a commit that referenced this issue Mar 15, 2024
This patch fixes the problem of the actions runner scale set listener
pod failing every 3 seconds.  See
actions/actions-runner-controller#3351.

The solution is not ideal: if the primary cluster is down, workflows will
not execute.  The primary cluster shouldn't go down, though, so this is
the trade-off: lower log spam and resource usage by eliminating the
failing pods on the other clusters, in exchange for lower availability if
the primary cluster is unavailable.

We could let the pods loop so that if the primary is unavailable another
cluster would quickly pick up the role, but it doesn't seem worth it.
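
A minimal sketch of the approach this patch describes, pinning the runner scale set to a single cluster. It assumes the cluster name is injected per cluster and uses hypothetical field and component names.

```cue
package example // hypothetical package name

// Local cluster name, injected per cluster, e.g. `cue export -t cluster=primary`.
cluster: name: string @tag(cluster)

// Render the runner scale set only on the primary cluster so exactly one
// listener pod registers for the scale set name in the Default group.
if cluster.name == "primary" {
	components: "gha-runner-scale-set": {
		namespace: "arc-runners" // illustrative value
	}
}
```

The other clusters render nothing for this component, which is what eliminates the failing listener pods there.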