As part of evaluating autoscaling, I found that workers (specifically, the calls made through kubernetes-client) don't handle transient faults, which occur while a cluster is being scaled.
While the transient faults caused by AKS autoscaling have their own issue to be resolved on Azure's side (Azure/AKS#381), I think a client like the Brigade workers should handle such faults regardless.
According to kubernetes-client, transient faults are currently left to the user to handle (kubernetes-client/javascript#96).
So I updated k8s.ts with an exponential backoff retry, which seems to work in a scaling scenario where timeouts occur from time to time.
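For reference, here is a minimal sketch of what such an exponential backoff retry wrapper could look like. The `withRetry` helper and its parameters are hypothetical names for illustration, not the actual code in k8s.ts:

```typescript
// Hypothetical helper (not Brigade's actual k8s.ts code): retry an async
// operation with exponential backoff, as a client-side guard against
// transient faults such as timeouts during cluster autoscaling.
async function withRetry<T>(
  op: () => Promise<T>,
  maxAttempts: number = 5,
  baseDelayMs: number = 100
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastErr = err;
      // Delay doubles on each failed attempt: 100ms, 200ms, 400ms, ...
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  // All attempts failed; surface the last error to the caller.
  throw lastErr;
}
```

A kubernetes-client call could then be wrapped as, for example, `withRetry(() => client.api.v1.namespaces(ns).pods(name).get())`, so a transient timeout during scaling is retried instead of failing the worker outright.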
I'm planning to open a PR for this; please let me know if you have any thoughts.
Cheers,
Jeehwan
* fix(worker): handle transient faults (#625)
  fix(worker): handle JSON.parse exception caused by transient faults in watch
  feat(worker): support KUBECONFIG (feedback from #596)
* Specify types for pod and request in worker (#626)
  Make sure watch stream stay connected