As part of evaluating autoscaling, I found that workers (specifically, the calls made through kubernetes-client) don't handle transient faults, which occur while a cluster is being scaled.
While the transient faults caused by AKS autoscaling have their own issue to be resolved on Azure's side (Azure/AKS#381), I think a client like the Brigade workers should handle such faults regardless.
According to kubernetes-client, transient faults are currently left to the user to handle (kubernetes-client/javascript#96).
So I updated k8s.ts with an exponential backoff retry, which seems to work in a scaling scenario where timeouts occur from time to time.
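For reference, here is a minimal sketch of what such an exponential backoff retry wrapper could look like. The `withRetry` helper and its parameters are hypothetical names for illustration, not the actual code in k8s.ts:

```typescript
// Hypothetical helper (not Brigade's actual k8s.ts code): retry an async
// operation with exponential backoff, as a client-side guard against
// transient faults such as timeouts during cluster autoscaling.
async function withRetry<T>(
  op: () => Promise<T>,
  maxAttempts: number = 5,
  baseDelayMs: number = 100
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastErr = err;
      // Delay doubles on each failed attempt: 100ms, 200ms, 400ms, ...
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  // All attempts failed; surface the last error to the caller.
  throw lastErr;
}
```

A kubernetes-client call could then be wrapped as, for example, `withRetry(() => client.api.v1.namespaces(ns).pods(name).get())`, so a transient timeout during scaling is retried instead of failing the worker outright.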
I'm planning to open a PR for this; please let me know if you have any thoughts.
Cheers,
Jeehwan
* fix(worker): handle transient faults (#625)
  fix(worker): handle JSON.parse exception caused by transient faults in watch
  feat(worker): support KUBECONFIG (feedback from #596)
* Specify types for pod and request in worker (#626)
  Make sure watch stream stay connected