trigger cluster redeploy when static configs are modified #348

aparajita89 · 2021-06-10T04:36:24Z

Description

when static configs are changed (such as initTime and other settings that go in zoo.cfg), the zookeeper cluster needs to be restarted for the new values to take effect. ~~the proposal is to add a flag in the helm chart values to optionally trigger a cluster redeploy when static configs are modified.~~

[EDIT 10-Aug-21]: the finalized approach for cluster restart is as follows.
implementing a generic rolling restart feature via the CRD:

1. introduce a new field "triggerRollingRestart" as part of the CRD spec
2. user modifies the CRD to set "triggerRollingRestart" to true
3. operator detects this change and restarts all the pods with the specs currently available in the CRD
4. operator then sets the value of "triggerRollingRestart" to false and the flow completes

this approach would work irrespective of the deployment method chosen by users (helm/manual). it would also mean that the state of the cluster is still entirely controlled via the CRD.

Importance

required feature.
in case of large zookeeper clusters this would become a required feature, as currently there is no way to trigger a cluster redeploy via the operator. the only way is to manually delete a pod, wait for the operator to bring the pod back with the new configs, and then move on to the next pod.

Location

~~changes would be required in the helm chart template zookeeper.yaml and in the corresponding values.yaml files.~~
[EDIT 10-Aug-21]: changes are needed in the operator's controller implementation and the CRD definition.

Suggestions for an improvement

a common way to mitigate this issue is to create an annotation which contains the checksum of the config map to monitor. when the config map changes, its checksum would also change and this new checksum needs to be updated in the annotation of the pod. the change in value of the annotation would trigger a recreation of the pod. this is described in detail here.
this workflow can be automated easily using helm charts. the impact on users would be a simple change in values.yaml file.
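
as a rough illustration of the checksum idea (not taken from the chart or the operator), the sketch below hashes the config map data in a stable key order and writes the digest into an annotation map. the annotation key `example.com/config-checksum` and the function names are made up for this example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// checksumAnnotation is a hypothetical annotation key used only in this sketch.
const checksumAnnotation = "example.com/config-checksum"

// configChecksum returns a SHA-256 hex digest of the config map data.
// Keys are sorted so that the digest does not depend on map iteration order.
func configChecksum(data map[string]string) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		// Length prefixes keep distinct key/value pairs from colliding.
		fmt.Fprintf(h, "%d:%s=%d:%s;", len(k), k, len(data[k]), data[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// applyChecksum stores the digest in the pod annotations and reports
// whether the value changed, i.e. whether the pods would be recreated.
func applyChecksum(annotations, data map[string]string) bool {
	sum := configChecksum(data)
	if annotations[checksumAnnotation] == sum {
		return false
	}
	annotations[checksumAnnotation] = sum
	return true
}

func main() {
	annotations := map[string]string{}
	data := map[string]string{"zoo.cfg": "tickTime=2000"}

	fmt.Println(applyChecksum(annotations, data)) // first write changes the annotation
	fmt.Println(applyChecksum(annotations, data)) // same config, no change

	data["zoo.cfg"] = "tickTime=3000"
	fmt.Println(applyChecksum(annotations, data)) // config edited, annotation changes
}
```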
[EDIT 10-Aug-21]: when "triggerRollingRestart" is set to true in the CRD instance, operator will add an annotation to the cluster pods with the current time as the value. it then sets the value of "triggerRollingRestart" to false. this will trigger a one-time cluster restart.

aparajita89 · 2021-06-11T01:36:50Z

bump

anishakj · 2021-06-16T07:58:16Z

@aparajita89 Currently, we are working on other priority items and will be working on it later. Would you like to contribute by raising a PR?

aparajita89 · 2021-06-16T10:35:20Z

yes, i can do that. is there a process i need to follow?

aparajita89 · 2021-06-18T05:51:30Z

@anishakj i've raised the PR, can you check?
#349

anishakj · 2021-06-18T05:52:56Z

> @anishakj i've raised the PR, can you check?
> #349

Sure, will take a look

anishakj · 2021-07-20T15:24:01Z

@aparajita89 Any updates on the PR?

aparajita89 · 2021-07-20T16:47:38Z

@anishakj i am still working on it. sorry for the delay. i will raise a PR by the end of this week.

aparajita89 · 2021-07-26T05:04:55Z

@anishakj i've raised the PR with the changes. the testing is still in progress though. if there are any callouts, please let me know.

anishakj · 2021-07-26T05:20:06Z

> @anishakj i've raised the PR with the changes. the testing is still in progress though. if there are any callouts, please let me know.

@aparajita89 thanks, will take a look

aparajita89 · 2021-07-27T10:51:22Z

@anishakj testing is completed. you can start the review.

anishakj · 2021-07-29T03:35:41Z

@aparajita89 there are some go fmt issues, could you please fix those? Also, please make sure your commits are signed off.

aparajita89 · 2021-07-29T09:01:04Z

@anishakj have done that now

* trigger cluster restart via crd field Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>
* added test cases Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>
* ran go fmt Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>
* rolling restart e2e test cases Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>
* Fix containerd CVE-2021-32760 (#374) See: GHSA-c72p-9xmj-rx3w Signed-off-by: Adi Muraru <amuraru@adobe.com> Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>
* added unit tests Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>
* ran go fmt Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>
* ran go fmt Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>
* test bugfix Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>
* test case fix Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>
* test cases fix Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>
* fixed test case Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>
* ran go fmt Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>
* fixed tests Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>
* ran go fmt Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>
* comments Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>
* removed zk image Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>
* removed zk image Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>
* removed a println Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>
* review comments Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>

Co-authored-by: aparajita.singh <aparajita.singh@flipkart.com>
Co-authored-by: Adrian Muraru <amuraru@adobe.com>

aparajita89 mentioned this issue Jun 17, 2021

issue #348 : added option to trigger cluster restart on static config change #349

Closed

anishakj assigned aparajita89 Jun 21, 2021

anishakj added this to the Release 0.2.13 milestone Jul 20, 2021

anishakj mentioned this issue Aug 2, 2021

Issue #348: trigger cluster restart via crd field #373

Merged

anishakj closed this as completed in #373 Aug 13, 2021
