Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trigger cluster redeploy when static configs are modified #348

Closed
aparajita89 opened this issue Jun 10, 2021 · 12 comments · Fixed by #373
Closed

trigger cluster redeploy when static configs are modified #348

aparajita89 opened this issue Jun 10, 2021 · 12 comments · Fixed by #373
Assignees

Comments

@aparajita89
Copy link
Contributor

aparajita89 commented Jun 10, 2021

Description

when static configs are changed (such as initTime, etc which go in zoo.cfg), the zookeeper cluster needs to be restarted in order for the configs to take effect. the proposal is to add a flag in the helm chart values to optionally trigger a cluster redeploy when static configs are modified.

[EDIT 10-Aug-21]: the finalized approach for cluster restart is as follows.
implementing a generic rolling restart feature via the CRD:

  • introduce a new field "triggerRollingRestart" as part of the CRD spec
  • user modifies the CRD to set "triggerRollingRestart" to true
  • operator detects this change and restarts all the pods with the specs currently available in the CRD
  • operator then sets the value of "triggerRollingRestart" to false and the flow completes
    this approach would work irrespective of the deployment method chosen by users (helm/manual). it would also mean that the state of the cluster is still entirely controlled via the CRD.

Importance

required feature.
in case of large zookeeper clusters this would become a required feature as currently there is no way to trigger a cluster redeploy via the operator. the only way is to manually delete a pod and to wait for the operator to bring back the pod with the new configs, and then to move on the the next pod.

Location

changes would be required in the helm chart template zookeeper.yaml and in the corresponding values.yaml files.
[EDIT 10-Aug-21]: changes are needed in the operator's controller implementation and the CRD definition.

Suggestions for an improvement

a common way to mitigate this issue is to create an annotation which contains the checksum of the config map to monitor. when the config map changes, its checksum would also change and this new checksum needs to be updated in the annotation of the pod. the change in value of the annotation would trigger a recreation of the pod. this is described in detail here.
this workflow can be automated easily using helm charts. the impact on users would be a simple change in values.yaml file.

[EDIT 10-Aug-21]: when "triggerRollingRestart" is set to true in the CRD instance, operator will add an annotation to the cluster pods with the current time as the value. it then sets the value of "triggerRollingRestart" to false. this will trigger a one-time cluster restart.

@aparajita89
Copy link
Contributor Author

bump

@anishakj
Copy link
Contributor

@aparajita89 Currently, we are working on other priority items and will be working on it later. Would you like to contribute by raising a PR?

@aparajita89
Copy link
Contributor Author

aparajita89 commented Jun 16, 2021

yes, i can do that. is there a process i need to follow?

@aparajita89
Copy link
Contributor Author

aparajita89 commented Jun 18, 2021

@anishakj i've raised the PR, can you check?
#349

@anishakj
Copy link
Contributor

@anishakj i've raised the PR, can you check?
#349

Sure, will take a look

@anishakj anishakj added this to the Release 0.2.13 milestone Jul 20, 2021
@anishakj
Copy link
Contributor

@aparajita89 Any updates on the PR?

@aparajita89
Copy link
Contributor Author

@anishakj i am still working on it. sorry for the delay. i will raise a PR by the end of this week.

@aparajita89
Copy link
Contributor Author

@anishakj i've raised the PR with the changes. the testing is still in progress though. if there are any callouts, please let me know.

@anishakj
Copy link
Contributor

@anishakj i've raised the PR with the changes. the testing is still in progress though. if there are any callouts, please let me know.

@aparajita89 thanks, will take a look

@aparajita89
Copy link
Contributor Author

@anishakj testing is completed. you can start the review.

@anishakj
Copy link
Contributor

@aparajita89 there are some go fmt issues, could you please fix those. Also make your commits signedoff

@aparajita89
Copy link
Contributor Author

@anishakj have done that now

anishakj pushed a commit that referenced this issue Aug 13, 2021
* trigger cluster restart via crd field

Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>

* added test cases

Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>

* ran go fmt

Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>

* rolling restart e2e test cases

Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>

* Fix containerd CVE-2021-32760 (#374)

See: GHSA-c72p-9xmj-rx3w
Signed-off-by: Adi Muraru <amuraru@adobe.com>
Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>

* added unit tests

Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>

* ran go fmt

Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>

* ran go fmt

Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>

* test bugfix

Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>

* test case fix

Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>

* test cases fix

Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>

* fixed test case

Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>

* ran go fmt

Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>

* fixed tests

Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>

* ran go fmt

Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>

* comments

Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>

* removed zk image

Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>

* removed zk image

Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>

* removed a println

Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>

* review comments

Signed-off-by: aparajita.singh <aparajita.singh@flipkart.com>

Co-authored-by: aparajita.singh <aparajita.singh@flipkart.com>
Co-authored-by: Adrian Muraru <amuraru@adobe.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants