
issue #348 : added option to trigger cluster restart on static config change #349

Closed

Conversation

@aparajita89 (Contributor) commented Jun 17, 2021

Change log description

When static configs are changed (such as initLimit and other settings that go into zoo.cfg), the ZooKeeper cluster needs to be restarted for the new values to take effect. This change adds a flag to the Helm chart values to optionally trigger a cluster redeploy when static configs are modified.

Purpose of the change

Refer #348

What the code does

These changes compute a sha256 hash of the values that go into zoo.cfg via the config map. The hash is set as the value of a custom annotation in the pod spec for the ZooKeeper nodes. Whenever the Helm chart is upgraded, the hash is recomputed and the annotation is updated; this update triggers a pod restart for all the nodes in the ZooKeeper cluster.
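For context, a minimal sketch of the checksum-annotation pattern this describes; the template path, the `restartOnConfigChange` flag placement, and the hashed value names are assumptions based on the description above, not the exact diff:

```yaml
# charts/zookeeper/templates/zookeeper.yaml (sketch; field names and paths are
# assumptions based on the PR description, not the actual diff)
apiVersion: zookeeper.pravega.io/v1beta1
kind: ZookeeperCluster
metadata:
  name: {{ .Release.Name }}
spec:
  {{- if .Values.restartOnConfigChange }}
  pod:
    annotations:
      # Hash of the static config values; any zoo.cfg change alters the hash,
      # which updates the pod template and makes Kubernetes roll the pods.
      checksum/config: {{ .Values.config | toYaml | sha256sum }}
  {{- end }}
```

Because the hash lives in the pod template, any change to the hashed values is treated as a spec change, which is what drives the restart.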

How to verify it

  1. Create a ZooKeeper cluster using the Helm chart.
  2. Update the static configs in values.yaml for the zookeeper chart (see the sketch after these steps).
  3. Run `helm upgrade charts/zookeeper` from the root location of the project.
  4. Check that the ZooKeeper pods are restarted by k8s using `kubectl get pods`.
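For step 2, a hedged example of the kind of values.yaml edit involved; the exact keys in the chart's values.yaml may differ:

```yaml
# charts/zookeeper/values.yaml (sketch; key names are assumptions)
restartOnConfigChange: true   # flag introduced by this PR
config:
  initLimit: 15               # e.g. changed from 10 to exercise a static-config change
  tickTime: 2000
  syncLimit: 2
```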

@codecov-commenter

Codecov Report

Merging #349 (b4de734) into master (698352b) will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master     #349   +/-   ##
=======================================
  Coverage   84.52%   84.52%           
=======================================
  Files          11       11           
  Lines        1376     1376           
=======================================
  Hits         1163     1163           
  Misses        140      140           
  Partials       73       73           

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 698352b...b4de734.

@anishakj (Contributor) commented Jun 18, 2021

@aparajita89 In my testing, I observed that helm upgrade triggers a restart of the pods even when restartOnConfigChange is set to false, and they come up with the new config values. Could you please verify this behaviour?

@aparajita89 (Contributor, Author) commented
This is happening because removal of the annotation causes a pod restart with the new values set in the config. I'm checking whether adding a pre-install hook to the Helm chart, to fetch the old value of the annotation and set it again, would fix the issue.

@anishakj (Contributor) commented Jun 21, 2021

> This is happening because removal of the annotation causes a pod restart with the new values set in the config. I'm checking whether adding a pre-install hook to the Helm chart, to fetch the old value of the annotation and set it again, would fix the issue.

Also, editing any of the config values with `kubectl edit zk zookeeper` does not trigger a pod restart.

@aparajita89 (Contributor, Author) commented
@anishakj In a cluster where the checksum/config value is already set in the annotations, if we try to update the configs with restartOnConfigChange set to false, the value of the annotation becomes null, and that triggers the pod restart. To avoid this, I tried to pull the existing value of the checksum/config annotation into a Helm variable, and then set checksum/config to that same value when restartOnConfigChange is false but recompute the checksum when it is true.

This approach did not work because, even though we can get the existing checksum/config value using a pre-install hook, it is not possible to set a Helm variable to that value from within the Helm template file.

Our requirement is to trigger a cluster restart when the static configs change, and we are OK with the cluster restarting every time the config changes. If this would be useful for the wider community as well, I can update my PR to make the checksum computation mandatory and remove the restartOnConfigChange flag entirely.
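A rough sketch of the conditional described above, with assumed names; this is the variant that did not work, because the previously applied annotation value cannot be handed back to the template engine:

```yaml
# Sketch of the attempted conditional (assumed names; this approach failed)
pod:
  annotations:
    {{- if .Values.restartOnConfigChange }}
    checksum/config: {{ .Values.config | toYaml | sha256sum }}
    {{- else }}
    # Would need to re-emit the hash applied by the previous release here, but a
    # value read in a pre-install hook cannot be injected into template rendering,
    # so the annotation ends up empty and the pods restart anyway.
    checksum/config: ""
    {{- end }}
```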

@aparajita89 (Contributor, Author) commented
> Also, editing any of the config values with `kubectl edit zk zookeeper` does not trigger a pod restart.

Changing the config value in the CRD would not change the value of the annotation. We would need to change the operator code so that it watches for changes in the static configs and updates the checksum/config annotation in the pod spec; k8s would then trigger the pod restart because the value of the annotation has changed.

Perhaps it would be more useful to abandon the idea of a cluster restart when static configs change and instead consider implementing a more generic rolling restart feature via the CRD.
How this would work (see the sketch below):

- Introduce a new field "triggerRollingRestart" in the CRD spec.
- The user modifies the CRD to set "triggerRollingRestart" to true.
- The operator detects this change and restarts all the pods with the specs currently available in the CRD.
- The operator then sets "triggerRollingRestart" back to false and the flow completes.

This approach would work irrespective of the deployment method chosen by users (Helm/manual). It would also mean that the state of the cluster is still entirely controlled via the CRD.

In case of a static config change, the CRD would be updated with the new static configs, which would not have any impact on the existing cluster. "triggerRollingRestart" can then be set to true to restart the pods on demand.
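As a purely hypothetical illustration of the proposal (the field does not exist yet; its name and placement would be decided in the follow-up PR):

```yaml
# Hypothetical ZookeeperCluster CR showing the proposed field
apiVersion: zookeeper.pravega.io/v1beta1
kind: ZookeeperCluster
metadata:
  name: zookeeper
spec:
  replicas: 3
  triggerRollingRestart: true   # operator restarts the pods, then resets this to false
  config:
    initLimit: 15               # updated static config picked up by the restart
```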

@anishakj (Contributor) commented

> Perhaps it would be more useful to abandon the idea of a cluster restart when static configs change and instead consider implementing a more generic rolling restart feature via the CRD.

Sure, this looks like a cleaner approach.

@aparajita89 (Contributor, Author) commented
Will raise another PR for this.
