Restart console container when config changes #18411

Merged: 1 commit into openshift:master, Feb 7, 2018

Conversation

spadgett
Member

@spadgett spadgett commented Feb 2, 2018

Opening this to get feedback. We have a problem where there is no good way to roll out the console after editing the console config in its config map. Right now we have to tell users to delete the console pods, which is error-prone and unfriendly. The console only reads its config at startup and doesn't watch for changes.

This adds a liveness probe that detects whether the config has changed on the filesystem using an md5 hash. If the config changes, the liveness probe fails, and the container restarts. It's similar to what @aweiteka has done for Prometheus config changes. It's a bit of a hack, but it works.
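
For readers following along, here is a minimal sketch of the hash-comparison idea described above. It is not the code from this PR: the function names and the baseline-file approach are assumptions made for illustration. The idea is to record the config's md5 at container startup, then have the probe recompute it and report failure once the two differ.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of the file at `path`."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def record_baseline(config: Path, baseline: Path) -> None:
    """At container startup, store the hash of the config the process loaded.

    The baseline file is a hypothetical detail of this sketch.
    """
    baseline.write_text(md5_of(config))


def probe_exit_code(config: Path, baseline: Path) -> int:
    """Return 0 while the config matches the startup hash, 1 once it differs.

    An exec-style liveness probe treats a non-zero exit status as a failure,
    which is what eventually causes the container to be restarted.
    """
    unchanged = baseline.read_text().strip() == md5_of(config)
    return 0 if unchanged else 1
```

A wrapper script would call `record_baseline` once before starting the console and exit with `probe_exit_code(...)` on each probe run.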

@sdodson This would simplify the install because we'd no longer need to force a console rollout on config changes from the metrics and logging playbooks.

Any objections to this approach?

/assign @jwforres
/cc @smarterclayton @derekwaynecarr @deads2k
/hold

Holding for feedback :)

@jupierce fyi

Use a livenessProbe to detect when the console config has changed.
@openshift-ci-robot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) Feb 2, 2018
@openshift-ci-robot added the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) Feb 2, 2018
@spadgett changed the title from "Restart console pod when config changes" to "Restart console container when config changes" Feb 2, 2018
@deads2k
Contributor

deads2k commented Feb 2, 2018

@derekwaynecarr @sjenning If there were a way to directly restart a pod, we could write a controller that restarts pods that opt in via an annotation. It would restart them when the content of a secret or configmap they mount changes. The logic is pretty simple.

The same API endpoint could be used to drive restarts for static pods. What are my chances of getting such an API?
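
As a rough illustration of the controller logic proposed above (purely hypothetical, since no such restart API exists in this discussion), the decision step might look like the sketch below. The annotation key, the `Pod` record, and the idea of storing per-pod hashes are all assumptions made for the example, not real Kubernetes or OpenShift types.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical opt-in annotation key; not a real Kubernetes/OpenShift annotation.
RESTART_ANNOTATION = "example.io/restart-on-config-change"


@dataclass
class Pod:
    """Simplified stand-in for a pod, holding only what this sketch needs."""
    name: str
    annotations: dict = field(default_factory=dict)
    mounted: list = field(default_factory=list)      # names of mounted configmaps/secrets
    seen_hashes: dict = field(default_factory=dict)  # source name -> hash when the pod started


def content_hash(data: dict) -> str:
    """Hash a configmap/secret's key-value data in a stable key order."""
    h = hashlib.sha256()
    for key in sorted(data):
        h.update(key.encode() + b"\0" + data[key].encode() + b"\0")
    return h.hexdigest()


def pods_to_restart(pods: list, sources: dict) -> list:
    """Return names of opted-in pods whose mounted sources changed since start."""
    current = {name: content_hash(data) for name, data in sources.items()}
    stale = []
    for pod in pods:
        if pod.annotations.get(RESTART_ANNOTATION) != "true":
            continue  # pod did not opt in
        if any(pod.seen_hashes.get(src) != current.get(src) for src in pod.mounted):
            stale.append(pod.name)
    return stale
```

The missing piece, as the comment notes, is an endpoint the controller could call to actually restart each pod returned here.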

@deads2k
Contributor

deads2k commented Feb 2, 2018

You probably want some kind of a jitter or randomness to avoid killing all your pods at once, right?

@spadgett
Member Author

spadgett commented Feb 2, 2018

Just a quick note that this is really meant to be a stop-gap solution until we have something better.

Also it looks like the md5sum command is not expensive at all.

sh-4.2$ time md5sum /var/webconsole-config/webconsole-config.yaml
0c2f2cc813a6c755515dcf46b2d8e722  /var/webconsole-config/webconsole-config.yaml

real    0m0.001s
user    0m0.000s
sys     0m0.000s

@spadgett
Member Author

spadgett commented Feb 2, 2018

> You probably want some kind of a jitter or randomness to avoid killing all your pods at once, right?

Yeah. I thought there might be some jitter built-in since the liveness probes and config map updates won't happen at the same time for all pods, but maybe that's not the case.
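
If built-in timing differences turned out not to be enough, one way to add explicit jitter would be for each pod to wait a random delay after noticing the change before its probe starts failing. The sketch below only illustrates that idea and is not something this PR implements; the function names and the way the delay is chosen are assumptions.

```python
import random


def pick_restart_delay(max_jitter_seconds: float, rng: random.Random) -> float:
    """Choose, once per pod, how long to wait after a config change is noticed."""
    return rng.uniform(0.0, max_jitter_seconds)


def probe_should_fail(change_seen_at: float, now: float, delay: float) -> bool:
    """Fail the probe only after this pod's own delay has elapsed.

    `change_seen_at` and `now` are timestamps in seconds from the same clock.
    """
    return now - change_seen_at >= delay
```

Because each pod draws its own delay, replicas that see the config map update at the same moment would still restart at different times, spread over up to `max_jitter_seconds`.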

@sdodson
Member

sdodson commented Feb 2, 2018

> You probably want some kind of a jitter or randomness to avoid killing all your pods at once, right?

Don't liveness probes already have a jitter?

@sjenning
Contributor

sjenning commented Feb 2, 2018

Why is a rollout not desirable?

That is the official way to restart the pods in a deploymentconfig, and it respects all the policies we have in place about how many pods can be down at once. I agree that it is overkill, but if the application is not going to inotify watch the configmap and reload on its own, then this is the other official way to pick up the change.

There really is no API to the kubelet to request a pod restart. The kubelet itself has internal mechanisms for doing this, such as when liveness/readiness probes fail, but there is no way for external controllers to request it. Said another way, there is no property of the pod spec that would indicate that intention, the way setting the deletionTimestamp indicates the kubelet should kill the pod.

Failing the liveness probe when the configmap changes is a clever (ab)use of the mechanism, I must say :P

@spadgett
Member Author

spadgett commented Feb 2, 2018

> Why is a rollout not desirable?

I wish we could roll out. The problem is this is a k8s deployment, not a deployment config. There's no command to roll out a deployment again if the pod spec hasn't changed. So it's either:

  1. Delete the pods, or
  2. Add some annotation to the deployment pod spec to trigger a rollout

It doesn't feel great to ask users to do either of those things after editing console config.
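
To make the second option concrete, here is a small sketch of the kind of patch body that changes an annotation in the deployment's pod template. The annotation key and function name are made up for illustration. Using a hash of the config as the value means the pod template only changes, and a rollout only starts, when the config content actually differs.

```python
import hashlib
import json

# Hypothetical annotation key, used only for this example.
CONFIG_HASH_ANNOTATION = "example.io/console-config-hash"


def rollout_patch(config_text: str) -> dict:
    """Build a patch that stamps the pod template with a hash of the config.

    Any change to the pod template makes the Deployment roll out new pods.
    """
    digest = hashlib.sha256(config_text.encode()).hexdigest()
    return {
        "spec": {
            "template": {
                "metadata": {"annotations": {CONFIG_HASH_ANNOTATION: digest}}
            }
        }
    }


def rollout_patch_json(config_text: str) -> str:
    """Serialize the patch, e.g. for use as the body of a patch request."""
    return json.dumps(rollout_patch(config_text))
```

The resulting JSON could be applied with something like `kubectl patch deployment <name> -p '<json>'`, which is still a manual step after editing the config.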

> Failing the liveness probe when the configmap changes is a clever (ab)use of the mechanism, I must say :P

Credit to @aweiteka :)

@sjenning
Contributor

sjenning commented Feb 5, 2018

> I wish we could roll out. The problem is this is a k8s deployment, not a deployment config.

Any reason not to use a DeploymentConfig instead of a Deployment?

@spadgett
Member Author

spadgett commented Feb 5, 2018

> Any reason not to use a DeploymentConfig instead of a Deployment?

To avoid needing to migrate to a Deployment later on.

@spadgett
Member Author

spadgett commented Feb 6, 2018

@smarterclayton Any concerns with this change?

We'd like to go ahead with it unless anyone objects.

@smarterclayton
Contributor

Do it

@aweiteka
Contributor

aweiteka commented Feb 6, 2018

Similar pattern here: #18391

@spadgett
Member Author

spadgett commented Feb 6, 2018

/hold cancel

@jwforres PTAL

@openshift-ci-robot removed the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) Feb 6, 2018
@jwforres
Member

jwforres commented Feb 6, 2018

/lgtm

but I don't think I have approver rights here

@openshift-ci-robot added the lgtm label (indicates that a PR is ready to be merged) Feb 6, 2018
@jwforres
Member

jwforres commented Feb 6, 2018

@deads2k or @smarterclayton would you mind approving?

@deads2k
Contributor

deads2k commented Feb 6, 2018

/approve

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, jwforres, spadgett

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@openshift-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Feb 6, 2018
@spadgett
Member Author

spadgett commented Feb 6, 2018

/retest

@spadgett
Member Author

spadgett commented Feb 6, 2018

/retest

@sdodson
Member

sdodson commented Feb 7, 2018

/test gcp

@spadgett
Member Author

spadgett commented Feb 7, 2018

/retest

@spadgett
Member Author

spadgett commented Feb 7, 2018

flake #18136

/test extended_conformance_install

@openshift-merge-robot
Contributor

/test all [submit-queue is verifying that this PR is safe to merge]

@openshift-merge-robot
Contributor

Automatic merge from submit-queue.

@openshift-merge-robot openshift-merge-robot merged commit d7677ca into openshift:master Feb 7, 2018
@spadgett spadgett deleted the console-liveness branch February 8, 2018 00:29
Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
lgtm: Indicates that a PR is ready to be merged.
size/M: Denotes a PR that changes 30-99 lines, ignoring generated files.

9 participants