-
Notifications
You must be signed in to change notification settings - Fork 211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reboot only within time window specified on commandline #66
Conversation
👍 |
I've been picking at this a bit since opening, but I think that should be everything. Open to code review feedback. |
It would be really great if you added days to exclude reboots on, such as what chaoskube supports, see: https://github.com/linki/chaoskube |
@patrickmslatteryvt It's not a bad idea, but I'd prefer to put that into a separate PR at this point. My team probably wouldn't make use of it. I'll see if I can adopt this into my change when I get a few more free cycles. |
@awh can you review this please ? |
Is there any chance to have this merged? It would be incredibly useful! |
@unsafecode i applied this patch on our test-cluster today. I cloned everything into $GOPATH and pushed an image to our private registry. I also rebased the pull request onto the tag 1.2.0 to make sure I don't have too many changes. Here are the Steps I did:
I would also like this to be in the next release. |
bump? @awh |
I would merge this scheduling feature -- there a bunch of reasons you wouldn't want your cluster churning when nobody is around. You can't anticipate something like the Monzo etcd/Linkerd outage. It's best if that doesn't wake up your on-call at 2am. If my apps are going down when kured is rebooting Nodes, paging an SRE in the middle of the night isn't going to encourage fixing anything. A team with high operational toil will just choose to stop running kured. Its better to be able to page the dev-team during the day when everybody is awake. |
When merged with cd github.com/weaveworks/kured
hub checkout https://github.com/weaveworks/kured/pull/66
git checkout master
git merge git merge jjjordanmsft-master
dep ensure
make cmd/kured/kured go test ./... -v -count 1
# github.com/weaveworks/kured/cmd/prom-active-alerts
cmd/prom-active-alerts/main.go:17:16: undefined: alerts.PrometheusCountActive
? github.com/weaveworks/kured/cmd/kured [no test files]
=== RUN TestParseWeekdays
--- PASS: TestParseWeekdays (0.00s)
=== RUN TestParseWeekdaysErrors
--- PASS: TestParseWeekdaysErrors (0.00s)
=== RUN TestTimeWindows
--- PASS: TestTimeWindows (0.00s)
PASS
ok github.com/weaveworks/kured/pkg/timewindow 0.001s
|
Can we merge this if all checks are passing? This is a real need for my team and I prefer to use master builds for stable releases, rather than one-off builds from random branches. |
@tkaepp appears to have successfully built and used this on a real cluster. @Thraganux, can you confirm that you've built this and it's working as intended? |
I'm merging this so we get a master CI build -- @Thraganux could you check that it works with and without the feature enabled? Cheers @jjjordanmsft Thank you so much for this high-quality patch. |
@stealthybox for transparency the trio of @jjjordanmsft @Thraganux and I are colleagues :) I rebuilt a fresh image off 4d2b876 (previously based on 1.2.0) and have been soaking that in several of our clusters for the last ~week with the feature enabled. I'm happy to test the same with the CI image (pending circleci issues?). It's been pretty uneventful, fortunately. |
Thanks so much @alexeldeib 🎊 This new feature's behavior is likely superior for most user's clusters, but I'm just looking for an anecdote that the old behavior still works. |
Oops, CI failed to push the build: |
Overrode CI /w my personal credentials (momentarily) The We'll figure out why the dedicated CI docker account is being denied access soon. |
We aren't now, but I can take the new image and test with and without. Not likely to get to it today at the rate I'm going, but tomorrow I'll grab the latest CI image and (de)configure the feature in a few places. |
FYI, I rolled out a small test (~25 node clusters, 1 each with this feature on/off) based on that tag and things have been looking good 👍 will be promoting the same image to more envs, but not seeing any cause for concern. |
Brilliant -- that's as good of a user report as we could have asked for. |
Is there a particular release or version that this is planned for? Would love to get this functionality into use. :) |
@easkay-castlight At this point in time, I recommend using Our CI infra needs some work and we(Weaveworks) do not have much pre-KubeCon bandwidth. We would love and welcome users volunteering to maintain |
@stealthybox Thanks for the info, that's really helpful. I would be interested in helping out for sure, not quite sure how/where to get started. Will take a look later this evening (GMT). |
@alexeldeib I'm doing something silly, can you sanity-check this? I'm using tag
Example pod template snippet: command:
- /usr/bin/kured
- --ds-name=kured
- --ds-namespace=kured
- --end-time "05:00:00"
- --reboot-days wed
- --start-time 3am
- --time-zone Europe/London You've been using that tag for a while, what am I doing wrong? This is the simplest config I've tried:- command:
- /usr/bin/kured
- --ds-name=kured
- --ds-namespace=kured
- --start-time 03:00:00 |
@alexeldeib Sorry ignore me. I've realised where I'm being an idiot (I added the new params without an equals sign in the above snippets). |
Similar to #20 but allows us to specify, e.g., business hours so we don't get awoken in the middle of the night :-).