Percentage per pod, kill times and weekends #47

Closed

Conversation

klautcomputing
Contributor

This is still very pseudocode-y, mainly because I wanted some early feedback before I touch it up. In #34 I started some work that I think will get us nowhere, hence this new PR with a different approach and two new features for chaoskube:

  1. I would like every resource to be able to determine on its own how often it gets killed.
    Therefore, I propose we add the label chaoskube.schedule with values like 1/hour, 3/week or 2/month that let us determine how often a victim should get killed.

  2. I want to tackle "Create more chaos during business hours" (#35), so I added the flags excludeWeekends, runFrom and runUntil, which let you determine when chaoskube should be active.
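
For readers skimming the thread, here is a minimal sketch of how a value such as 3/week could be parsed. It is not the code from this PR: the names schedule, periods and parseSchedule are hypothetical, and treating a month as 30 days is an assumption. Note also that Kubernetes label values may not contain "/", so in practice such a value would probably have to live in an annotation; the sketch only deals with parsing the string.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// schedule is a hypothetical representation of a value such as "3/week".
type schedule struct {
	Count  int           // desired number of kills per period
	Period time.Duration // length of one period
}

// periods maps the units used in the proposal to durations.
// Assumption: a month is treated as 30 days.
var periods = map[string]time.Duration{
	"hour":  time.Hour,
	"week":  7 * 24 * time.Hour,
	"month": 30 * 24 * time.Hour,
}

// parseSchedule parses values of the form COUNT/PERIOD, e.g. "1/hour".
func parseSchedule(value string) (schedule, error) {
	parts := strings.Split(strings.TrimSpace(value), "/")
	if len(parts) != 2 {
		return schedule{}, fmt.Errorf("invalid schedule %q: want COUNT/PERIOD", value)
	}
	count, err := strconv.Atoi(parts[0])
	if err != nil || count <= 0 {
		return schedule{}, fmt.Errorf("invalid schedule %q: count must be a positive integer", value)
	}
	period, ok := periods[parts[1]]
	if !ok {
		return schedule{}, fmt.Errorf("invalid schedule %q: unknown period %q", value, parts[1])
	}
	return schedule{Count: count, Period: period}, nil
}

func main() {
	for _, v := range []string{"1/hour", "3/week", "2/month", "often"} {
		s, err := parseSchedule(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %d kill(s) every %s\n", v, s.Count, s.Period)
	}
}
```

Rejecting unknown units outright, rather than guessing, keeps a typo in the value from silently disabling or multiplying the chaos for that resource.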

Let me know what you think and browse through my code. If you like the approach and the (in parts pseudo-y) code, I will touch it up and test it.

@linki (just because I always lose my GitHub notifications)

@klautcomputing changed the title from "[WIP] Percentage and timeout" to "[WIP] Percentage per pod, kill times and weekends" on Nov 3, 2017
@linki
Owner

linki commented Nov 3, 2017

@klautcomputing I like 1. very much and also 2. makes sense. I'm off for a week but I'll get back to you then.

@klautcomputing
Contributor Author

Thanks for the feedback. I hope I will find time this week to move this forward a little.

@klautcomputing changed the title from "[WIP] Percentage per pod, kill times and weekends" to "Percentage per pod, kill times and weekends" on Nov 7, 2017
@klautcomputing
Contributor Author

@linki So I guess I am done with this :)

There is one fundamental assumption that might not be obvious at first: the interval can only be a positive whole number of minutes. So you cannot run chaoskube more often than once a minute.
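
A minimal sketch of what that assumption could look like in code, and of one plausible way a whole-minute interval turns "N kills per period" into a chance per run. This is an illustration only, not necessarily what the PR does; minutes, validateInterval and perRunChance are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// minutes builds an interval from a whole number of minutes.
func minutes(n int) time.Duration {
	return time.Duration(n) * time.Minute
}

// validateInterval enforces the assumption stated above:
// the interval must be a positive whole number of minutes.
func validateInterval(interval time.Duration) error {
	if interval <= 0 || interval%time.Minute != 0 {
		return fmt.Errorf("interval %s must be a positive whole number of minutes", interval)
	}
	return nil
}

// perRunChance returns the chance of a kill on a single run so that, on
// average, count kills happen per period when a run occurs every interval.
// The result is capped at 1.
func perRunChance(count int, period, interval time.Duration) float64 {
	chance := float64(count) * float64(interval) / float64(period)
	if chance > 1 {
		return 1
	}
	return chance
}

func main() {
	interval := minutes(1)
	if err := validateInterval(interval); err != nil {
		fmt.Println("error:", err)
		return
	}
	week := 7 * 24 * time.Hour
	fmt.Printf("3/week at a %s interval: %.6f chance per run\n",
		interval, perRunChance(3, week, interval))
}
```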

I will update the readme once you give me a 👍 about merging this. I will deploy it in my cluster and play with it. And might come back with code changes.

@klautcomputing
Contributor Author

@linki I needed to run glide, and I have no idea why so many files fell out of that and into my PR. I can remove all of them :)

@klautcomputing
Contributor Author

I ran it in my cluster today, played with it for a bit, and noticed and fixed a couple of things.

Most importantly, before this commit pods that didn't specify chaos.schedule were never killed. Now I added --percentage, where you can set the chance for a pod to be terminated. When chaos.schedule is defined, it always takes priority over --percentage. I set the default interval to 1m so that I can test more easily, but I also think that's the way to go.
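
The priority rule described above can be sketched as follows. This is a hedged illustration, not the PR's code: effectiveChance and shouldKill are hypothetical names, --percentage is assumed to be a value from 0 to 100, the fallback for an unparsable schedule is an assumption, and the schedule parser is passed in as a stand-in function.

```go
package main

import (
	"fmt"
	"math/rand"
)

// scheduleLabel is the label name as written in the comment above.
const scheduleLabel = "chaos.schedule"

// effectiveChance returns the kill chance (0 to 1) for one pod on one run.
// A schedule on the pod always wins over the global percentage.
// Assumption: percentage is given in the range 0 to 100.
func effectiveChance(labels map[string]string, percentage float64,
	fromSchedule func(string) (float64, error)) float64 {
	if value, ok := labels[scheduleLabel]; ok {
		if chance, err := fromSchedule(value); err == nil {
			return chance
		}
		// Assumption: an unparsable schedule falls back to the global default.
	}
	return percentage / 100
}

// shouldKill decides a single run; roll is a random number in [0, 1).
func shouldKill(chance, roll float64) bool {
	return roll < chance
}

func main() {
	// Stand-in for a real schedule parser: every scheduled pod gets a 50% chance.
	fromSchedule := func(string) (float64, error) { return 0.5, nil }

	pods := map[string]map[string]string{
		"with-schedule":    {scheduleLabel: "1/hour"},
		"without-schedule": {"app": "demo"},
	}
	for name, labels := range pods {
		chance := effectiveChance(labels, 10, fromSchedule)
		fmt.Printf("%s: chance=%.2f kill=%t\n",
			name, chance, shouldKill(chance, rand.Float64()))
	}
}
```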

@klautcomputing
Contributor Author

@linki can I ping you on this :)

@twildeboer

@linki and @klautcomputing - I just started thinking about adding the "weekends" feature and discovered this PR. But because of the huge number of changed source files, I cannot determine what exactly is being proposed. Please have a look at my suggestion in this Feature "Issue".

@klautcomputing
Contributor Author

@twildeboer There are only about 3 files that you need to look at: main, util and util_test. Everything else is vendor changes.

@linki
Owner

linki commented Dec 6, 2017

Hi @klautcomputing @twildeboer,

I cleaned up the PR so it's easier to review:

  • @klautcomputing I think you don't need to re-glide; it works fine for me in this state. I usually use glide install --strip-vendor. I think you might have run into issues because you added github.com/bouk/monkey for a moment.
  • Furthermore, I extracted the newPod utility function into a separate PR and rebased this one, so we can easily see the gist of the changes.

Hope this was okay, @klautcomputing.

@linki force-pushed the percentage-and-timeout branch from ddd0564 to 55340b5 on December 6, 2017 17:10
@klautcomputing
Contributor Author

I am not sure how to proceed with this PR because I feel we didn't reach a real conclusion in our discussion. Things that are unclear to me right now:

  1. do you want this functionality in chaoskube and if yes in which way?
  2. do we want to make chaoskube purely opt in? (this would make things way easier)
  3. --percentage could be removed
  4. all the other new flags would move from being flags to being specified by each resource through labels.

If I don't hear back from you, @linki and @twildeboer, I will just make changes on a best-effort basis, using my own judgement.


@twildeboer twildeboer left a comment

Sorry for the delay in reviewing.
IMO, the percentage feature should be a separate pull request from the "suspend chaos" feature, since they are unrelated.
Also, IMO, the "suspend chaos" feature should have a notion of timezone since I think most people would want to express these configurations in terms of their local time, regardless of the timezone of the actual pod (which would most likely be UTC).
Please have a look at my take on the implementation.
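
To make the timezone point concrete, here is a minimal sketch of an "is chaos active right now?" check that evaluates the window in a configured zone rather than in the pod's clock. It is an illustration under assumptions, not code from either implementation: the window type and its fields are hypothetical (they only mirror the excludeWeekends, runFrom and runUntil flags), a fixed UTC+1 offset stands in for a named zone that would more likely be loaded with time.LoadLocation, and windows that span midnight are not handled.

```go
package main

import (
	"fmt"
	"time"
)

// window is a hypothetical configuration mirroring the proposed flags,
// plus the timezone in which they should be interpreted.
type window struct {
	ExcludeWeekends bool
	RunFrom         int // minutes after local midnight, inclusive
	RunUntil        int // minutes after local midnight, exclusive
	Location        *time.Location
}

// active reports whether chaos may run at the given instant.
// The instant is converted to the configured zone first, so a pod
// running in UTC still honours local business hours.
func (w window) active(now time.Time) bool {
	local := now.In(w.Location)
	if w.ExcludeWeekends {
		if day := local.Weekday(); day == time.Saturday || day == time.Sunday {
			return false
		}
	}
	minuteOfDay := local.Hour()*60 + local.Minute()
	return minuteOfDay >= w.RunFrom && minuteOfDay < w.RunUntil
}

// officeHours is an example: weekdays, 09:00 to 17:00, in a fixed UTC+1 zone.
var officeHours = window{
	ExcludeWeekends: true,
	RunFrom:         9 * 60,
	RunUntil:        17 * 60,
	Location:        time.FixedZone("UTC+1", 3600),
}

// mustParse parses an RFC 3339 timestamp and panics on malformed input.
func mustParse(value string) time.Time {
	t, err := time.Parse(time.RFC3339, value)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	for _, ts := range []string{
		"2017-12-08T08:30:00Z", // Friday 09:30 local
		"2017-12-08T16:30:00Z", // Friday 17:30 local
		"2017-12-09T10:00:00Z", // Saturday 11:00 local
	} {
		fmt.Printf("%s active=%t\n", ts, officeHours.active(mustParse(ts)))
	}
}
```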

@klautcomputing
Contributor Author

I am closing this because I don't feel I got any concrete input and I don't have the time to actually work on it anymore.
