Support addon Deployments, make heapster a deployment with a nanny. #22893
Conversation
Q-Lee commented Mar 12, 2016
- Heapster uses a deployment here.
- Changed kube-addon-update.sh to include deployments (this seems like a transitional change while we're moving all addons to deployments).
- Adding a nanny to resize heapster with the number of nodes.
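The shape described above can be sketched roughly as follows. This is a hedged illustration, not the exact manifest from this PR; image tags, names, and resource values are placeholders.

```yaml
# Sketch: heapster as a Deployment with an addon-resizer "nanny"
# sidecar that resizes heapster based on cluster node count.
# All names, image tags, and values here are illustrative.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: heapster-v1.0
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: heapster
    spec:
      containers:
      - name: heapster
        image: gcr.io/google_containers/heapster:v1.0
      - name: heapster-nanny
        image: gcr.io/google_containers/addon-resizer:1.0
        command:
        - /pod_nanny
        - --cpu=100m      # base CPU for heapster
        - --memory=200Mi  # base memory; grown per node by the nanny
```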
Labelling this PR as size/L
GCE e2e build/test failed for commit 5a3c51892cf210760ba4cbbc40d821e5e7c97b82. Please reference the list of currently known flakes when examining this failure. If you request a re-test, you must reference the issue describing the flake.
Is this for 1.2?
No, this can't go into 1.2. That ship has sailed.
We're aiming for 1.2.1. Adding zml@ for context on kube-addon-update.sh.
@@ -475,6 +475,7 @@ function update-addons() {
  local -r addon_path=$1
  # be careful, reconcile-objects uses global variables
  reconcile-objects ${addon_path} ReplicationController "-" &
+ reconcile-objects ${addon_path} Deployment "-" &
Comment below this:
We don't expect names to be versioned for the following kinds
I would expect this to be true for deployments as well.
@bgrant0607 Is there anything in particular I need to consider for cluster upgrades if heapster is becoming a deployment?
GCE e2e build/test failed for commit 82682f5d993c99f33e4d104675fbbb9cb864e1a2. Please reference the list of currently known flakes when examining this failure. If you request a re-test, you must reference the issue describing the flake.
What's going to kill off the old RCs?
- image: gcr.io/google_containers/addon-resizer:1.0
  name: heapster-nanny
  resources:
    limits:
If you just specify limits, requests should be set from the limits.
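For illustration, a container spec along these lines (hypothetical values) exercises that defaulting: with only limits set, the apiserver defaults requests to the limits, which keeps the pod in the Guaranteed QoS class.

```yaml
# Hypothetical container spec with only limits set.
# Per the v1 defaulting behavior discussed here, requests are
# defaulted to the limits (cpu: 100m, memory: 200Mi).
containers:
- name: heapster
  image: gcr.io/google_containers/heapster:v1.0  # illustrative tag
  resources:
    limits:
      cpu: 100m
      memory: 200Mi
    # requests: intentionally omitted; defaulted from limits
```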
#18216 added requests to all addons.
One of the config best practices is:
Don’t specify default values unnecessarily, in order to simplify and minimize configs, and to reduce error.
I don't really understand the argument behind #18216 but it seems to violate that best practice.
Then why don't we do this for other singletons (e.g., heapster itself)?
Because of #18216. Can you explain why this is important @gmarek? Here's the defaulting code https://github.com/kubernetes/kubernetes/blob/master/pkg/api/v1/defaults.go#L99-L116
When we were designing QoS we decided to have this defaulting as a backward-compatibility feature. I didn't have the impression that it's there to stay forever, so I think we should have requests explicit. Generally, all this QoS stuff is really confusing as it is today. @piosz
Interesting: I didn't know about that default behavior. It actually conflicts with the addon-resizer, which explicitly sets requests=limits to keep its dependents in the guaranteed class. The addon-resizer also performs an update on any qualitative difference (e.g., requests being expected but not found). This would manifest as a single deployment update at startup.
It feels weird to add a flag to disregard requests, but we'll need to if we want to utilize this default behavior. But since literally all of our addons specify both requests and limits, why don't I cut an issue for 1.3 to rectify both the pod_nanny and all of our addon yamls with best practices?
SGTM
This is another proof that the current state is confusing. We should work on that at some point.
Opened #23229
@@ -17,7 +17,8 @@ metadata:
spec:
  replicas: 1
Since this only has 1 replica, you might consider the Recreate update policy. The default is RollingUpdate.
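For a single-replica addon, that suggestion amounts to something like the following sketch (field placement per extensions/v1beta1 Deployments; values illustrative):

```yaml
# Sketch: Recreate strategy for a single-replica Deployment.
# The old pod is killed before the new one is created, so two
# copies of the addon never run at once during an update.
spec:
  replicas: 1
  strategy:
    type: Recreate  # default would be RollingUpdate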
Interesting idea. The code for the container in question lives in contrib/addon-resizer, where that would be an easy (1-line) change. I'll definitely consider it.
  fieldPath: metadata.namespace
command:
- /pod_nanny
- --cpu=100m
@mwielgus Please verify these values based on your perf tests. AFAIR we need more than 100m cpu in large clusters and 4MB of memory for each node.
The 1.2 cut gives 4MB per node, but leaves CPU at 100m. I've changed this to use the same per-node value as static sizing.
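As a sketch of what per-node sizing looks like on the nanny side (flag names follow contrib/addon-resizer; the exact values here are illustrative, not the ones merged in this PR):

```yaml
# Illustrative nanny invocation: base resources plus a per-node
# increment, matching the ~4MB-per-node figure discussed here.
command:
- /pod_nanny
- --cpu=100m           # base CPU
- --extra-cpu=0.5m     # additional CPU per node (illustrative)
- --memory=200Mi       # base memory
- --extra-memory=4Mi   # ~4MB of memory per node
- --deployment=heapster-v1.0  # hypothetical Deployment name
- --container=heapster        # container to resize
```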
On a 1000-node cluster we use 50 millicores on average, with occasional spikes to 500 (every minute, for a couple of seconds).
Where is this defined?
@fgrzadkowski how do I know which person/people own a particular configuration file?
"What's going to kill off the old RCs?" Ideally I would think the addon-update system should delete the old replication controller when its configuration is no longer found. Thoughts?
GCE e2e build/test failed for commit 1c794555072a5ddd5ef3a26fc5970e6f04ec1c26. Please reference the list of currently known flakes when examining this failure. If you request a re-test, you must reference the issue describing the flake.
@Q-Lee:
GCE e2e build/test passed for commit 726bbc4bfce110601fc17bf568386e413b4bd3de.
PR changed after LGTM, removing LGTM.
GCE e2e build/test passed for commit 81ba98a.
Support addon Deployments, make heapster a deployment with a nanny.
@@ -481,6 +481,7 @@ function update-addons() {
  # That's why we pass an empty string as the version separator.
  # If the description differs on disk, the object should be recreated.
  # This is not implemented in this version.
+ reconcile-objects ${addon_path} Deployment "" &
How is this going to update Deployments? AFAIU the name of the object will not change, so how will the addon-updater realise it has to do an update? Has this been tested?
ping @Q-Lee
Are you sure this won't break the addons update procedure?
Removing label
Commit found in the "release-1.2" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.
@Q-Lee as I said in #23434 (comment), "Please notify me when you decide to cherry pick." Your change broke us in master branch and I fixed that. Now it was cherry picked in release-1.2 without telling us, so it breaks us again. Thanks to our Jenkins for catching the breakage. Hopefully I will still have time to cherry pick my fix into release-1.2 before 1.2.2 is cut.
…ck-of-#22893-kubernetes#23512-upstream-release-1.2 Automated cherry pick of kubernetes#22893 kubernetes#23512