Add MinReadySeconds to DCs #7114
Comments
@smarterclayton do we have executive direction on this gap? |
I was thinking that we may want to skip this and always default to 0 (consider available when ready). |
I would like to have this - since I was the one who proposed it to Dan originally. |
ok |
@stevekuznetsov I would like to get #6233 first and then you could add this. |
@stevekuznetsov did you start working on this? If not, I would like @mfojtik to take a stab at it. |
I haven't yet, and I'm OOO until next week. @mfojtik thanks! |
@Kargakis @stevekuznetsov to make sure I understand this: … |
MinReadySeconds takes effect after a pod is ready. If you have dc.Spec.MinReadySeconds == 0, the deployment process should consider pods available as soon as they become ready. If you have dc.Spec.MinReadySeconds == 5, the deployment process should consider pods available 5 seconds after they become ready. I think you will need changes in the rolling updater. |
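Roughly, the availability check described above could look like this — a minimal sketch with made-up types, not the actual Kubernetes API or the real deployment code:
```go
package main

import (
	"fmt"
	"time"
)

// pod is a simplified stand-in for a Kubernetes pod: whether its readiness
// probe currently passes, and when it last transitioned to ready.
type pod struct {
	ready      bool
	readySince time.Time
}

// isAvailable reports whether a pod counts as available for a deployment with
// the given minReadySeconds: it must be ready, and it must have stayed ready
// for at least minReadySeconds.
func isAvailable(p pod, minReadySeconds int, now time.Time) bool {
	if !p.ready {
		return false
	}
	return now.Sub(p.readySince) >= time.Duration(minReadySeconds)*time.Second
}

func main() {
	now := time.Now()
	p := pod{ready: true, readySince: now.Add(-3 * time.Second)}
	fmt.Println(isAvailable(p, 0, now)) // true: MinReadySeconds == 0, ready means available
	fmt.Println(isAvailable(p, 5, now)) // false: only ready for 3 of the required 5 seconds
}
```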
@Kargakis right, I get it now, I think. |
What's the utility of this over an intelligent readiness probe? |
This does seem less powerful and more frustrating to use than a readiness probe, anyway. |
@stevekuznetsov this doesn't replace readiness probes but complements them since it will take effect as soon as readiness probes in a pod succeed and it is considered ready. |
Why is that useful? Shouldn't we want users to define readiness probes that are actually meaningful instead of "X and then it didn't crash for five seconds"? |
Just because something is ready doesn't mean that you're ready to move on. |
Isn't the point of the readiness probe to tell you when you're ready to use the pod, when it's available? I think I may be misunderstanding that feature. |
The readiness probe tells you when to add it to a load balancer. Just because … |
What's the use of adding it to the target pool of a load balancer if we're not sure it's booted to a stable state and able to service requests? |
Because how are you going to take requests that tax the new process until … |
So we're dealing with a service that boots fine, and without load would signal ready & healthy forever, but might cave under load but only in the first X seconds after being ready, so we want to expose it to traffic so it has the chance of crashing? What if no traffic happens to be routed to that pod in the first X seconds and it crashes on getting the first packet after X+1 seconds? Sorry, not trying to be obtuse here, but this just seems like a very limited use case to be adding to the API. The decision making part of this PR has already happened, so you can ignore me, but when I'm back in the office I'd really appreciate some help understanding this use case. |
I can see the value of marking the deployment as complete after the pods have been ready, healthy, and have serviced requests, but a seconds-based timeout seems like the most fragile way of doing it. Seems like what you'd really want is a Jenkins pod or whatever running an e2e suite, but the canonical action there is to label the image as successful, not the deployment. |
I think @stevekuznetsov has convinced me that MinReadySeconds is redundant. |
It's very, very common for services to crumple under load, or to manifest … Imagine the simplest possible deployment - 2 pod rolling. Dev pushes a … In the meantime, the other pod is scaled up and hits the same scenario. As … MinReadySeconds is necessary to allow the app author to design well written … |
Do we want to be failing this at the deployment level? I'm not a sophisticated user of OpenShift, but I thought that a mature pipeline would look like dev -> testing -> prod, so these checks should be happening prior to deploying something to a "production" environment where real traffic is being directed. Furthermore, there is no guarantee that the deployment you're making will be taking any load in the … As a consumer of OpenShift, I would be very hesitant to use this feature because of that fragility. Hell, if I threw
oc start-build my-build
# wait for completion
sleep 100
into our Bash scripts, I'd get smacked and someone would tell me to use … |
Yes, we do. |
Can't argue with that :) |
Think about this as theory vs practice. In theory, you'd have this magic … |
Sure, I just didn't think we were going to be supporting a rolling deployment into production without testing as a good idea. This feature simply doesn't make sense if your rolling deployment into production happens after the success of some e2e tests labels an image as production ready - with OpenShift it should be trivial to keep your dev/test/prod deployment specs in sync and be confident that your testing will be relevant in that sense. |
Testing in dev != verification in prod. Testing can only reduce the … Imagine this case. Your readiness check simulates a user login (expensive, … |
I really hope you're not doing your thousand-user stress test in production, and especially that you're not doing it by hoping one thousand users log in within X seconds after the ready check succeeds... As to your points about tangible differences in production environments, those make sense. We should doc the usefulness boundaries of this field and maybe even give warnings in … |
Definitely the description should include a use case for why you would … |
@mfojtik #6233 is merged so you can start working on this. Note that it's part of https://trello.com/c/Mljldox7/643-8-deployments-downstream-support-for-upstream-features |
UPSTREAM: kubernetes/kubernetes#28111 |
Availability in upstream Kubernetes deployments is dictated by MinReadySeconds. We consider pods available as soon as they become ready, without imposing any additional restriction.
This issue is for tracking this gap.
@ironcladlou @smarterclayton @pweil-
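For context on what closing this gap might involve, here is a rough, self-contained sketch of a deployment process gating progress on available rather than merely ready pods; all names here are illustrative, not the actual rolling-updater code:
```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// podState captures the two facts the availability check needs: is the pod
// ready right now, and when did it last become ready.
type podState struct {
	ready      bool
	readySince time.Time
}

// countAvailable counts pods that have been continuously ready for at least
// minReadySeconds.
func countAvailable(pods []podState, minReadySeconds int, now time.Time) int {
	n := 0
	minReady := time.Duration(minReadySeconds) * time.Second
	for _, p := range pods {
		if p.ready && now.Sub(p.readySince) >= minReady {
			n++
		}
	}
	return n
}

// waitForAvailable polls until at least want pods are available or the timeout
// expires; today the deployment process effectively stops at "ready".
func waitForAvailable(list func() []podState, want, minReadySeconds int, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if countAvailable(list(), minReadySeconds, time.Now()) >= want {
			return nil
		}
		time.Sleep(500 * time.Millisecond)
	}
	return errors.New("timed out waiting for pods to become available")
}

func main() {
	pods := []podState{{ready: true, readySince: time.Now()}}
	// With MinReadySeconds == 2, this returns nil only after the pod has been
	// ready for two full seconds.
	err := waitForAvailable(func() []podState { return pods }, 1, 2, 5*time.Second)
	fmt.Println(err)
}
```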