-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add more details to volume scheduling design #1168
Add more details to volume scheduling design #1168
Conversation
LGTM |
/lgtm |
Updated design to restrict the new behavior through the StorageClass API. This allows us to retain backwards compatibility and not need to deprecate anything. It also eliminates the requirement to support scheduler bypass use cases. |
e33b999
to
f986a75
Compare
/lgtm |
Is it still WIP? What's missing? It looks good to me. |
My remaining TODOs are:
|
For StorageClass API review |
f986a75
to
5164562
Compare
Added some information on how caching will be done, which should also take care of the other volume predicates issue. I'm going to defer HA upgrade details until beta/GA. So this just needs another full pass + API review. |
Updated new binding mode value to "WaitForFirstConsumer" |
ec4f521
to
424ebdb
Compare
424ebdb
to
aec42ea
Compare
@jsafrane does this look ok? |
/lgtm |
/test all [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue. |
Automatic merge from submit-queue (batch tested with PRs 54436, 53148, 55153, 55614, 55484). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Add VolumeBindingMode to StorageClass API **What this PR does / why we need it**: Adds a new field `VolumeBindingMode` to `StorageClass`, as specified in kubernetes/community#1168 **Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes kubernetes#54434 **Special notes for your reviewer**: API changes only. The scheduler and PV controller work will be submitted as a separate PR. **Release note**: NONE @kubernetes/sig-storage-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 54436, 53148, 55153, 55614, 55484). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Add VolumeBindingMode to StorageClass API **What this PR does / why we need it**: Adds a new field `VolumeBindingMode` to `StorageClass`, as specified in kubernetes/community#1168 **Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #54434 **Special notes for your reviewer**: API changes only. The scheduler and PV controller work will be submitted as a separate PR. **Release note**: NONE @kubernetes/sig-storage-pr-reviews Kubernetes-commit: 5c59d66a412ec995263f62553dfaa0b2ae33769b
@@ -415,40 +436,31 @@ Kubernetes. | |||
With all this complexity, the library approach is the most feasible in a single | |||
release time frame, and aligns better with the current Kubernetes architecture. | |||
|
|||
#### Downsides/Problems | |||
|
|||
##### Expanding Scheduler Scope |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why were these removed? these are significant downsides
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "scope" here was referring to the scheduler having to also handle scenarios where the user manually sets Pod.Spec.NodeName to bypass the scheduler. This was because the original plan was to have this behavior change for all volume binding.
But we changed the API so that this new behavior can be controlled explicitly through StorageClass. And because of that, we decided that we don't need to support this use case of setting Pod.Spec.NodeName. It was only considered previously for backwards compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to the "expanding scheduler scope" from being just a pod scheduler to being a pod-and-volume scheduler. The downsides seem to be overlap with PV binder responsibilities, and requiring custom schedulers to also be pod-and-volume schedulers.
##### Custom Schedulers | ||
Custom schedulers, controllers and operators that handle pod scheduling and want | ||
to support this new volume binding mode will also need to handle the volume | ||
binding decision. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes it clear that the volume binding is a distinct responsibility. combining it into the scheduler in the initial implementation and requiring all other schedulers to implement the same function is a strong indication that the data flow to accomplish this binding is not well separated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the simplest way to integrate volume binding with pod scheduling. When scheduling the pod, the scheduler needs information on what volumes are available for that pod. And when the scheduler makes a decision assuming that the pod is going to be using one of the volumes, it needs a way to tell the PV controller which volume it picked for that pod. Communicating that informaton to the PV controller without an API update could cause the PV controller to make binding decisions differently from the scheduler, and would cause lots of cascading, incorrect scheduling decisions, and lots of complex recovery/retry logic.
The unfortunate effect of this design is that it raises the bar for implementing your own scheduler ONLY if you want to support this advanced use case. If you don't need this feature, then there is no change.
Automatic merge from submit-queue (batch tested with PRs 51321, 55969, 55039, 56183, 55976). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Topology aware volume scheduler and PV controller changes **What this PR does / why we need it**: Scheduler and PV controller changes to support volume topology aware scheduling, as specified in kubernetes/community#1168 **Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*: Fixes kubernetes#54435 **Special notes for your reviewer**: * I've split the PR into logical commits to make it easier to review * The remaining TODOs I plan to address next release unless you think it needs to be done now **Release note**: ```release-note Adds alpha support for volume scheduling, which allows the scheduler to make PersistentVolume binding decisions while respecting the Pod's scheduling requirements. Dynamic provisioning is not supported with this feature yet. Action required for existing users of the LocalPersistentVolumes alpha feature: * The VolumeScheduling feature gate also has to be enabled on kube-scheduler and kube-controller-manager. * The NoVolumeNodeConflict predicate has been removed. For non-default schedulers, update your scheduler policy. * The CheckVolumeBinding predicate has to be enabled in non-default schedulers. ``` @kubernetes/sig-storage-pr-reviews @kubernetes/sig-scheduling-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 54436, 53148, 55153, 55614, 55484). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Add VolumeBindingMode to StorageClass API **What this PR does / why we need it**: Adds a new field `VolumeBindingMode` to `StorageClass`, as specified in kubernetes/community#1168 **Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #54434 **Special notes for your reviewer**: API changes only. The scheduler and PV controller work will be submitted as a separate PR. **Release note**: NONE @kubernetes/sig-storage-pr-reviews Kubernetes-commit: 5c59d66a412ec995263f62553dfaa0b2ae33769b
Automatic merge from submit-queue (batch tested with PRs 54436, 53148, 55153, 55614, 55484). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Add VolumeBindingMode to StorageClass API **What this PR does / why we need it**: Adds a new field `VolumeBindingMode` to `StorageClass`, as specified in kubernetes/community#1168 **Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #54434 **Special notes for your reviewer**: API changes only. The scheduler and PV controller work will be submitted as a separate PR. **Release note**: NONE @kubernetes/sig-storage-pr-reviews Kubernetes-commit: 5c59d66a412ec995263f62553dfaa0b2ae33769b
Automatic merge from submit-queue (batch tested with PRs 54436, 53148, 55153, 55614, 55484). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Add VolumeBindingMode to StorageClass API **What this PR does / why we need it**: Adds a new field `VolumeBindingMode` to `StorageClass`, as specified in kubernetes/community#1168 **Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #54434 **Special notes for your reviewer**: API changes only. The scheduler and PV controller work will be submitted as a separate PR. **Release note**: NONE @kubernetes/sig-storage-pr-reviews Kubernetes-commit: 5c59d66a412ec995263f62553dfaa0b2ae33769b
Automatic merge from submit-queue (batch tested with PRs 54436, 53148, 55153, 55614, 55484). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Add VolumeBindingMode to StorageClass API **What this PR does / why we need it**: Adds a new field `VolumeBindingMode` to `StorageClass`, as specified in kubernetes/community#1168 **Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #54434 **Special notes for your reviewer**: API changes only. The scheduler and PV controller work will be submitted as a separate PR. **Release note**: NONE @kubernetes/sig-storage-pr-reviews Kubernetes-commit: 5c59d66a412ec995263f62553dfaa0b2ae33769b
Automatic merge from submit-queue (batch tested with PRs 54436, 53148, 55153, 55614, 55484). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Add VolumeBindingMode to StorageClass API **What this PR does / why we need it**: Adds a new field `VolumeBindingMode` to `StorageClass`, as specified in kubernetes/community#1168 **Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #54434 **Special notes for your reviewer**: API changes only. The scheduler and PV controller work will be submitted as a separate PR. **Release note**: NONE @kubernetes/sig-storage-pr-reviews Kubernetes-commit: 5c59d66a412ec995263f62553dfaa0b2ae33769b
Automatic merge from submit-queue (batch tested with PRs 54436, 53148, 55153, 55614, 55484). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Add VolumeBindingMode to StorageClass API **What this PR does / why we need it**: Adds a new field `VolumeBindingMode` to `StorageClass`, as specified in kubernetes/community#1168 **Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #54434 **Special notes for your reviewer**: API changes only. The scheduler and PV controller work will be submitted as a separate PR. **Release note**: NONE @kubernetes/sig-storage-pr-reviews Kubernetes-commit: 5c59d66a412ec995263f62553dfaa0b2ae33769b
Automatic merge from submit-queue (batch tested with PRs 54436, 53148, 55153, 55614, 55484). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>. Add VolumeBindingMode to StorageClass API **What this PR does / why we need it**: Adds a new field `VolumeBindingMode` to `StorageClass`, as specified in kubernetes/community#1168 **Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #54434 **Special notes for your reviewer**: API changes only. The scheduler and PV controller work will be submitted as a separate PR. **Release note**: NONE @kubernetes/sig-storage-pr-reviews Kubernetes-commit: 5c59d66a412ec995263f62553dfaa0b2ae33769b
Automatic merge from submit-queue. Add more details to volume scheduling design Done: * Added diagram showing new additions to the scheduler flow * Restrict new behavior through StorageClass API to eliminate backwards compatibility issues and no need for deprecation. * This also eliminates the requirement to support scheduler bypass use cases. * Caching overview * Handling other volume predicates correctly @kubernetes/sig-storage-proposals @kubernetes/sig-scheduling-proposals
Done:
@kubernetes/sig-storage-proposals
@kubernetes/sig-scheduling-proposals