Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more details to volume scheduling design #1168

Merged
merged 1 commit into from
Nov 14, 2017

Conversation

msau42
Copy link
Member

@msau42 msau42 commented Oct 9, 2017

Done:

  • Added diagram showing new additions to the scheduler flow
  • Restrict new behavior through StorageClass API to eliminate backwards compatibility issues and no need for deprecation.
    • This also eliminates the requirement to support scheduler bypass use cases.
  • Caching overview
  • Handling other volume predicates correctly

@kubernetes/sig-storage-proposals
@kubernetes/sig-scheduling-proposals

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/design Categorizes issue or PR as related to design. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 9, 2017
@msau42
Copy link
Member Author

msau42 commented Oct 9, 2017

/assign @bsalamat @jsafrane @saad-ali

@bsalamat
Copy link
Member

bsalamat commented Oct 9, 2017

LGTM

@jsafrane
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 11, 2017
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 16, 2017
@msau42
Copy link
Member Author

msau42 commented Oct 16, 2017

Updated design to restrict the new behavior through the StorageClass API. This allows us to retain backwards compatibility and not need to deprecate anything. It also eliminates the requirement to support scheduler bypass use cases.

@msau42 msau42 force-pushed the storage-topology-design branch 2 times, most recently from e33b999 to f986a75 Compare October 16, 2017 21:58
@ianchakeres
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 19, 2017
@jsafrane
Copy link
Member

Is it still WIP? What's missing? It looks good to me.

@msau42
Copy link
Member Author

msau42 commented Oct 19, 2017

My remaining TODOs are:

  • Caching details. Does this need to be in the design doc?
  • Handling other volume predicates correctly. May need to add some extra state in the scheduler to handle the scenario where volumes can get bound in the middle of running the volume predicates.
  • How to enable feature during HA master upgrade. This can probably be deferred to beta/GA.

@msau42
Copy link
Member Author

msau42 commented Oct 19, 2017

For StorageClass API review
/assign @thockin

@msau42 msau42 force-pushed the storage-topology-design branch from f986a75 to 5164562 Compare October 20, 2017 16:44
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 20, 2017
@msau42
Copy link
Member Author

msau42 commented Oct 20, 2017

Added some information on how caching will be done, which should also take care of the other volume predicates issue. I'm going to defer HA upgrade details until beta/GA.

So this just needs another full pass + API review.

@msau42
Copy link
Member Author

msau42 commented Nov 1, 2017

Updated new binding mode value to "WaitForFirstConsumer"

@msau42 msau42 force-pushed the storage-topology-design branch from ec4f521 to 424ebdb Compare November 6, 2017 22:53
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 6, 2017
@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 6, 2017
@msau42 msau42 force-pushed the storage-topology-design branch from 424ebdb to aec42ea Compare November 6, 2017 23:35
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 6, 2017
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 6, 2017
@msau42
Copy link
Member Author

msau42 commented Nov 14, 2017

@jsafrane does this look ok?

@jsafrane
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 14, 2017
@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue.

@k8s-github-robot k8s-github-robot merged commit a781c12 into kubernetes:master Nov 14, 2017
jessfraz pushed a commit to jessfraz/kubernetes that referenced this pull request Nov 15, 2017
Automatic merge from submit-queue (batch tested with PRs 54436, 53148, 55153, 55614, 55484). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add VolumeBindingMode to StorageClass API

**What this PR does / why we need it**:
Adds a new field `VolumeBindingMode` to `StorageClass`, as specified in kubernetes/community#1168

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes kubernetes#54434

**Special notes for your reviewer**:
API changes only.  The scheduler and PV controller work will be submitted as a separate PR.

**Release note**:
NONE

@kubernetes/sig-storage-pr-reviews
sttts pushed a commit to sttts/api that referenced this pull request Nov 15, 2017
Automatic merge from submit-queue (batch tested with PRs 54436, 53148, 55153, 55614, 55484). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add VolumeBindingMode to StorageClass API

**What this PR does / why we need it**:
Adds a new field `VolumeBindingMode` to `StorageClass`, as specified in kubernetes/community#1168

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #54434

**Special notes for your reviewer**:
API changes only.  The scheduler and PV controller work will be submitted as a separate PR.

**Release note**:
NONE

@kubernetes/sig-storage-pr-reviews

Kubernetes-commit: 5c59d66a412ec995263f62553dfaa0b2ae33769b
@@ -415,40 +436,31 @@ Kubernetes.
With all this complexity, the library approach is the most feasible in a single
release time frame, and aligns better with the current Kubernetes architecture.

#### Downsides/Problems

##### Expanding Scheduler Scope
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why were these removed? these are significant downsides

Copy link
Member Author

@msau42 msau42 Nov 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "scope" here was referring to the scheduler having to also handle scenarios where the user manually sets Pod.Spec.NodeName to bypass the scheduler. This was because the original plan was to have this behavior change for all volume binding.

But we changed the API so that this new behavior can be controlled explicitly through StorageClass. And because of that, we decided that we don't need to support this use case of setting Pod.Spec.NodeName. It was only considered previously for backwards compatibility.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to the "expanding scheduler scope" from being just a pod scheduler to being a pod-and-volume scheduler. The downsides seem to be overlap with PV binder responsibilities, and requiring custom schedulers to also be pod-and-volume schedulers.

##### Custom Schedulers
Custom schedulers, controllers and operators that handle pod scheduling and want
to support this new volume binding mode will also need to handle the volume
binding decision.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes it clear that the volume binding is a distinct responsibility. combining it into the scheduler in the initial implementation and requiring all other schedulers to implement the same function is a strong indication that the data flow to accomplish this binding is not well separated

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the simplest way to integrate volume binding with pod scheduling. When scheduling the pod, the scheduler needs information on what volumes are available for that pod. And when the scheduler makes a decision assuming that the pod is going to be using one of the volumes, it needs a way to tell the PV controller which volume it picked for that pod. Communicating that informaton to the PV controller without an API update could cause the PV controller to make binding decisions differently from the scheduler, and would cause lots of cascading, incorrect scheduling decisions, and lots of complex recovery/retry logic.

The unfortunate effect of this design is that it raises the bar for implementing your own scheduler ONLY if you want to support this advanced use case. If you don't need this feature, then there is no change.

danehans pushed a commit to danehans/kubernetes that referenced this pull request Nov 22, 2017
Automatic merge from submit-queue (batch tested with PRs 51321, 55969, 55039, 56183, 55976). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Topology aware volume scheduler and PV controller changes

**What this PR does / why we need it**:
Scheduler and PV controller changes to support volume topology aware scheduling, as specified in kubernetes/community#1168

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubernetes#54435

**Special notes for your reviewer**:
* I've split the PR into logical commits to make it easier to review
* The remaining TODOs I plan to address next release unless you think it needs to be done now

**Release note**:
```release-note
Adds alpha support for volume scheduling, which allows the scheduler to make PersistentVolume binding decisions while respecting the Pod's scheduling requirements.  Dynamic provisioning is not supported with this feature yet.

Action required for existing users of the LocalPersistentVolumes alpha feature:
* The VolumeScheduling feature gate also has to be enabled on kube-scheduler and kube-controller-manager.
* The NoVolumeNodeConflict predicate has been removed.  For non-default schedulers, update your scheduler policy.
* The CheckVolumeBinding predicate has to be enabled in non-default schedulers.
```

@kubernetes/sig-storage-pr-reviews @kubernetes/sig-scheduling-pr-reviews
sttts pushed a commit to sttts/api that referenced this pull request Nov 27, 2017
Automatic merge from submit-queue (batch tested with PRs 54436, 53148, 55153, 55614, 55484). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add VolumeBindingMode to StorageClass API

**What this PR does / why we need it**:
Adds a new field `VolumeBindingMode` to `StorageClass`, as specified in kubernetes/community#1168

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #54434

**Special notes for your reviewer**:
API changes only.  The scheduler and PV controller work will be submitted as a separate PR.

**Release note**:
NONE

@kubernetes/sig-storage-pr-reviews

Kubernetes-commit: 5c59d66a412ec995263f62553dfaa0b2ae33769b
sttts pushed a commit to sttts/api that referenced this pull request Nov 28, 2017
Automatic merge from submit-queue (batch tested with PRs 54436, 53148, 55153, 55614, 55484). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add VolumeBindingMode to StorageClass API

**What this PR does / why we need it**:
Adds a new field `VolumeBindingMode` to `StorageClass`, as specified in kubernetes/community#1168

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #54434

**Special notes for your reviewer**:
API changes only.  The scheduler and PV controller work will be submitted as a separate PR.

**Release note**:
NONE

@kubernetes/sig-storage-pr-reviews

Kubernetes-commit: 5c59d66a412ec995263f62553dfaa0b2ae33769b
sttts pushed a commit to sttts/api that referenced this pull request Nov 28, 2017
Automatic merge from submit-queue (batch tested with PRs 54436, 53148, 55153, 55614, 55484). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add VolumeBindingMode to StorageClass API

**What this PR does / why we need it**:
Adds a new field `VolumeBindingMode` to `StorageClass`, as specified in kubernetes/community#1168

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #54434

**Special notes for your reviewer**:
API changes only.  The scheduler and PV controller work will be submitted as a separate PR.

**Release note**:
NONE

@kubernetes/sig-storage-pr-reviews

Kubernetes-commit: 5c59d66a412ec995263f62553dfaa0b2ae33769b
sttts pushed a commit to sttts/api that referenced this pull request Nov 28, 2017
Automatic merge from submit-queue (batch tested with PRs 54436, 53148, 55153, 55614, 55484). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add VolumeBindingMode to StorageClass API

**What this PR does / why we need it**:
Adds a new field `VolumeBindingMode` to `StorageClass`, as specified in kubernetes/community#1168

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #54434

**Special notes for your reviewer**:
API changes only.  The scheduler and PV controller work will be submitted as a separate PR.

**Release note**:
NONE

@kubernetes/sig-storage-pr-reviews

Kubernetes-commit: 5c59d66a412ec995263f62553dfaa0b2ae33769b
sttts pushed a commit to sttts/api that referenced this pull request Nov 28, 2017
Automatic merge from submit-queue (batch tested with PRs 54436, 53148, 55153, 55614, 55484). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add VolumeBindingMode to StorageClass API

**What this PR does / why we need it**:
Adds a new field `VolumeBindingMode` to `StorageClass`, as specified in kubernetes/community#1168

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #54434

**Special notes for your reviewer**:
API changes only.  The scheduler and PV controller work will be submitted as a separate PR.

**Release note**:
NONE

@kubernetes/sig-storage-pr-reviews

Kubernetes-commit: 5c59d66a412ec995263f62553dfaa0b2ae33769b
k8s-publishing-bot pushed a commit to k8s-publishing-bot/api that referenced this pull request Nov 29, 2017
Automatic merge from submit-queue (batch tested with PRs 54436, 53148, 55153, 55614, 55484). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add VolumeBindingMode to StorageClass API

**What this PR does / why we need it**:
Adds a new field `VolumeBindingMode` to `StorageClass`, as specified in kubernetes/community#1168

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #54434

**Special notes for your reviewer**:
API changes only.  The scheduler and PV controller work will be submitted as a separate PR.

**Release note**:
NONE

@kubernetes/sig-storage-pr-reviews

Kubernetes-commit: 5c59d66a412ec995263f62553dfaa0b2ae33769b
k8s-publishing-bot pushed a commit to k8s-publishing-bot/api that referenced this pull request Dec 7, 2017
Automatic merge from submit-queue (batch tested with PRs 54436, 53148, 55153, 55614, 55484). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add VolumeBindingMode to StorageClass API

**What this PR does / why we need it**:
Adds a new field `VolumeBindingMode` to `StorageClass`, as specified in kubernetes/community#1168

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #54434

**Special notes for your reviewer**:
API changes only.  The scheduler and PV controller work will be submitted as a separate PR.

**Release note**:
NONE

@kubernetes/sig-storage-pr-reviews

Kubernetes-commit: 5c59d66a412ec995263f62553dfaa0b2ae33769b
MadhavJivrajani pushed a commit to MadhavJivrajani/community that referenced this pull request Nov 30, 2021
Automatic merge from submit-queue.

Add more details to volume scheduling design

Done:
* Added diagram showing new additions to the scheduler flow
* Restrict new behavior through StorageClass API to eliminate backwards compatibility issues and no need for deprecation.
    * This also eliminates the requirement to support scheduler bypass use cases.
* Caching overview
* Handling other volume predicates correctly

@kubernetes/sig-storage-proposals 
@kubernetes/sig-scheduling-proposals
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/design Categorizes issue or PR as related to design. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants