Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check volume binding mode #479

Merged
merged 15 commits into from
May 15, 2019
Merged

Check volume binding mode #479

merged 15 commits into from
May 15, 2019

Conversation

dturn
Copy link
Contributor

@dturn dturn commented May 1, 2019

If the StorageClass for a PVC has 'volumeBindingMode: WaitForFirstConsumer',
then it won't bind until after a Pod mounts it. But the PVC must be pre-deployed,
as the Pod requires it. So 'Pending' must be treated as a 'Success' state in this
scenario.

What are you trying to accomplish with this PR?
Automatic handling of the 'volumeBindingMode: WaitForFirstConsumer' condition in the StorageClass of a PVC resource.

How is this accomplished?
By determining the StorageClass of the PVC resource and finding the volumeBindingMode value. If it is WaitForFirstConsumer, then a PVC status of 'Pending' is treated as a success during deployment.

What could go wrong?
The code currently assumes the DefaultStorageClass admission plugin is turned on, and looks for a default StorageClass if one is not present in the PVC definition. There doesn't appear to be a way to determine if this admission plugin is turned on programmatically.

The code also checks for two other conditions and provides a warning:

if there are multiple Default StorageClasses defined; or
if the PVCs defined StorageClass is not found

@redmcg . When I pushed it went to origin instead of your repo. This still isn't done, needs better error messages, a test or two to catch some of the failure cases, and the test needs to clean up the storage class since its not namespaced. But thought I'd get your eyes early

@redmcg
Copy link

redmcg commented May 1, 2019

This looks much much better. My only concern was with using hostPath instead of local in the PV - as when reading this:
https://kubernetes.io/docs/concepts/storage/storage-classes/#volume-binding-mode

it sounded like volumeBindingMode: WaitForFirstConsumer wasn't supported by hostPath. But I did a couple of tests and it seems to be supported. So we can get rid of that initContainer too.

How do you want to proceed with the PR? Am I able to contribute to this branch? And should I delete my own PR?

@dturn
Copy link
Contributor Author

dturn commented May 1, 2019

How do you want to proceed with the PR? Am I able to contribute to this branch? And should I delete my own PR?

If its ok with you, using this branch is easier for us (makes running CI simpler). You should be able to push to this PR, just pull my changes into your branch and we'll make it work. It would be great if you could make the test changes you suggested!

@redmcg
Copy link

redmcg commented May 1, 2019

I couldn't push directly to this branch - I get:
Permission to Shopify/kubernetes-deploy.git denied to redmcg.

But I can create a new pull request against it. That does mean you'll have a PR against a PR though. I'll try that - but will of course close it if you prefer a different approach.

Edit: I've opened #480. With this I guess you can review each commit before accepting it in to this PR. Alternatively I noticed that you should be able to contribute to any branch with which I create a PR (i.e. by pushing to redmcg/kubernetes-deploy.git) - but that doesn't address making CI simpler.

redmcg and others added 2 commits May 3, 2019 09:59
* Remove redundant initContainer and volume

* Delete storageclass on test completion

* Assert all four resources finished in the correct state

* Add additional tests

* Fix line length to satisfy Policial
@dturn dturn requested review from timothysmith0609 and KnVerey and removed request for timothysmith0609 May 3, 2019 17:30
@dturn dturn self-assigned this May 3, 2019
Copy link
Contributor

@timothysmith0609 timothysmith0609 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good to me, few minor style comments

@dturn dturn requested review from RyanBrushett and timothysmith0609 and removed request for RyanBrushett May 13, 2019 17:01
@dturn dturn requested a review from RyanBrushett May 13, 2019 17:01
Copy link
Contributor

@timothysmith0609 timothysmith0609 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last question about failure_message but everything else is looking good for me.

if storage_class_name.nil? && @storage_classes.count(&:default?) > 1
"PVC has no StorageClass specified and there are multiple StorageClasses " \
"annotated as default. This is an invalid cluster configuration."
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an else case to consider? If it just returns nil will that hide worthwhile information from the user?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The else case is status == "Lost" which didn't have any message before. I'm also not sure what we would helpfully say in the lost case.

Copy link
Contributor

@RyanBrushett RyanBrushett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢

@dturn dturn merged commit 0f55c66 into master May 15, 2019
@dturn dturn deleted the check_volumeBindingMode branch May 15, 2019 15:22
@redmcg
Copy link

redmcg commented May 15, 2019

Thanks all for your work on this

@storage_classes.detect { |sc| sc.name == storage_class_name }
# storage_class_name = "" is an explicit request for no storage class
# storage_class_name = nil is an impplicit request for default storage class
elsif storage_class_name != ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this more specifically storage_class_name.nil??


assert_logs_match_all([
"Failed to deploy 1 priority resource",
"PersistentVolumeClaim/with-storage-class: TIMED OUT (timeout: 20s)",
Copy link
Contributor

@KnVerey KnVerey May 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this test taking a full 20s? Can we use a custom timeout to cut that down? Same question for one of the tests above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good eye, it does :(. Should be able to decrease it, just haven't tried yet.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already using a custom timeout via a kubernetes-deploy.shopify.io/timeout-override:
https://github.com/Shopify/kubernetes-deploy/blob/ef17097b42b5875e15c138b74f7a9eb80485af9c/test/fixtures/pvc/pvc.yml#L5

The default was 300s - so I shortened it to 20s. I chose 20s to be sure it wasn't so short as to create a false positive - but 20s is still pretty generous.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants