CSI: unique volume per allocation #10136

Merged
merged 6 commits into from
Mar 18, 2021

tgross (Member) commented Mar 8, 2021

Fixes #7877

Adds a PerAlloc field to volume requests. When it is set, the scheduler tests feasibility against the volume whose source ID includes the allocation index suffix (e.g. [0]) rather than the exact source ID. The client appends the same suffix when it sends the volume claim RPCs.
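
As a rough illustration of the suffix handling described above, here is a minimal Go sketch. The names `allocSuffix` and `claimSource` are hypothetical stand-ins, and the sketch assumes allocation names end in a bracketed index such as `example.cache[0]`; it is not the code from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// allocSuffix is a hypothetical stand-in for structs.AllocSuffix. It assumes
// allocation names end in a bracketed index, e.g. "example.cache[0]", and
// returns that bracketed part ("[0]"), or "" if there is none.
func allocSuffix(allocName string) string {
	idx := strings.LastIndex(allocName, "[")
	if idx == -1 {
		return ""
	}
	return allocName[idx:]
}

// claimSource returns the volume ID to claim: the exact source ID, or the
// source ID plus the allocation index suffix when perAlloc is set.
func claimSource(source string, perAlloc bool, allocName string) string {
	if perAlloc {
		return source + allocSuffix(allocName)
	}
	return source
}

func main() {
	fmt.Println(claimSource("data", true, "example.cache[0]"))  // data[0]
	fmt.Println(claimSource("data", false, "example.cache[0]")) // data
}
```

Under these assumptions, a group with three allocations and a per-alloc source of `data` would claim volumes `data[0]`, `data[1]`, and `data[2]`, so each of those volumes has to exist beforehand.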

Reviewer notes:

tgross added this to the 1.1.0 milestone Mar 8, 2021
tgross force-pushed the csi-unique-volumes-per-alloc branch from c257b9b to dd628be (March 8, 2021 22:08)
tgross force-pushed the csi-unique-volumes-per-alloc branch from dd628be to a90d367 (March 10, 2021 16:34)
tgross force-pushed the csi-unique-volumes-per-alloc branch from a90d367 to 3b7e58e (March 10, 2021 20:23)
tgross force-pushed the csi-unique-volumes-per-alloc branch from 2d04b6c to 2d2118b (March 10, 2021 20:56)
tgross force-pushed the csi-unique-volumes-per-alloc branch from 2d2118b to d3820d6 (March 10, 2021 21:21)
tgross force-pushed the csi-unique-volumes-per-alloc branch from d3820d6 to cebcb23 (March 10, 2021 21:24)
tgross self-assigned this Mar 10, 2021
tgross force-pushed the csi-unique-volumes-per-alloc branch from c80c665 to d77fd59 (March 11, 2021 15:11)

shoenig (Member) left a comment:
LGTM!

notnoop (Contributor) left a comment:
I had a couple of questions after skimming the PR. I'll review it more closely later today.

@@ -92,8 +92,13 @@ func (c *csiHook) Postrun() error {
			mode = structs.CSIVolumeClaimWrite
		}

		source := pair.request.Source
		if pair.request.PerAlloc {
			source = source + structs.AllocSuffix(c.alloc.Name)
notnoop (Contributor):
I recall an issue where alloc indexes aren't unique: two allocations can end up sharing the same index when canaries are in use and multiple deployment versions are running. Would that cause an issue here?

tgross (Member, Author):
It would! But it's one of our invariants that if you're running a job with PerAlloc you can't also use canaries. (It doesn't make sense as a concept with volumes.)

notnoop (Contributor):
Let's make a note of that and/or add a test for it. I fear that one day we'll relax or change the requirement a bit and miss this assumption in the code.

tgross (Member, Author):
I'll definitely add some more commentary to make that clear though.

tgross (Member, Author):
Just realized I totally missed the docs too!

tgross (Member, Author):
Done in 3b1d19c
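
The invariant discussed in this thread (a job that uses PerAlloc volumes cannot also use canaries) could be guarded by a validation check along these lines. This is only a sketch: `TaskGroup`, `VolumeRequest`, and `validatePerAlloc` are simplified, hypothetical types and names, not the structs or the validation code from this PR.

```go
package main

import "fmt"

// VolumeRequest is a hypothetical, trimmed-down volume request.
type VolumeRequest struct {
	Name     string
	Source   string
	PerAlloc bool
}

// TaskGroup is a hypothetical, trimmed-down task group. Canaries is the
// number of canaries configured for the group's deployments.
type TaskGroup struct {
	Name     string
	Canaries int
	Volumes  []VolumeRequest
}

// validatePerAlloc rejects groups that combine canaries with per-alloc
// volumes. With canaries, two allocations can share an index, so they would
// both resolve to the same suffixed volume ID.
func validatePerAlloc(tg TaskGroup) error {
	if tg.Canaries == 0 {
		return nil
	}
	for _, v := range tg.Volumes {
		if v.PerAlloc {
			return fmt.Errorf("group %q: volume %q cannot be per-alloc when canaries are in use",
				tg.Name, v.Name)
		}
	}
	return nil
}

func main() {
	tg := TaskGroup{
		Name:     "cache",
		Canaries: 1,
		Volumes:  []VolumeRequest{{Name: "data", Source: "data", PerAlloc: true}},
	}
	fmt.Println(validatePerAlloc(tg))
}
```

A test that builds a group with both settings and asserts a non-nil error would catch the case raised above, where the requirement is later relaxed without revisiting the claim code.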

if obj == nil {
	return nil, nil
}
vol, ok := obj.(*structs.CSIVolume)
if !ok {
	return nil, fmt.Errorf("volume row conversion error")
notnoop (Contributor):
This seems like a violated invariant, potentially a data corruption issue. Not sure what should happen here. It may be nice to log and include the obj type as well?

tgross (Member, Author):
It looks like the rest of the state store code actually just tosses this conversion error out and lets us panic later. That's probably the right move.

tgross (Member, Author):
Done in 4ae9cb8
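
For reference, the reviewer's suggestion to include the object's type in the conversion error could look roughly like the sketch below. `CSIVolume` here is a hypothetical stand-in for `structs.CSIVolume`, and `volumeFromRow` is an illustrative name. Per the replies above, the thread settled on matching the rest of the state store code instead, so this shows the alternative that was discussed, not what was merged.

```go
package main

import "fmt"

// CSIVolume is a hypothetical, trimmed-down stand-in for structs.CSIVolume.
type CSIVolume struct {
	ID string
}

// volumeFromRow converts a raw state-store row into a *CSIVolume. A nil row
// means "not found"; any other type is reported with its dynamic type (%T)
// so a violated invariant is easier to diagnose.
func volumeFromRow(obj interface{}) (*CSIVolume, error) {
	if obj == nil {
		return nil, nil
	}
	vol, ok := obj.(*CSIVolume)
	if !ok {
		return nil, fmt.Errorf("volume row conversion error: unexpected type %T", obj)
	}
	return vol, nil
}

func main() {
	_, err := volumeFromRow("not a volume")
	fmt.Println(err) // volume row conversion error: unexpected type string
}
```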

Add a `PerAlloc` field to volume requests that directs the scheduler to test
feasibility for volumes with a source ID that includes the allocation index
suffix (ex. `[0]`), rather than the exact source ID.

Read the `PerAlloc` field when making the volume claim at the client to
determine if the allocation index suffix (ex. `[0]`) should be added to the
volume source ID.
Linked issue: [feature request] Ability to use variable interpolation in volume {} stanza