
Nomad should not create the target_path passed into NodePublishVolume #8358

Closed
timoreimann opened this issue Jul 4, 2020 · 14 comments · Fixed by #8505
@timoreimann

👋

Nomad version

Unclear, as I am filing this issue on behalf of a user running Nomad on DigitalOcean infrastructure.

Operating system and Environment details

OS: most likely Linux
Environment: DigitalOcean VMs (aka Droplets)

Issue

A user of DigitalOcean’s CSI driver (which I happen to maintain) opened a support ticket with DigitalOcean saying that they could not use volumes managed by a self-hosted Nomad cluster running on DigitalOcean infrastructure. They used DigitalOcean’s CSI driver in version v1.3.0 and saw the following error when our NodePublishVolume implementation was invoked (pardon the output format -- we employ structured logging in our driver):

time="2020-06-24T16:59:54Z"
level=info
msg="mounting the volume"
host_id=<REDACTED>
method=node_publish_volume
mount_options="[bind]"
region=fra1
source_path=/dev/sda
staging_target_path=/csi/staging/redis/rw-block-device-single-node-writer
target_path=/csi/per-alloc/7dd9b6f1-1626-cbf1-a917-a505ec7af1b2/redis/rw-block-device-single-node-writer
version=latest
volume_id=<REDACTED>
volume_mode=block

time="2020-06-24T16:59:54Z"
level=error
msg="method failed"
error="rpc error: code = Internal desc = failed to create target file for raw block bind mount: open /csi/per-alloc/7dd9b6f1-1626-cbf1-a917-a505ec7af1b2/redis/rw-block-device-single-node-writer: is a directory"
host_id=197333792
method=/csi.v1.Node/NodePublishVolume
region=fra1
version=latest

The most important part is the error message:

rpc error: code = Internal desc = failed to create target file for raw block bind mount: open /csi/per-alloc/7dd9b6f1-1626-cbf1-a917-a505ec7af1b2/redis/rw-block-device-single-node-writer: is a directory

What is happening here is that the driver was asked to use a volume in raw block mode. The way this is implemented in our driver is that we create a file at the target path and afterwards bind-mount the device onto it. The former step fails with Nomad as the CO (container orchestrator) because the target path already exists as a directory.
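For illustration, here is a minimal sketch of that raw-block publish flow; the package and function names are hypothetical, not our actual driver code:

  // Illustrative sketch only, not the actual DigitalOcean CSI driver code.
  package driver

  import (
      "fmt"
      "os"

      "golang.org/x/sys/unix"
  )

  // publishBlockVolume bind-mounts the block device at devicePath onto a
  // regular file it creates at targetPath. Per the CSI spec, creating
  // target_path is the SP's responsibility; only its parent must exist.
  func publishBlockVolume(devicePath, targetPath string) error {
      // This open fails with EISDIR ("is a directory") when the CO has
      // already created targetPath as a directory -- the error logged above.
      f, err := os.OpenFile(targetPath, os.O_CREATE|os.O_RDWR, 0o600)
      if err != nil {
          return fmt.Errorf("failed to create target file for raw block bind mount: %w", err)
      }
      f.Close()

      // Bind-mount the device node onto the target file.
      if err := unix.Mount(devicePath, targetPath, "", unix.MS_BIND, ""); err != nil {
          return fmt.Errorf("bind mount failed: %w", err)
      }
      return nil
  }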

Here’s what the spec has to say about the target_path parameter on NodePublishVolume:

  // The path to which the volume will be published. It MUST be an
  // absolute path in the root filesystem of the process serving this
  // request. The CO SHALL ensure uniqueness of target_path per volume.
  // The CO SHALL ensure that the parent directory of this path exists
  // and that the process serving the request has `read` and `write`
  // permissions to that parent directory.
  // For volumes with an access type of block, the SP SHALL place the
  // block device at target_path.
  // For volumes with an access type of mount, the SP SHALL place the
  // mounted directory at target_path.
  // Creation of target_path is the responsibility of the SP.
  // This is a REQUIRED field.
  string target_path = 4;

Note these specific parts of the description:

  // The CO SHALL ensure that the parent directory of this path exists [...].
  // Creation of target_path is the responsibility of the SP.

I reached out to the SIG Storage folks on the Kubernetes Slack (who are involved with the spec work), and they confirmed that the spec in this regard should be interpreted as “the CO MUST NOT create the target_path”. So it looks like Nomad might be doing too much here.

I briefly considered changing our CSI driver to address this scenario by deleting and recreating the target path. However, I felt somewhat hesitant about this move as it seems a bit invasive / disruptive to me and could mess with the expectations that the CO may have. Overall, I am more inclined to resolve the matter in a fashion that can be considered most compliant with the spec.

That said, I'm more than happy to discuss options if you feel this should be tackled in any particular way.

Thanks!

Reproduction steps

Unfortunately I'm not familiar enough with Nomad to reproduce this on my own. My guess is that the problem should be reproducible, though, by creating a Nomad cluster on DigitalOcean infrastructure and trying to use a volume in block access mode.

@tgross
Member

tgross commented Jul 6, 2020

Thanks for opening this, @timoreimann!

I briefly considered changing our CSI driver to address this scenario by deleting and recreating the target path. However, I felt somewhat hesitant about this move as it seems a bit invasive / disruptive to me and could mess with the expectations that the CO may have. Overall, I am more inclined to resolve the matter in a fashion that can be considered most compliant with the spec.

Totally agreed. Honestly, this makes life a little bit easier on us anyway, as this has been a source of difficulty when handling errors. It's easy enough to remove, but I'll want to verify that the other common SPs that we test with are also doing the right thing per the spec, as the DO provider does.

@tgross tgross added the type/bug label Jul 6, 2020
@RickyGrassmuck
Contributor

@tgross I am testing the OpenStack Cinder CSI plugin right now and just ran into this exact same issue.

E0707 21:20:45.750781       1 utils.go:83] GRPC error: rpc error: code = Internal desc = Error in making file open /csi/per-alloc/d4983594-027f-458f-f421-4442c3ddf318/gitlab_cinder/rw-block-device-single-node-writer: is a directory
W0707 21:20:47.890594       1 mount_helper_common.go:65] Warning: "/csi/per-alloc/d4983594-027f-458f-f421-4442c3ddf318/gitlab_cinder/rw-block-device-single-node-writer" is not a mountpoint, deleting
W0707 21:20:48.014727       1 mount_helper_common.go:33] Warning: Unmount skipped because path does not exist: /csi/staging/gitlab_cinder/rw-block-device-single-node-writer

I'd say not creating the mount points for block volumes is the correct course of action here. If needed, I can provide steps to reproduce the issue using the cloud-provider-openstack Cinder CSI driver.

@tgross
Member

tgross commented Jul 8, 2020

Yup, agreed. It's definitely on our stack of CSI bug fixes before we can mark CSI as GA.

@npajkovsky

Hey guys,

is there any pull request/patch to test, or any workaround that I can use?

@matttrach

CSI appears to be nonfunctional on multiple platforms. I have tried Cinder, Longhorn, Rook, and MooseFS, all hitting errors similar to @rigrassm's. Experimentation with CSI is not possible in the current state; is there a workaround coming soon?

@tgross
Member

tgross commented Jul 20, 2020

It's on our stack of CSI bug fixes before we can mark CSI as GA. I can't give you an exact timeline other than that it's a high priority for the team and for me personally.

@tgross tgross self-assigned this Jul 21, 2020
@tgross
Member

tgross commented Jul 21, 2020

Had a chance to take a first pass at this today. The code in question is (volumeManager) publishVolume. Here's what I've got so far:

  • It looks like we're creating the staging path and including the usage (read-only vs read-write), which we shouldn't be, but in practice this hasn't mattered because of scheduler constraints.
  • For publishing, we create a per-alloc per-usage path, rooted on the host, and ensure that directory exists.
  • We then use a similar per-alloc per-usage path except rooted in the plugin's container, as the target path.

So there are two bugs here:

  • For NodePublishVolume, it looks like we either shouldn't be creating the per-usage portion of the path, or we should be extending it before passing it into the plugin. My suspicion for why we haven't hit this previously is that we don't have good coverage of raw block devices, so that code path isn't getting tested. That was a somewhat intentional decision (see CSI: Support for device mode volumes #7139), but we'll correct that deficiency.
  • For NodeStageVolume, we should drop the usage portion of the path to belt-and-suspenders our scheduler checks.

The bug fixes themselves are pretty trivial, so most of the work is testing to make sure there are no regressions. I'll tackle that tomorrow.
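To make the division of labor in the bullets above concrete, here is a hedged sketch of spec-compliant CO-side path handling; the function and parameter names are hypothetical, not the actual Nomad patch:

  // Hypothetical sketch of CO-side behavior per the spec; not the actual fix.
  package volumes

  import (
      "os"
      "path/filepath"
  )

  // preparePublishTarget ensures the *parent* of the per-alloc target path
  // exists on the host, then returns the equivalent path rooted inside the
  // plugin's container to pass as target_path. The target itself is left
  // for the SP to create: a file for block volumes, a directory for mounts.
  func preparePublishTarget(hostRoot, pluginRoot, allocID, volID, usage string) (string, error) {
      hostTarget := filepath.Join(hostRoot, "per-alloc", allocID, volID, usage)
      // Create only the parent directory. Creating hostTarget itself is
      // what the SP then trips over with "is a directory".
      if err := os.MkdirAll(filepath.Dir(hostTarget), 0o700); err != nil {
          return "", err
      }
      return filepath.Join(pluginRoot, "per-alloc", allocID, volID, usage), nil
  }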

@tgross
Member

tgross commented Jul 22, 2020

I've opened #8505 with the patch but need to complete regression testing.

@RickyGrassmuck
Contributor

@tgross awesome! I'll get a build with the fix running tomorrow and test out the cinder CSI driver.

@RickyGrassmuck
Contributor

RickyGrassmuck commented Jul 23, 2020

@tgross
The fix in #8505 looks good for OpenStack CSI:

└──╼ # mount | grep csi
devtmpfs on /tmp/NomadClient324171302/csi/node/cinder-csi/per-alloc/0c72d91f-b700-282b-dd5a-da5a4c07808d/influxdata/rw-block-device-single-node-writer type devtmpfs (rw,nosuid,size=4078792k,nr_inodes=1019698,mode=755)

@tgross
Member

tgross commented Jul 23, 2020

Looking good on DO as well:

# mount | grep csi
udev on /tmp/nomad-csi/client/csi/monolith/digitalocean/per-alloc/bb966fa2-5970-2c60-1635-4f81dbb5a2ab/csi-test-volume/rw-block-device-single-node-writer type devtmpfs (rw,nosuid,relatime,size=4071264k,nr_inodes=1017816,mode=755)
tmpfs on /tmp/nomad-csi/alloc/791c7aa6-252e-aac6-08d1-067d5bd9e51f/plugin/secrets type tmpfs (rw,noexec,relatime,size=1024k)
/dev/sda on /mnt/csi_test_volume type ext4 (rw,noatime,discard,data=ordered)
udev on /tmp/nomad-csi/client/csi/monolith/digitalocean/per-alloc/f901e5c7-760c-ffd5-4561-2d66bbbe9125/csi-test-volume/rw-block-device-single-node-writer type devtmpfs (rw,nosuid,relatime,size=4071264k,nr_inodes=1017816,mode=755)
tmpfs on /tmp/nomad-csi/alloc/f901e5c7-760c-ffd5-4561-2d66bbbe9125/redis/secrets type tmpfs (rw,noexec,relatime,size=1024k)

@tgross
Member

tgross commented Jul 23, 2020

Ok, I've merged that patch and it'll ship in 0.12.2. (Just missed 0.12.1 that went out today, sorry folks!)

@tgross tgross added this to the 0.12.2 milestone Jul 23, 2020
@npajkovsky

@tgross No worries, I'll apply the patch on top of 0.12.1 and wait. Thanks for the fix.

@github-actions

github-actions bot commented Nov 4, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 4, 2022