CSI block-device mount_options not working as expected when using volume_mount #9021

Closed
acornies opened this issue Oct 4, 2020 · 39 comments · Fixed by #9049
@acornies

acornies commented Oct 4, 2020

Nomad version

Nomad v0.12.5+ent (99ba9e9)

Operating system and Environment details

  • Debian Buster 10.4 amd64
  • Docker version 19.03.13, build 4484c46d9d

Issue

We are using Nomad Ent with OpenStack and Cinder CSI. We came across this guide: https://github.com/hashicorp/nomad/tree/master/demo/csi/cinder-csi-plugin which almost does the trick, but the volume fails to mount properly as an ext4 filesystem when using the Docker driver.

When inspecting the volume, the Mount Options are always <none> even though we've registered the volume with options like so:

id = "airflow_db"
name = "airflow_db"
type = "csi"
external_id = "<uuid>"
plugin_id = "cinder-csi"
access_mode = "single-node-writer"
attachment_mode = "block-device"
mount_options {
   fs_type = "ext4"
   mount_flags = ["rw"]
}
nomad volume status airflow_db
ID                   = airflow_db
Name                 = airflow_db
External ID          = <redacted>
Plugin ID            = cinder-csi
Provider             = cinder.csi.openstack.org
Version              = 1.2.1@latest
Schedulable          = true
Controllers Healthy  = 3
Controllers Expected = 3
Nodes Healthy        = 3
Nodes Expected       = 3
Access Mode          = single-node-writer
Attachment Mode      = block-device
Mount Options        = <none>
Namespace            = default

Allocations
No allocations placed

The cinder csi node and controllers appear to be working as expected and the device does get attached properly from a node perspective (/dev/vdb). Wondering if there's something I'm missing here? I also noticed that on a job level, you can specify the same mount options:

volume "airflow_db" {
      type      = "csi"
      read_only = false
      source    = "airflow_db"

      mount_options {
        fs_type = "ext4"
      }
    }

I've tried both ways with no luck getting these volumes working correctly with Docker. Tagging @rigrassm in case he ran into similar issues.

Reproduction steps

Job file (if appropriate)

job "airflow-db" {
  datacenters = ["${datacenters}"]

  update {
    max_parallel = 1
  }

  group "airflow" {
    count = 1

    volume "airflow_db" {
      type      = "csi"
      read_only = false
      source    = "airflow_db"

      mount_options {
        fs_type = "ext4"
      }
    }

    task "airflow-postgres" {
      driver = "docker"

      config {
        image = "postgres:9.6"

        port_map = {
          tcp = 5432
        }
      }

      volume_mount {
        volume      = "airflow_db"
        destination = "/data"
        read_only   = false
      }

      // example only
      env {
        POSTGRES_DB       = "airflow"
        POSTGRES_USER     = "postgres"
        POSTGRES_PASSWORD = "password"

        PGDATA = "/data"
      }

      service {
        name = "airflow-db"
        tags = ["airflow"]
        port = "tcp"

        check {
          type     = "tcp"
          port     = "tcp"
          interval = "10s"
          timeout  = "5s"
        }
      }

      resources {
        memory = 512

        network {
          port "tcp" {
            static = 5432
          }
        }
      }
    }
  }
}

Thanks for your time.

@RickyGrassmuck
Contributor

Thank you for opening this. We realized this was happening in our deployment a couple of weeks ago as well, and I haven't had a chance to dig into it enough to open an issue myself.

I'm on mobile and away from the work computer but I can verify that we are seeing the behavior below:

  • The volume is being registered successfully with Nomad using the same registration the OP is using, which includes the FS type parameter set to ext4.
  • When submitting a job that claims the volume, the cinder CSI driver is getting the claim request and it's directing it to the correct node which is attaching the volume to the host as expected.
  • It appears the cinder driver is not getting the mount options specified in the registration for the fs_type which leads to cinder not getting it properly mounted to the staging directory.

At this point, I would assume an error for this mount failure would be passed back up, but that doesn't appear to be happening, so Nomad assumes the volume was mounted successfully, proceeds with starting the allocation, and reports everything as deployed successfully.

I was able to dig in a little bit and remember coming across the logic for the publish volume command; it looked like it may be applying the mount options only for "file-system" type registrations and not for "block-device" (I recall the comments indicating it was intended to apply to block devices).

I feel like the Cinder CSI driver should be reporting the publish operation as a failure but isn't for some reason, which is likely a bug in the driver itself.

I'm in the middle of wrapping up the quarter and getting prepared for the next, so things are a bit busy ATM, but I can make time to assist with testing/debugging.

@tgross
Member

tgross commented Oct 5, 2020

Hi @acornies! Thanks for opening this issue. And thanks for jumping into the discussion @rigrassm!

When inspecting the volume, the Mount Options are always <none> even though we've registered the volume with options like so:

Is that before or after the volume is claimed by an allocation? (Just trying to figure out if we're overwriting that value incorrectly somehow.)

It appears the cinder driver is not getting the mount options specified in the registration for the fs_type which leads to cinder not getting it properly mounted to the staging directory.

It might help here if you could provide the node plugin's logs from when the consuming task is being started. That might help narrow down whether the issue is in Nomad or the cinder driver or both.

@tgross tgross added the type/bug label Oct 5, 2020
@tgross tgross self-assigned this Oct 5, 2020
@acornies
Author

acornies commented Oct 5, 2020

Here's a log from one of the "node" tasks:

I1003 20:24:05.419460       1 driver.go:69] Driver: cinder.csi.openstack.org
I1003 20:24:05.419503       1 driver.go:70] Driver version: 1.2.1@latest
I1003 20:24:05.419506       1 driver.go:71] CSI Spec version: 1.2.0
I1003 20:24:05.419511       1 driver.go:100] Enabling controller service capability: LIST_VOLUMES
I1003 20:24:05.419514       1 driver.go:100] Enabling controller service capability: CREATE_DELETE_VOLUME
I1003 20:24:05.419516       1 driver.go:100] Enabling controller service capability: PUBLISH_UNPUBLISH_VOLUME
I1003 20:24:05.419519       1 driver.go:100] Enabling controller service capability: CREATE_DELETE_SNAPSHOT
I1003 20:24:05.419521       1 driver.go:100] Enabling controller service capability: LIST_SNAPSHOTS
I1003 20:24:05.419523       1 driver.go:100] Enabling controller service capability: EXPAND_VOLUME
I1003 20:24:05.419526       1 driver.go:100] Enabling controller service capability: CLONE_VOLUME
I1003 20:24:05.419528       1 driver.go:100] Enabling controller service capability: LIST_VOLUMES_PUBLISHED_NODES
I1003 20:24:05.419531       1 driver.go:112] Enabling volume access mode: SINGLE_NODE_WRITER
I1003 20:24:05.419534       1 driver.go:122] Enabling node service capability: STAGE_UNSTAGE_VOLUME
I1003 20:24:05.419536       1 driver.go:122] Enabling node service capability: EXPAND_VOLUME
I1003 20:24:05.419538       1 driver.go:122] Enabling node service capability: GET_VOLUME_STATS
I1003 20:24:05.419836       1 openstack.go:87] Block storage opts: {0 false}
I1003 20:24:06.051237       1 server.go:108] Listening for connections on address: &net.UnixAddr{Name:"//csi/csi.sock", Net:"unix"}
W1003 20:38:44.758282       1 mount_helper_common.go:65] Warning: "/csi/staging/airflow_db/rw-block-device-single-node-writer" is not a mountpoint, deleting
W1003 20:41:23.834391       1 mount_helper_common.go:65] Warning: "/csi/staging/airflow_db/rw-block-device-single-node-writer" is not a mountpoint, deleting

Log from a controller:


I1003 20:24:05.890499       1 driver.go:69] Driver: cinder.csi.openstack.org
I1003 20:24:05.890555       1 driver.go:70] Driver version: 1.2.1@latest
I1003 20:24:05.890561       1 driver.go:71] CSI Spec version: 1.2.0
I1003 20:24:05.890568       1 driver.go:100] Enabling controller service capability: LIST_VOLUMES
I1003 20:24:05.890574       1 driver.go:100] Enabling controller service capability: CREATE_DELETE_VOLUME
I1003 20:24:05.890578       1 driver.go:100] Enabling controller service capability: PUBLISH_UNPUBLISH_VOLUME
I1003 20:24:05.890582       1 driver.go:100] Enabling controller service capability: CREATE_DELETE_SNAPSHOT
I1003 20:24:05.890586       1 driver.go:100] Enabling controller service capability: LIST_SNAPSHOTS
I1003 20:24:05.890593       1 driver.go:100] Enabling controller service capability: EXPAND_VOLUME
I1003 20:24:05.890596       1 driver.go:100] Enabling controller service capability: CLONE_VOLUME
I1003 20:24:05.890600       1 driver.go:100] Enabling controller service capability: LIST_VOLUMES_PUBLISHED_NODES
I1003 20:24:05.890605       1 driver.go:112] Enabling volume access mode: SINGLE_NODE_WRITER
I1003 20:24:05.890789       1 driver.go:122] Enabling node service capability: STAGE_UNSTAGE_VOLUME
I1003 20:24:05.890804       1 driver.go:122] Enabling node service capability: EXPAND_VOLUME
I1003 20:24:05.890808       1 driver.go:122] Enabling node service capability: GET_VOLUME_STATS
I1003 20:24:05.891248       1 openstack.go:87] Block storage opts: {0 false}
I1003 20:24:06.472957       1 server.go:108] Listening for connections on address: &net.UnixAddr{Name:"//csi/csi.sock", Net:"unix"}

Is that before or after the volume is claimed by an allocation

Before. Nomad doesn't appear to persist the mount options before an alloc utilizes it. Maybe I can get some more verbose logging from the driver...

@tgross
Member

tgross commented Oct 5, 2020

Before. Nomad doesn't appear to persist the mount options before an alloc utilizes it.

Hrm, ok... lemme see if I can reproduce that outside of a cinder plugin.

Maybe I can get some more verbose logging from the driver...

Yeah typically for this sort of thing we'll want debug-level logs, as the plugin developers don't seem to log individual volume operations at any higher level than that. By the values we've got here I'd say the plugin is defaulting to info-level.

@RickyGrassmuck
Contributor

Yeah typically for this sort of thing we'll want debug-level logs, as the plugin developers don't seem to log individual volume operations at any higher level than that. By the values we've got here I'd say the plugin is defaulting to info-level.

Unfortunately (and this is one of the reasons I had to hold off on trying to deep dive any further at the time) I couldn't find any way to enable debug level logging after looking through the source either.

I just took another look at the k8s manifests they use for deploying the driver and noticed a couple of things in the node-plugin definition that differ from what is in the Nomad example job.

https://github.com/kubernetes/cloud-provider-openstack/blob/master/manifests/cinder-csi-plugin/cinder-csi-nodeplugin.yaml

        - name: cinder-csi-plugin
          securityContext:
            privileged: true
            capabilities:
              add: ["SYS_ADMIN"]
            allowPrivilegeEscalation: true
          image: docker.io/k8scloudprovider/cinder-csi-plugin:latest
          args:
            - /bin/cinder-csi-plugin
            - "--nodeid=$(NODE_ID)"
            - "--endpoint=$(CSI_ENDPOINT)"
            - "--cloud-config=$(CLOUD_CONFIG)"
          env:
            - name: NODE_ID
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: CSI_ENDPOINT
              value: unix://csi/csi.sock
            - name: CLOUD_CONFIG
              value: /etc/config/cloud.conf
          imagePullPolicy: "IfNotPresent"
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
            - name: kubelet-dir
              mountPath: /var/lib/kubelet
              mountPropagation: "Bidirectional"
            - name: pods-probe-dir
              mountPath: /dev
              mountPropagation: "HostToContainer"
            - name: secret-cinderplugin
              mountPath: /etc/config
              readOnly: true
      volumes:
        - name: socket-dir
          hostPath:
            path: /var/lib/kubelet/plugins/cinder.csi.openstack.org
            type: DirectoryOrCreate
        - name: registration-dir
          hostPath:
            path: /var/lib/kubelet/plugins_registry/
            type: Directory
        - name: kubelet-dir
          hostPath:
            path: /var/lib/kubelet
            type: Directory
        - name: pods-probe-dir
          hostPath:
            path: /dev
            type: Directory
        - name: secret-cinderplugin
          secret:
            secretName: cloud-config

The two things that stand out to me are:

  1. The args for the node plugin do not contain a cluster ID, which we have set in both the controller and node tasks in the example deployment. I'm curious if having that option set in the node task's definition is causing it to misbehave.
  2. They appear to be mounting the directory /var/lib/kubelet inside the node plugin container. It's not clear exactly what that is used for, but I'm wondering if it's a path used by the plugin for staging the volumes. If so, would we need to pass in the Nomad equivalent for staging purposes?

@RickyGrassmuck
Contributor

Re:

It appears the cinder driver is not getting the mount options specified in the registration for the fs_type which leads to cinder not getting it properly mounted to the staging directory.

I believe this is what I remember seeing when I initially looked into the mount options not being set.

func (c *VolumeCapability) ToCSIRepresentation() *csipbv1.VolumeCapability {

Specifically:

	if c.AccessType == VolumeAccessTypeMount {
		opts := &csipbv1.VolumeCapability_MountVolume{}
		if c.MountVolume != nil {
			opts.FsType = c.MountVolume.FSType
			opts.MountFlags = c.MountVolume.MountFlags
		}
		vc.AccessType = &csipbv1.VolumeCapability_Mount{Mount: opts}
	} else {
		vc.AccessType = &csipbv1.VolumeCapability_Block{Block: &csipbv1.VolumeCapability_BlockVolume{}}
	}

	return vc

The mount options defined in this struct only appear to be attached to volumes of type "VolumeAccessTypeMount" (i.e. file-system) and don't get associated with block-device types.

type CSIMountOptions struct {
	// FSType is an optional field that allows an operator to specify the type
	// of the filesystem.
	FSType string

	// MountFlags contains additional options that may be used when mounting the
	// volume by the plugin. This may contain sensitive data and should not be
	// leaked.
	MountFlags []string
}

I'm not certain that I'm seeing the whole picture here, so I figured I'd just jot down what I saw and my assumptions in the hope that someone more familiar with the code base and the CSI spec can determine whether it's actually an issue or not, lol.

@RickyGrassmuck
Contributor

RickyGrassmuck commented Oct 5, 2020

Sorry for the comment spam; I just came across this issue, which states that you can change the log level for the node plugin using the flag -v=4, or -v=5 for the most verbose output.

@acornies
Author

acornies commented Oct 5, 2020

Thanks @rigrassm - I was able to relaunch with -v=4 log verbosity and there's a little more:

I1005 23:43:40.559851       1 driver.go:69] Driver: cinder.csi.openstack.org
I1005 23:43:40.559912       1 driver.go:70] Driver version: 1.2.1@latest
I1005 23:43:40.559918       1 driver.go:71] CSI Spec version: 1.2.0
I1005 23:43:40.559927       1 driver.go:100] Enabling controller service capability: LIST_VOLUMES
I1005 23:43:40.559932       1 driver.go:100] Enabling controller service capability: CREATE_DELETE_VOLUME
I1005 23:43:40.559936       1 driver.go:100] Enabling controller service capability: PUBLISH_UNPUBLISH_VOLUME
I1005 23:43:40.559940       1 driver.go:100] Enabling controller service capability: CREATE_DELETE_SNAPSHOT
I1005 23:43:40.559943       1 driver.go:100] Enabling controller service capability: LIST_SNAPSHOTS
I1005 23:43:40.559949       1 driver.go:100] Enabling controller service capability: EXPAND_VOLUME
I1005 23:43:40.559953       1 driver.go:100] Enabling controller service capability: CLONE_VOLUME
I1005 23:43:40.559956       1 driver.go:100] Enabling controller service capability: LIST_VOLUMES_PUBLISHED_NODES
I1005 23:43:40.559960       1 driver.go:112] Enabling volume access mode: SINGLE_NODE_WRITER
I1005 23:43:40.559965       1 driver.go:122] Enabling node service capability: STAGE_UNSTAGE_VOLUME
I1005 23:43:40.559969       1 driver.go:122] Enabling node service capability: EXPAND_VOLUME
I1005 23:43:40.559973       1 driver.go:122] Enabling node service capability: GET_VOLUME_STATS
I1005 23:43:40.559983       1 openstack.go:117] InitOpenStackProvider configFile: /etc/config/cloud.conf
I1005 23:43:40.560483       1 openstack.go:87] Block storage opts: {0 false}
I1005 23:43:40.560534       1 openstack.go:515] Using user-agent cinder-csi-plugin/latest gophercloud/2.0.0
I1005 23:43:41.234503       1 mount_linux.go:163] Detected OS without systemd
I1005 23:43:41.234798       1 server.go:108] Listening for connections on address: &net.UnixAddr{Name:"//csi/csi.sock", Net:"unix"}
I1005 23:43:46.596594       1 utils.go:80] GRPC call: /csi.v1.Identity/Probe
I1005 23:43:46.626274       1 utils.go:80] GRPC call: /csi.v1.Identity/GetPluginInfo
I1005 23:43:46.627690       1 utils.go:80] GRPC call: /csi.v1.Identity/GetPluginCapabilities
I1005 23:43:46.628518       1 utils.go:80] GRPC call: /csi.v1.Node/NodeGetInfo
I1005 23:43:46.628799       1 metadata.go:159] Attempting to mount configdrive /dev/disk/by-label/config-2 on /tmp/configdrive060057586
I1005 23:43:46.628915       1 mount_linux.go:146] Mounting cmd (mount) with arguments (-t iso9660 -o ro /dev/disk/by-label/config-2 /tmp/configdrive060057586)
I1005 23:43:46.633024       1 utils.go:80] GRPC call: /csi.v1.Identity/Probe
I1005 23:43:46.651684       1 metadata.go:171] Configdrive mounted on /tmp/configdrive060057586
I1005 23:43:46.651918       1 mount_linux.go:238] Unmounting /tmp/configdrive060057586
I1005 23:43:46.677546       1 utils.go:80] GRPC call: /csi.v1.Identity/Probe
I1005 23:43:46.686412       1 utils.go:80] GRPC call: /csi.v1.Node/NodeGetCapabilities
...
I1005 23:48:16.811004       1 utils.go:80] GRPC call: /csi.v1.Node/NodeGetCapabilities
I1005 23:48:34.072499       1 utils.go:80] GRPC call: /csi.v1.Node/NodeStageVolume
I1005 23:48:34.072534       1 nodeserver.go:341] NodeStageVolume: called with args {VolumeId:e9369f58-5cfa-4620-bbb8-99f748f48c47 PublishContext:map[DevicePath:/dev/vdb] StagingTargetPath:/csi/staging/airflow_db/rw-block-device-single-node-writer VolumeCapability:block:<> access_mode:<mode:SINGLE_NODE_WRITER >  Secrets:map[] VolumeContext:map[] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1005 23:48:34.777166       1 mount.go:171] Found disk attached as "virtio-e9369f58-5cfa-4620-b"; full devicepath: /dev/disk/by-id/virtio-e9369f58-5cfa-4620-b
I1005 23:48:34.782264       1 utils.go:80] GRPC call: /csi.v1.Node/NodePublishVolume
I1005 23:48:34.782294       1 nodeserver.go:50] NodePublishVolume: called with args {VolumeId:e9369f58-5cfa-4620-bbb8-99f748f48c47 PublishContext:map[DevicePath:/dev/vdb] StagingTargetPath:/csi/staging/airflow_db/rw-block-device-single-node-writer TargetPath:/csi/per-alloc/f275a187-976b-a42f-8299-bdd9eeff215f/airflow_db/rw-block-device-single-node-writer VolumeCapability:block:<> access_mode:<mode:SINGLE_NODE_WRITER >  Readonly:false Secrets:map[] VolumeContext:map[] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1005 23:48:34.869597       1 nodeserver.go:219] NodePublishVolumeBlock: called with args {VolumeId:e9369f58-5cfa-4620-bbb8-99f748f48c47 PublishContext:map[DevicePath:/dev/vdb] StagingTargetPath:/csi/staging/airflow_db/rw-block-device-single-node-writer TargetPath:/csi/per-alloc/f275a187-976b-a42f-8299-bdd9eeff215f/airflow_db/rw-block-device-single-node-writer VolumeCapability:block:<> access_mode:<mode:SINGLE_NODE_WRITER >  Readonly:false Secrets:map[] VolumeContext:map[] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I1005 23:48:34.869769       1 mount.go:171] Found disk attached as "virtio-e9369f58-5cfa-4620-b"; full devicepath: /dev/disk/by-id/virtio-e9369f58-5cfa-4620-b
I1005 23:48:34.869868       1 mount_linux.go:146] Mounting cmd (mount) with arguments ( -o bind /dev/disk/by-id/virtio-e9369f58-5cfa-4620-b /csi/per-alloc/f275a187-976b-a42f-8299-bdd9eeff215f/airflow_db/rw-block-device-single-node-writer)
I1005 23:48:34.881810       1 mount_linux.go:146] Mounting cmd (mount) with arguments ( -o bind,remount,rw /dev/disk/by-id/virtio-e9369f58-5cfa-4620-b /csi/per-alloc/f275a187-976b-a42f-8299-bdd9eeff215f/airflow_db/rw-block-device-single-node-writer)
I1005 23:48:46.812144       1 utils.go:80] GRPC call: /csi.v1.Identity/Probe
I1005 23:48:46.812445       1 utils.go:80] GRPC call: /csi.v1.Identity/Probe
I1005 23:48:46.831133       1 utils.go:80] GRPC call: /csi.v1.Node/NodeGetCapabilities

Something that stands out is VolumeCapability:block:<> which seems like it's referencing the code path mentioned above.
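For reference, the conversion quoted above produces a capability shaped roughly like this for a block-device registration (a sketch using the CSI Go protobuf bindings; the variable name is just illustrative), which is what VolumeCapability:block:<> in the NodeStageVolume log corresponds to:

package example

import (
	csipbv1 "github.com/container-storage-interface/spec/lib/go/csi"
)

// blockCap mirrors what the ToCSIRepresentation path quoted earlier emits for
// a block-device registration: an empty BlockVolume plus the access mode, with
// no fs_type or mount_flags anywhere.
var blockCap = &csipbv1.VolumeCapability{
	AccessType: &csipbv1.VolumeCapability_Block{
		Block: &csipbv1.VolumeCapability_BlockVolume{},
	},
	AccessMode: &csipbv1.VolumeCapability_AccessMode{
		Mode: csipbv1.VolumeCapability_AccessMode_SINGLE_NODE_WRITER,
	},
}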

@RickyGrassmuck
Contributor

That looks promising. It seems odd that there aren't any errors being reported when mounting the device, but it may be expected behaviour from the driver's perspective if no mount options are provided.

I think @tgross will be able to get a good idea of what's going on here with this information though.

@tgross
Member

tgross commented Oct 6, 2020

Perfect, thanks folks.

@rigrassm wrote:

They appear to be mounting the directory /var/lib/kubelet inside the node plugin container. It's not clear exactly what that is used for, but I'm wondering if it's a path used by the plugin for staging the volumes. If so, would we need to pass in the Nomad equivalent for staging purposes?

We bind in the staging directories already in the plugin_supervisor_hook. It may be that the plugin is looking for specific directories. If it wasn't finding them I'd expect that to show up in the node plugin logs, but worth trying.

@rigrassm wrote:

The mount options defined in this struct only appear to be attached to volumes of type "VolumeAccessTypeMount" (i.e. file-system) and don't get associated with block-device types.

and @acornies wrote:

Something that stands out is VolumeCapability:block:<> which seems like it's referencing the code path mentioned above.

Good eye on noticing that, but if you search thru the spec for the definition of VolumeCapability, you'll see the following:

// Specify a capability of a volume.
message VolumeCapability {
  // Indicate that the volume will be accessed via the block device API.
  message BlockVolume {
    // Intentionally empty, for now.
  }

  // Indicate that the volume will be accessed via the filesystem API.
  message MountVolume {
    // The filesystem type. This field is OPTIONAL.
    // An empty string is equal to an unspecified field value.
    string fs_type = 1;

    // The mount options that can be used for the volume. This field is
    // OPTIONAL. `mount_flags` MAY contain sensitive information.
    // Therefore, the CO and the Plugin MUST NOT leak this information
    // to untrusted entities. The total size of this repeated field
    // SHALL NOT exceed 4 KiB.
    repeated string mount_flags = 2;
  }
  ...

I'm wondering if this is a validation issue... are we sure that the cinder plugin is supposed to support these options?

@acornies
Author

acornies commented Oct 6, 2020

My assumption given Nomad's own docs was that this is possible for block-device volumes: https://www.nomadproject.io/docs/commands/volume/register#mount_options
Since there were examples already merged into this repo, I thought I'd give it a try 😄 .

@tgross
Member

tgross commented Oct 6, 2020

Well that's embarrassing! 😀 I'll have to double-check the origin PR for those docs and see if I can track down why. I probably won't get a chance to circle back to this until tomorrow though.

@tgross
Member

tgross commented Oct 7, 2020

Let's see who wrote those docs... oh dear, it was me 🤦 #7439

Ok, well if I follow the path of the nomad volume register command, I get:

With a bit more digging I'm remembering that having mount options associated with a specific volume isn't part of the CSI spec, but something Nomad is doing for operator convenience -- it's setting a default value for that volume. The volume's mount options get merged with the volume_mount's mount options only once the request lands on the client.
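To illustrate that merge behavior, here is a rough sketch (reusing the CSIMountOptions struct quoted earlier in the thread; this is illustrative only and not the actual Nomad merge code):

package example

// CSIMountOptions mirrors the struct quoted earlier in this thread, repeated
// here so the sketch stands alone.
type CSIMountOptions struct {
	FSType     string
	MountFlags []string
}

// mergeMountOptions illustrates the behavior described above: the options
// registered on the volume act as defaults, and any options set in the job's
// volume block override them where present.
func mergeMountOptions(volumeDefaults, jobOptions *CSIMountOptions) *CSIMountOptions {
	merged := &CSIMountOptions{}
	if volumeDefaults != nil {
		merged.FSType = volumeDefaults.FSType
		merged.MountFlags = append([]string(nil), volumeDefaults.MountFlags...)
	}
	if jobOptions != nil {
		if jobOptions.FSType != "" {
			merged.FSType = jobOptions.FSType
		}
		if len(jobOptions.MountFlags) > 0 {
			merged.MountFlags = append([]string(nil), jobOptions.MountFlags...)
		}
	}
	return merged
}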

So I think there are three fixes here:

@acornies
Author

acornies commented Oct 7, 2020

From a higher level, what are your suggestions on how to use Nomad with cinder-csi? Is it even possible? Not being a CSI expert, I'm wondering how the k8s ecosystem utilizes it and how that could be applied here (if at all).

@tgross
Member

tgross commented Oct 7, 2020

From a higher level, what are your suggestions on how to use Nomad with cinder-csi? Is it even possible?

It should be, with the caveat that we don't support the CreateVolume RPCs yet (see #8212), so you need to do that out of band. I suspect the reason the mount options aren't being accepted at all (even when set at the job level) is that the block-device setting should have been set to file-system. After that, I think you're ok with the job you've got.

Not being a CSI expert, I'm wondering how the k8s ecosystem utilizes it and how that could be applied here (if at all).

It looks like they have an example of a consumer job for us to work from: a MySQL job that, for some reason, they've split between the cloud-provider-openstack repo and the kubernetes examples repo. That all seems to line up pretty well with @rigrassm's example, including the required cloud config settings.

@acornies
Author

acornies commented Oct 7, 2020

BTW, thanks very much for spending time on this - really appreciated.

I guess I'm wondering where the disconnect is (it could very well be my understanding), because the Cinder CSI examples use block-device for attachment_mode, and when I change my attachment_mode to file-system I get:

nomad volume register volume.hcl 
Error registering volume: Unexpected response code: 500 (rpc error: rpc error: rpc error: rpc error: validate volume: volume capability validation failed: 1 error occurred:
        * requested AccessType Mount but got AccessType Block

)

I use the ID from this terraform openstack resource:

resource "openstack_blockstorage_volume_v3" "airflow_test" {
  name                 = "airflow_data"
  size                 = 10
  enable_online_resize = true
  multiattach          = false
}

@rigrassm Did you have this working at some point with a Docker-based job?

@RickyGrassmuck
Contributor

RickyGrassmuck commented Oct 7, 2020

@acornies Unfortunately no. I believed it was working because everything in the process went through as expected after starting the job (including Nomad reporting the volume as attached to the alloc).

It wasn't until we had a job get rescheduled, which ended up losing its persisted data, that I realized the volume wasn't actually being mounted.

Re: using file-system over block-device, that won't work as the cinder driver is designed as a block-device type.

This absolutely should be possible though. I'm going to be a little tied up the rest of this week, but I'll try to poke at things some more if I get some free time.

It may be worth opening an issue over at the cloud-provider-openstack project to see if they can help us make sense of what's going on here. (I think this is likely going to be needed at some point anyway, since the driver not actually mounting the device should raise an error, and the fact that it doesn't is what I'd consider a bug in the driver.) If you would like to open an issue over there and tag me, I'll be happy to help. If not, I'll try to open one up when I'm able to get things set up again in the test environment.

@RickyGrassmuck
Contributor

RickyGrassmuck commented Oct 7, 2020

So, after searching around a bit about mount options in the volume registration, I found this page, which indicates that there are specific cases where mount_options may apply to block devices.

https://kubernetes.io/docs/concepts/storage/persistent-volumes/#mount-options

@tgross
Member

tgross commented Oct 7, 2020

So, after searching around a bit about mount options in the volume registration, I found this page, which indicates that there are specific cases where mount_options may apply to block devices.

This is where the nomenclature they've chosen is super unfortunate, because it overlaps with an existing term of art. I'm fairly certain the references there to "block devices" are in the operating-system sense, and not in the sense of the two APIs available in CSI. As in a block device vs a character device (totally unsupported) vs an object store (external HTTP API).

@acornies
Author

acornies commented Oct 7, 2020

This absolutely should be possible though.

Yes, others in our organization are using this exact driver with k8s successfully. Putting the spec aside for a moment, would passing in the mount options for block devices in this context work in theory? Is this a case where Nomad has strictly adhered to the spec, not taking into account a big use case which is OpenStack Cinder?

@acornies
Author

acornies commented Oct 7, 2020

Breaking it down a little further: I thought I wouldn't have to do something like: mkfs.ext4 ... on the node in addition to setting up Nomad with Cinder CSI.

@tgross
Member

tgross commented Oct 7, 2020

Putting the spec aside for a moment, would passing in the mount options for block devices in this context work in theory? Is this a case where Nomad has strictly adhered to the spec, not taking into account a big use case which is OpenStack Cinder?

That's certainly a possibility, so let's take a look at the Cinder plugin code. Looks like the filesystem for mounting (ext4 by default) is determined, and the device formatted, in the node plugin at the end of NodeStageVolume by checking the volume capabilities. Ah ha! So they're doing exactly what you've suggested!
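To sketch what that staging behavior looks like (illustrative only; formatAndMount is a hypothetical stand-in for the plugin's real mount helper, and this is not the Cinder plugin's actual code):

package example

import (
	csipbv1 "github.com/container-storage-interface/spec/lib/go/csi"
)

// formatAndMount is a hypothetical stand-in for the plugin's real mount
// helper; it only exists to keep this sketch self-contained.
func formatAndMount(device, target, fsType string, flags []string) error {
	return nil
}

// stageVolume sketches the NodeStageVolume behavior described above: a block
// capability skips formatting entirely, while a mount capability gets the
// device formatted with the requested fs_type, defaulting to ext4.
func stageVolume(devicePath, stagingPath string, volCap *csipbv1.VolumeCapability) error {
	if volCap.GetBlock() != nil {
		// Raw block access: hand over the device unformatted and exit early.
		return nil
	}
	mnt := volCap.GetMount()
	fsType := mnt.GetFsType()
	if fsType == "" {
		fsType = "ext4" // default when no filesystem is requested
	}
	return formatAndMount(devicePath, stagingPath, fsType, mnt.GetMountFlags())
}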

And if we look at where we translate from Nomad's internal representation to the CSI protobuf in VolumeCapability.ToCSIRepresentation(), we can see we drop the mount options field because we don't expect to see it.

Breaking it down a little further: I thought I wouldn't have to do something like: mkfs.ext4 ... on the node in addition to setting up Nomad with Cinder CSI.

Agreed. That being said, given that code I'm not sure why you need to pass the ext4 format field. Doesn't it automatically format to ext4 for you if no mount option is passed in and the volume isn't formatted?


If leaving it off doesn't work, we can fix it... let's enumerate our options and their tradeoffs:

(1) Allow mount options to be passed in for block volume plugins.

  • Good: the cinder plugin works!
  • Bad: if a user of a different plugin passes in mount_options when they're not supported, it may break and they'll wonder why Nomad let them do that.

(2) Validate the mount options strictly against the spec:

  • Good: we're complying with the spec so have a leg to stand on with upstream bug reports
  • Bad: the cinder plugin can only use the default mount options.
  • Bad: who knows if other plugins will be broken in the same way

I think I'm leaning towards allowing the mount options to be passed in for block volume then, and being a little more loose in spec compliance. I want to grab @schmichael for his thoughts on that though.

@tgross
Member

tgross commented Oct 7, 2020

@acornies I think I missed this earlier, but the plugin does treat the block device capability and the mount capability separately. If it finds the block device capability, it just exits that stage early (nodeserver.go#L373). So it really looks like they're supposed to be accepting both block-device and file-system.

Maybe the error you ran into earlier had to do with how the volume was created in OpenStack? Can you provide a redacted command line for openstack volume create (if that's how you're making these)?

@tgross
Member

tgross commented Oct 7, 2020

Also, if we look a little more closely at the protobuf (https://github.com/container-storage-interface/spec/blob/v1.3.0/csi.proto#L387), we can't pass both the BlockVolume and MountVolume messages, so option (1) would definitely break a whole lot of plugins.

@acornies
Author

acornies commented Oct 7, 2020

Maybe the error you ran into earlier had to do with how the volume was created in OpenStack?

I don't see any other options I can use for volume creation (v3) in our OpenStack APIs. When using these "block-device" volumes on a VM, I have to use mkfs.ext4 (or config management equivalent) to format/create the filesystem before mounting.

This comment on the bottom of the OpenShift docs looks relevant: https://docs.openshift.com/enterprise/3.1/install_config/persistent_storage/persistent_storage_cinder.html#volume-format-cinder

Is this what the csi driver is supposed to handle? Are there any other code paths, maybe with volume_mount that we should consider?

@tgross
Member

tgross commented Oct 7, 2020

Is this what the csi driver is supposed to handle?

Yes, filesystem formatting is entirely controlled by the NodeStageVolume RPC. So if you want the volume to be formatted, you need to use the file-system option.

That made me realize something about the error you showed earlier:

nomad volume register volume.hcl 
Error registering volume: Unexpected response code: 500 (rpc error: rpc error: rpc error: rpc error: validate volume: volume capability validation failed: 1 error occurred:
        * requested AccessType Mount but got AccessType Block

This is coming from plugins/csi/client.go#L419 when we parse the response from the ValidateVolumeCapabilities RPC. We're expecting the response to include the mount block if you pass it. But the plugin is just throwing that information away, neither validating it nor returning it to us.

The spec tells us that the CO (Nomad, in this case):

// For successful validation responses, the CO SHALL compare the
// fields of this message to the originally requested capabilities in
// order to guard against an older plugin reporting "valid" for newer
// capability fields that it does not yet understand.

But I think we misread this and are being overly aggressive in validating the response we get back from the plugin. My reading of this is that we should be checking the return for the specifically-validated capabilities, but that we should then only use the capabilities that we've validated. Which in this case would mean not passing the mount options along and letting the default behavior of the plugin take over. If k8s is closely following the spec here, that's probably why the Cinder plugin is working there. But the Cinder plugin should be returning any fields it wants to use, as far as I can tell.

You could test this theory by using the Cinder plugin with k8s and setting the fs_type to something other than ext4; I suspect that it's not being respected.

@tgross
Member

tgross commented Oct 7, 2020

That being said, I want to check in with the CSI folks to ensure my interpretation is correct here before we go making changes.

@tgross
Member

tgross commented Oct 7, 2020

I've opened container-storage-interface/spec#458 to get some clarification.

@tgross
Member

tgross commented Oct 7, 2020

More data points:

The CSI spec for ValidateVolumeCapabilities requires that plugins only set the Confirmed field if they've validated all capabilities. The Nomad client improperly assumes that the lack of a Confirmed field should be treated as a failure. This breaks the Azure and Linode block storage plugins, which don't set this optional field.

The CSI spec also requires that the orchestrator check the validation responses to guard against older versions of a plugin reporting "valid" for newer fields it doesn't understand.
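A relaxed check along those lines might look roughly like this sketch (illustrative only, using the CSI Go bindings; this is not the change that eventually shipped):

package example

import (
	"fmt"

	csipbv1 "github.com/container-storage-interface/spec/lib/go/csi"
)

// checkValidationResponse sketches the relaxed reading of the spec described
// above: a missing Confirmed block is not treated as a failure, and only the
// fields the plugin actually echoed back are compared.
func checkValidationResponse(resp *csipbv1.ValidateVolumeCapabilitiesResponse,
	requestedMode csipbv1.VolumeCapability_AccessMode_Mode) error {

	confirmed := resp.GetConfirmed()
	if confirmed == nil {
		// The Confirmed field is optional; plugins that omit it (as the
		// Azure and Linode plugins do) shouldn't fail validation outright.
		return nil
	}
	for _, volCap := range confirmed.GetVolumeCapabilities() {
		if am := volCap.GetAccessMode(); am != nil && am.GetMode() != requestedMode {
			return fmt.Errorf("requested access mode %s but plugin confirmed %s",
				requestedMode, am.GetMode())
		}
		// Capabilities the plugin did not echo back (such as a mount block)
		// are simply not used, rather than being treated as a mismatch.
	}
	return nil
}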

@tgross
Member

tgross commented Oct 7, 2020

I've discovered something unfortunate about why k8s works with the Cinder driver... it's because they don't actually do the ValidateVolumeCapabilities check. Controllers are still being managed outside of k8s proper via the external-attacher, which doesn't implement that RPC. That's going to make it difficult to have parity if we're ahead of the Big Gorilla implementation. Let's see what the CSI folks come back with in terms of a recommendation. Maybe, given that k8s isn't implementing this at all, it'd be better to default to a very relaxed stance.

@tgross
Member

tgross commented Oct 8, 2020

I've got a failing test for exactly the case you're hitting, @acornies, so the fix to loosen the validation should be pretty trivial at this point:

diff --git a/plugins/csi/client_test.go b/plugins/csi/client_test.go
index 265d234b8..061e3b193 100644
--- a/plugins/csi/client_test.go
+++ b/plugins/csi/client_test.go
@@ -564,6 +564,23 @@ func TestClient_RPC_ControllerValidateVolume(t *testing.T) {
                        // this is a multierror
                        ExpectedErr: fmt.Errorf("volume capability validation failed: 1 error occurred:\n\t* requested mount flags did not match available capabilities\n\n"),
                },
+
+               {
+                       Name: "handles incomplete confirmed response",
+                       Response: &csipbv1.ValidateVolumeCapabilitiesResponse{
+                               Confirmed: &csipbv1.ValidateVolumeCapabilitiesResponse_Confirmed{
+                                       VolumeCapabilities: []*csipbv1.VolumeCapability{
+                                               {
+                                                       AccessMode: &csipbv1.VolumeCapability_AccessMode{
+                                                               Mode: csipbv1.VolumeCapability_AccessMode_MULTI_NODE_MULTI_WRITER,
+                                                       },
+                                               },
+                                       },
+                               },
+                       },
+                       ResponseErr: nil,
+                       ExpectedErr: nil,
+               },
        }

        for _, tc := range cases {
$ go test -v ./plugins/csi -run TestClient_RPC_ControllerValidateVolume -count=1
=== RUN   TestClient_RPC_ControllerValidateVolume
=== RUN   TestClient_RPC_ControllerValidateVolume/handles_underlying_grpc_errors
=== RUN   TestClient_RPC_ControllerValidateVolume/handles_empty_success
=== RUN   TestClient_RPC_ControllerValidateVolume/handles_validate_success
=== RUN   TestClient_RPC_ControllerValidateVolume/handles_validation_failure_block_mismatch
=== RUN   TestClient_RPC_ControllerValidateVolume/handles_validation_failure_mount_flags
=== RUN   TestClient_RPC_ControllerValidateVolume/handles_incomplete_confirmed_response
    client_test.go:614:
                Error Trace:    client_test.go:614
                Error:          Received unexpected error:
                                volume capability validation failed: 1 error occurred:
                                        * requested AccessType Mount but got AccessType Block

                Test:           TestClient_RPC_ControllerValidateVolume/handles_incomplete_confirmed_response
                Messages:       handles incomplete confirmed response
--- FAIL: TestClient_RPC_ControllerValidateVolume (0.00s)

@acornies
Author

acornies commented Oct 8, 2020

Thanks for the investigation @tgross! Looks like we're moving forward with relaxing the validation rules which is super exciting! Working CSI with OpenStack Cinder is a game changer for us.

Are you folks moving forward with the change in a specific branch, or are you accepting PRs for this?

@tgross
Member

tgross commented Oct 8, 2020

Are you folks moving forward with the change in a specific branch, or are you accepting PRs for this?

I should have a PR up by end-of-day (East Coast US). That'll get shipped in the upcoming 0.13.

@RickyGrassmuck
Contributor

Heck Yeah, thank you @tgross!

@tgross
Member

tgross commented Oct 8, 2020

I've just merged #9049 and that will ship in 0.13.

@tgross tgross added this to the 0.13 milestone Oct 8, 2020
@travisghansen

Really late to the party here, but a clarification about the file-system vs block-device question: in this case it's about how the storage is exposed to the container, not what's backing it per se. That is, CSI supports exposing volumes as raw block devices to containers, which could hypothetically then do their own formatting, etc. It's not heavily used, but that is the context of the option. As an example, I could create an NFS-based CSI driver which allows block access; in that case the driver would dd a bunch of zeros to a file on the NFS share and then mount the file into the container as if it were a 'real' block device, like an iSCSI drive.

@RickyGrassmuck
Contributor

@travisghansen Yeah, I think that's where the confusion originated. In my case, I was interpreting the attachment_mode option as the way to tell Nomad what the actual backing storage system uses (i.e. "expect this volume to be a block device") rather than as instructions to the driver for how to attach the volume to the container.

In hindsight, it seems like a silly misinterpretation but this thread should serve as a good source of information for when others are trying to figure things out.

@travisghansen

Yes, the terminology is definitely confusing! Hopefully this helps a little for folks who stumble upon this!

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 26, 2022