CSI: implement VolumeContext (support moosefs plugin) #7771

Closed
toomyem opened this issue Apr 21, 2020 · 14 comments · Fixed by #8239

Comments

toomyem commented Apr 21, 2020

I'm trying to set up the CSI plugin for the MooseFS file system, but it is not clear from the documentation how I can provide the mount path (I've exported the /nomad directory in MFS). I'm getting the following error:

failed to setup alloc: pre-run hook "csi_hook" failed: rpc error: code = InvalidArgument desc = NodeStageVolume Endpoint must be provided

Thanks for any help in this matter.

$ cat mfs-cgi-plugin.nomad

job "mfs-cgi-plugin" {
  datacenters = ["home"]

  type = "system"

  group "nodes" {
      task "plugin" {
          driver = "docker"

          config {
              image = "quay.io/tuxera/moosefs-csi-plugin:0.0.4"
              args = ["-endpoint=unix:///csi/csi.sock", "-mfs-endpoint=10.0.0.21", "-topology=master:EP,chunk:EP"]
              privileged = true
          }

          csi_plugin {
              id = "mfs-csi"
              type = "node"
              mount_dir = "/csi"
          }

          resources {
              cpu = 500
              memory = 256
          }
      }
  }
}
$ cat volume.hcl
type = "csi"
plugin_id = "mfs-csi"
id = "mfs-vol1"
name = "mfs-vol1"
access_mode = "single-node-writer"
attachment_mode = "file-system"
mount_options {
    fs_type = "moosefs"
}

Sample job using volume:

$ cat nginx.nomad
job "nginx" {
    datacenters = ["home"]
    type = "service"

    group "nginx" {
        count = 1

        volume "vol1" {
            type = "csi"
            source = "mfs-vol1"
            read_only = false
        }

        task "nginx" {
            driver = "docker"

            config {
                image = "nginx"
                port_map {
                    http = 80
                }
            }

            volume_mount {
                volume = "vol1"
                destination = "/test"
            }

            resources {
                network {
                    port "http" {}
                }
            }
        }
    }
}

tgross commented Apr 21, 2020

Hi @toomyem!

Can you provide the alloc logs from the plugin allocation? (nomad alloc logs :alloc_id, possibly with the -stderr flag). It should contain some clues as to what's happening here.

But it looks like we might be hitting this line: https://github.com/moosefs/moosefs-csi/blob/77152be87c277fd3e50fcfe492ae8264f862a797/driver/node.go#L44-L46... and I don't see volume_context or VolumeContext anywhere in the Nomad code base. It may be an optional field that we missed. I'll dig into that and report back here.
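
If it does turn out to be that field, I'd expect it to surface on the Nomad side as something like a context block in the volume registration that gets passed to the plugin as the CSI VolumeContext. This is a rough, speculative sketch only; the key name (endpoint) would be defined by the moosefs plugin, not by Nomad:

# hypothetical addition to volume.hcl, if/when Nomad passes VolumeContext through
context {
    endpoint = "<your-mfs-endpoint>"
}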

One other thing I'll note is that the beta of CSI that we shipped with 0.11.0 doesn't include support for topologies. See #7669 for tracking that work, which we're planning before we call CSI "GA". It looks like it might be possible to use the MooseFS CSI plugin without it, according to https://github.com/moosefs/moosefs-csi#storage-deployment-topology-optional ?

toomyem commented Apr 22, 2020

It may be related to the issue mentioned by @angrycub in #7764 (comment). I'll try to check if it also fixes my problem.

EDIT: no, it is probably not related.

toomyem commented Apr 22, 2020

Here is the log from the plugin allocation:

time="2020-04-21T20:06:49Z" level=info msg="removing socket" node_id="unix:///csi/csi.sock" socket=/csi/csi.sock
time="2020-04-21T20:06:49Z" level=info msg="server started" addr=/csi/csi.sock node_id="unix:///csi/csi.sock"
time="2020-04-21T20:06:50Z" level=info msg="probe called" method=prove node_id="unix:///csi/csi.sock"
time="2020-04-21T20:06:50Z" level=info msg="get plugin info called" method=get_plugin_info node_id="unix:///csi/csi.sock" response="name:\"com.tuxera.csi.moosefs\" vendor_version:\"dev\" "
time="2020-04-21T20:06:50Z" level=info msg="get plugin capabitilies called" method=get_plugin_capabilities node_id="unix:///csi/csi.sock" response="capabilities:<service:<type:CONTROLLER_SERVICE > > capabilities:<service:<type:VOLUME_ACCESSIBILITY_CONSTRAINTS > > "
time="2020-04-21T20:06:50Z" level=info msg="node get info called" method=node_get_info node_id="unix:///csi/csi.sock"
time="2020-04-21T20:06:50Z" level=info msg="probe called" method=prove node_id="unix:///csi/csi.sock"
time="2020-04-21T20:06:50Z" level=info msg="node get capabilities called" method=node_get_capabilities node_capabilities="rpc:<type:STAGE_UNSTAGE_VOLUME > " node_id="unix:///csi/csi.sock"
time="2020-04-21T20:06:50Z" level=info msg="probe called" method=prove node_id="unix:///csi/csi.sock"
(...)
time="2020-04-22T07:45:46Z" level=error msg="method failed" error="rpc error: code = InvalidArgument desc = NodeStageVolume Endpoint must be provided" method=/csi.v1.Node/NodeStageVolume node_id="unix:///csi/csi.sock"
time="2020-04-22T07:45:47Z" level=info msg="node unpublish volume called" method=node_unpublish_volume node_id="unix:///csi/csi.sock" target_path=/csi/per-alloc/b9f938a7-efb7-6d4d-3bb4-0b6d5abb7852/mfs-vol1/rw-file-system-single-node-writer volume_id=mfs-vol1
time="2020-04-22T07:45:47Z" level=info msg="target path is already unmounted" method=node_unpublish_volume node_id="unix:///csi/csi.sock" target_path=/csi/per-alloc/b9f938a7-efb7-6d4d-3bb4-0b6d5abb7852/mfs-vol1/rw-file-system-single-node-writer volume_id=mfs-vol1
time="2020-04-22T07:45:47Z" level=info msg="unmounting volume is finished" method=node_unpublish_volume node_id="unix:///csi/csi.sock" target_path=/csi/per-alloc/b9f938a7-efb7-6d4d-3bb4-0b6d5abb7852/mfs-vol1/rw-file-system-single-node-writer volume_id=mfs-vol1
time="2020-04-22T07:45:47Z" level=info msg="node unstage volume called" method=node_unstage_volume node_id="unix:///csi/csi.sock" staging_target_path=/csi/staging/mfs-vol1/rw-file-system-single-node-writer volume_id=mfs-vol1
time="2020-04-22T07:45:47Z" level=info msg="staging target path is already unmounted" method=node_unstage_volume node_id="unix:///csi/csi.sock" staging_target_path=/csi/staging/mfs-vol1/rw-file-system-single-node-writer volume_id=mfs-vol1
time="2020-04-22T07:45:47Z" level=info msg="unmounting stage volume is finished" method=node_unstage_volume node_id="unix:///csi/csi.sock" staging_target_path=/csi/staging/mfs-vol1/rw-file-system-single-node-writer volume_id=mfs-vol1
time="2020-04-22T07:45:47Z" level=info msg="probe called" method=prove node_id="unix:///csi/csi.sock"
(...)

tgross commented Apr 22, 2020

Ok, thanks @toomyem. I'm going to change the title of this issue to reflect the underlying problem and make sure it gets on the team's schedule for wrapping up the remaining CSI features.

tgross changed the title from "CSI plugin for moosefs" to "CSI: implement Endpoint field on NodeStageVolume (support moosefs plugin)" on Apr 22, 2020

toomyem commented Apr 22, 2020

Thank you. I'll be monitoring this issue, as I'm interested in making this work.

tgross changed the title from "CSI: implement Endpoint field on NodeStageVolume (support moosefs plugin)" to "CSI: implement VolumeContext (support moosefs plugin)" on May 11, 2020
tgross self-assigned this on May 13, 2020

tgross commented May 15, 2020

Closed by #7957. This just missed the deadline for 0.11.2, so we'll ship it in 0.11.3.

tgross closed this as completed on May 15, 2020
tgross added this to the 0.11.3 milestone on May 15, 2020

toomyem commented Jun 5, 2020

Has anyone successfully run Nomad with the CSI plugin for MFS (MooseFS)? Any info about such a setup (plugin and volume configuration) would be much appreciated. I cannot get past the following error:
failed to setup alloc: pre-run hook "csi_hook" failed: rpc error: code = InvalidArgument desc = NodeStageVolume Endpoint must be provided.

My volume definition file looks like:

type = "csi"
plugin_id = "mfs-csi"
id = "mfs-vol1"
name = "mfs-vol1"
access_mode = "single-node-writer"
attachment_mode = "file-system"
mount_options {
    fs_type = "moosefs"
}
context {
    endpoint = "http://10.0.0.21:9425"
}

But the plugin still cannot see the endpoint parameter.

tgross commented Jun 8, 2020

@toomyem you're using the just-released 0.11.3?

toomyem commented Jun 8, 2020

Yes, of course. That's why I'm getting back to this issue.

tgross commented Jun 22, 2020

Hi @toomyem sorry about the delay. I dug into this a bit more and realized that we added the field to the controller RPCs but missed it on the node RPCs. I'm fixing that in #8239, which I'm targeting for the 0.12.0 release. I didn't quite get that done in time for the 0.12.0-beta1 that came out this morning, but assuming that PR gets merged it'll go out in the following beta or the GA.

tgross commented Jun 22, 2020

@toomyem that fix will ship in 0.12.0-beta2, which should be shipping later this week if you don't want to try it out on a build from master.
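
For reference, swapping in an updated volume definition uses the standard volume CLI (adjust the volume ID and file name to your setup; these are the stock commands, nothing specific to the beta):

$ nomad volume deregister mfs-vol1
$ nomad volume register volume.hcl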

toomyem commented Jun 29, 2020

Hi.

I would just like to confirm that it finally worked ;) Below is the configuration I used, in case someone is looking for a reference.
Thank you for your support and help.

sample volume definition:

type = "csi"
plugin_id = "mfs-csi"
id = "mfs-vol"
name = "mfs-vol"
access_mode = "single-node-writer"
attachment_mode = "file-system"
context {
    endpoint = "mfsmaster:/shared/test"
}

sample job which uses this volume:

job "nginx" {
    datacenters = ["home"]
    type = "service"

    group "nginx" {
        count = 1

        volume "mfs-vol" {
            type = "csi"
            source = "mfs-vol"
        }

        task "nginx" {
            driver = "docker"

            config {
                image = "nginx"
                port_map {
                    http = 80
                }
            }

            volume_mount {
                volume = "mfs-vol"
                destination = "/test"
            }

            resources {
                network {
                    port "http" {}
                }
            }
        }
    }
}

sample job for mfs csi plugin:

job "mfs-csi-plugin" {
  datacenters = ["home"]
  type = "system"

  group "mfs-csi-plugin" {
      task "mfs-csi-plugin" {
          driver = "docker"

          config {
              image = "quay.io/tuxera/moosefs-csi-plugin:0.0.4"
              args = ["-endpoint=unix:///csi/csi.sock", "-mfs-endpoint=mfsmaster", "-topology=master:EP,chunk:EP"]
              privileged = true
          }

          csi_plugin {
              id = "mfs-csi"
              type = "node"
              mount_dir = "/csi"
          }

          resources {
              cpu = 500
              memory = 256
          }
      }
  }
}

tgross commented Jun 29, 2020

Glad to hear it and thanks again for reporting the issue and helping me figure out what needs to be done!

github-actions bot commented Nov 5, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators on Nov 5, 2022