
[csi,hcloud] failed to stage volume: exit status 1 #7775

Closed
mkrueger-sabio opened this issue Apr 22, 2020 · 9 comments

Comments

@mkrueger-sabio

mkrueger-sabio commented Apr 22, 2020

Plugin

job "hcloud-csi" {
  datacenters = ["dc1"]
  type = "system"

  group "node" {
    task "plugin" {
      driver = "docker"

      config {
        image = "hetznercloud/hcloud-csi-driver:1.2.3"
        privileged = true
      }

      env {
        CSI_ENDPOINT="unix:///csi/csi.sock"
        HCLOUD_TOKEN="<token>"
        HCLOUD_SERVER_ID="<server id>"
      }

      csi_plugin {
        id        = "csi.hetzner.cloud"
        type      = "monolith"
        mount_dir = "/csi"
      }
    }
  }
}
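For reference, the plugin job is submitted with the standard run command (the file name here is only illustrative):

$ ./nomad job run hcloud-csi.hcl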
$ ./nomad plugin status csi.hetzner.cloud
ID                   = csi.hetzner.cloud
Provider             = csi.hetzner.cloud
Version              = 1.2.3
Controllers Healthy  = 1
Controllers Expected = 1
Nodes Healthy        = 1
Nodes Expected       = 1

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created  Modified
10f9406a  72cc340f  node        0        run      running  33s ago  16s ago

Volume

type = "csi"
id = "5073567"
name = "documents"
external_id = "5073567"
access_mode = "single-node-writer"
attachment_mode = "file-system"
plugin_id = "csi.hetzner.cloud"
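The volume spec above is registered before the status check below (file name assumed for illustration):

$ ./nomad volume register documents.hcl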
$ ./nomad volume status 5073567
ID                   = 5073567
Name                 = documents
External ID          = 5073567
Plugin ID            = csi.hetzner.cloud
Provider             = csi.hetzner.cloud
Version              = 1.2.3
Schedulable          = true
Controllers Healthy  = 1
Controllers Expected = 1
Nodes Healthy        = 1
Nodes Expected       = 1
Access Mode          = single-node-writer
Attachment Mode      = file-system
Mount Options        = <none>
Namespace            = default

Allocations
No allocations placed

httpd job

job "httpd" {
  datacenters = ["dc1"]

  group "httpd" {

    restart {
      attempts = 10
      interval = "5m"
      delay    = "25s"
      mode     = "delay"
    }

    volume "documents" {
      type      = "csi"
      read_only = false
      source    = "5073567"
    }

    task "httpd" {
      driver = "docker"

      volume_mount {
        volume      = "documents"
        destination = "/srv"
        read_only   = false
      }

      config {
        image = "httpd:2.4.41-alpine"
      }
    }
  }
}
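The httpd job is started the same way as the plugin job (again, the file name is just illustrative):

$ ./nomad job run httpd.hcl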

hcloud csi plugin logs

Register volume:

level=debug ts=2020-04-22T13:09:51.775031587Z component=grpc-server msg="handling request" req="volume_id:\"5073567\" volume_capabilities:<mount:<> access_mode:<mode:SINGLE_NODE_WRITER > > "

Run httpd job:

level=debug ts=2020-04-22T13:09:58.224281891Z component=grpc-server msg="handling request" req="volume_id:\"5073567\" node_id:\"4596615\" volume_capability:<mount:<> access_mode:<mode:SINGLE_NODE_WRITER > > "
level=info ts=2020-04-22T13:09:58.224328945Z component=api-volume-service msg="attaching volume" volume-id=5073567 server-id=4596615
level=debug ts=2020-04-22T13:10:01.487346907Z component=grpc-server msg="finished handling request"
level=debug ts=2020-04-22T13:10:01.491824013Z component=grpc-server msg="handling request" req="volume_id:\"5073567\" staging_target_path:\"/csi/staging/5073567/rw-file-system-single-node-writer\" volume_capability:<mount:<> access_mode:<mode:SINGLE_NODE_WRITER > > "
level=debug ts=2020-04-22T13:10:01.980538605Z component=linux-mount-service msg="staging volume" volume-name=csi-test staging-target-path=/csi/staging/5073567/rw-file-system-single-node-writer fs-type=ext4
level=error ts=2020-04-22T13:10:02.039020101Z component=grpc-server msg="handler failed" err="rpc error: code = Internal desc = failed to stage volume: exit status 1"
level=debug ts=2020-04-22T13:10:02.195025014Z component=grpc-server msg="handling request" req="volume_id:\"5073567\" target_path:\"/csi/per-alloc/61897558-c024-8b12-f304-515de1522e01/5073567/rw-file-system-single-node-writer\" "
level=debug ts=2020-04-22T13:10:03.580825654Z component=linux-mount-service msg="unpublishing volume" volume-name=csi-test target-path=/csi/per-alloc/61897558-c024-8b12-f304-515de1522e01/5073567/rw-file-system-single-node-writer
level=error ts=2020-04-22T13:10:03.581883066Z component=grpc-server msg="handler failed" err="rpc error: code = Internal desc = failed to unpublish volume: Unmount failed: exit status 1\nUnmounting arguments: /csi/per-alloc/61897558-c024-8b12-f304-515de1522e01/5073567/rw-file-system-single-node-writer\nOutput: umount: can't unmount /csi/per-alloc/61897558-c024-8b12-f304-515de1522e01/5073567/rw-file-system-single-node-writer: No such file or directory\n\n"
level=debug ts=2020-04-22T13:10:04.587567296Z component=grpc-server msg="handling request" req="volume_id:\"5073567\" target_path:\"/csi/per-alloc/61897558-c024-8b12-f304-515de1522e01/5073567/rw-file-system-single-node-writer\" "
level=debug ts=2020-04-22T13:10:05.053140388Z component=linux-mount-service msg="unpublishing volume" volume-name=csi-test target-path=/csi/per-alloc/61897558-c024-8b12-f304-515de1522e01/5073567/rw-file-system-single-node-writer
level=error ts=2020-04-22T13:10:05.056994464Z component=grpc-server msg="handler failed" err="rpc error: code = Internal desc = failed to unpublish volume: Unmount failed: exit status 1\nUnmounting arguments: /csi/per-alloc/61897558-c024-8b12-f304-515de1522e01/5073567/rw-file-system-single-node-writer\nOutput: umount: can't unmount /csi/per-alloc/61897558-c024-8b12-f304-515de1522e01/5073567/rw-file-system-single-node-writer: No such file or directory\n\n"

As you can see, the volume cannot be staged: err="rpc error: code = Internal desc = failed to stage volume: exit status 1"

Is this a problem in the Nomad setup or in the CSI plugin? Or is it caused by #7754 (although I did set the id to the external id in volume.hcl)?

@tgross
Member

tgross commented Apr 22, 2020

Hi @mkrueger-sabio! It's possible that #7754 is to blame here. It looks like the mounting process is getting an error from the plugin, but it's not giving us a lot of information about what went wrong. Can you pull up the allocation logs for the plugin? Often the plugin writes more useful information to its logs than it provides to Nomad via the CSI API.
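For example, using the plugin allocation ID from the plugin status output above:

$ nomad alloc logs 10f9406a
$ nomad alloc logs -stderr 10f9406a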

@mkrueger-sabio
Author

I already posted the plugin's allocation logs for standard output above.

This is from stderr:

$ ./nomad alloc logs -stderr 8b85e4a3
E0422 13:10:02.007798       1 mount_linux.go:143] Mount failed: exit status 255
Mounting command: mount
Mounting arguments: -t ext4 -o defaults /dev/disk/by-id/scsi-0HC_Volume_5073567 /csi/staging/5073567/rw-file-system-single-node-writer
Output: mount: mounting /dev/disk/by-id/scsi-0HC_Volume_5073567 on /csi/staging/5073567/rw-file-system-single-node-writer failed: No such file or directory

E0422 13:10:02.038370       1 mount_linux.go:487] format of disk "/dev/disk/by-id/scsi-0HC_Volume_5073567" failed: type:("ext4") target:("/csi/staging/5073567/rw-file-system-single-node-writer") options:(["defaults"])error:(exit status 1)
E0422 13:10:33.362195       1 mount_linux.go:143] Mount failed: exit status 255
Mounting command: mount
Mounting arguments: -t ext4 -o defaults /dev/disk/by-id/scsi-0HC_Volume_5073567 /csi/staging/5073567/rw-file-system-single-node-writer
Output: mount: mounting /dev/disk/by-id/scsi-0HC_Volume_5073567 on /csi/staging/5073567/rw-file-system-single-node-writer failed: No such file or directory

E0422 13:10:33.367322       1 mount_linux.go:487] format of disk "/dev/disk/by-id/scsi-0HC_Volume_5073567" failed: type:("ext4") target:("/csi/staging/5073567/rw-file-system-single-node-writer") options:(["defaults"])error:(exit status 1)
E0422 13:11:34.783387       1 mount_linux.go:143] Mount failed: exit status 255
Mounting command: mount
Mounting arguments: -t ext4 -o defaults /dev/disk/by-id/scsi-0HC_Volume_5073567 /csi/staging/5073567/rw-file-system-single-node-writer
Output: mount: mounting /dev/disk/by-id/scsi-0HC_Volume_5073567 on /csi/staging/5073567/rw-file-system-single-node-writer failed: No such file or directory

E0422 13:11:34.790214       1 mount_linux.go:487] format of disk "/dev/disk/by-id/scsi-0HC_Volume_5073567" failed: type:("ext4") target:("/csi/staging/5073567/rw-file-system-single-node-writer") options:(["defaults"])error:(exit status 1)
E0422 13:13:36.651284       1 mount_linux.go:143] Mount failed: exit status 255
Mounting command: mount
Mounting arguments: -t ext4 -o defaults /dev/disk/by-id/scsi-0HC_Volume_5073567 /csi/staging/5073567/rw-file-system-single-node-writer
Output: mount: mounting /dev/disk/by-id/scsi-0HC_Volume_5073567 on /csi/staging/5073567/rw-file-system-single-node-writer failed: No such file or directory

E0422 13:13:36.656512       1 mount_linux.go:487] format of disk "/dev/disk/by-id/scsi-0HC_Volume_5073567" failed: type:("ext4") target:("/csi/staging/5073567/rw-file-system-single-node-writer") options:(["defaults"])error:(exit status 1)
E0422 13:17:38.336805       1 mount_linux.go:143] Mount failed: exit status 255
Mounting command: mount
Mounting arguments: -t ext4 -o defaults /dev/disk/by-id/scsi-0HC_Volume_5073567 /csi/staging/5073567/rw-file-system-single-node-writer
Output: mount: mounting /dev/disk/by-id/scsi-0HC_Volume_5073567 on /csi/staging/5073567/rw-file-system-single-node-writer failed: No such file or directory

E0422 13:17:38.339880       1 mount_linux.go:487] format of disk "/dev/disk/by-id/scsi-0HC_Volume_5073567" failed: type:("ext4") target:("/csi/staging/5073567/rw-file-system-single-node-writer") options:(["defaults"])error:(exit status 1)

@tgross
Member

tgross commented Apr 22, 2020

It looks like we're getting to the node mounting step, so by that point the disk should be physically attached to your instance (by the monolith plugin's controller steps). When the plugin attempts to mount the volume, does /dev/disk/by-id/scsi-0HC_Volume_5073567 get created or is there a path mismatch?
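One way to check, assuming you can get a shell on the hcloud instance while the httpd allocation is being placed:

$ ls -l /dev/disk/by-id/ | grep HC_Volume
$ lsblk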

@mkrueger-sabio
Author

mkrueger-sabio commented Apr 22, 2020

As far as I can see, this method is called by Nomad, and this method is called by the plugin.

Yes. Before I start the plugin, register the volume, and so on, I detach the volume from the server instance on hcloud; otherwise I get an error that the volume is already attached. When I start the httpd job, the volume /dev/disk/by-id/scsi-0HC_Volume_5073567 is attached to the server instance in hcloud, so I think the path is correct. When I log in to that server I can mount it manually.

I do not understand where the target path /csi/staging/5073567/rw-file-system-single-node-writer lives. Is this a directory on the Nomad server or on the server instance in hcloud? Do I have to create it myself?

@tgross
Member

tgross commented Apr 22, 2020

I do not understand where the target path /csi/staging/5073567/rw-file-system-single-node-writer lives. Is this a directory on the nomad server or on the server instance in hcloud? Do I have to create it?

It's a path in the plugin's allocation, created by the Nomad client in volumeManager.ensureStagingDir. During the staging step, the plugin needs to mount the volume into its own namespace so that it can do things like format the volume (if necessary).
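If you want to poke at it, you should be able to inspect that path from inside the plugin task, e.g. (alloc ID from the plugin status above):

$ nomad alloc exec 10f9406a ls -la /csi/staging/5073567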

@mkrueger-sabio
Author

Perhaps my prerequisites are wrong. I started the Nomad server in a Vagrant box. But does the Nomad server have to run on the hcloud server instance that the volume is attached to?

@tgross
Member

tgross commented Apr 22, 2020

The Nomad client node needs to run on the hcloud instance where the volume is attached. The Nomad server nodes do not. (Having the server in a separate network from the client is unusual, though!)
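A minimal sketch of that setup, assuming the server stays in your Vagrant box and a client agent runs on the hcloud instance (the address is a placeholder):

# client.hcl on the hcloud instance
datacenter = "dc1"

client {
  enabled = true
  servers = ["<nomad-server-address>:4647"]
}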

@mkrueger-sabio
Author

mkrueger-sabio commented Apr 22, 2020

OK, then this was a misunderstanding on my part. Thanks!

@github-actions

github-actions bot commented Nov 8, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 8, 2022