
Start efs stunnel watch dog #104

Closed
leakingtapan opened this issue Nov 26, 2019 · 8 comments · Fixed by #113

@leakingtapan
Contributor

leakingtapan commented Nov 26, 2019

Is your feature request related to a problem? Please describe.
The EFS stunnel watchdog is not started properly because the efs mount helper is installed within a container environment. We need to start the watchdog so stunnel can be recovered if it crashes.

Error message:

Output: Could not start amazon-efs-mount-watchdog, unrecognized init system "aws-efs-csi-dri"

That is because there is no proper init system present in the container, which causes the efs stunnel watchdog startup to fail here. Although this looks like a problem, it doesn't seem to be the cause of this issue, since the watchdog was never started even in the initially successful mount:

bash-4.2# cat /var/log/amazon/efs/mount.log
2019-11-26 20:49:30,695 - INFO - version=1.9 options={'tls': None, 'rw': None}
2019-11-26 20:49:30,700 - WARNING - Could not start amazon-efs-mount-watchdog, unrecognized init system "aws-efs-csi-dri"
2019-11-26 20:49:30,737 - INFO - Starting TLS tunnel: "stunnel /var/run/efs/stunnel-config.fs-e8a95a42.var.lib.kubelet.pods.390d7c5f-108e-11ea-84e4-02e886441bde.volumes.kubernetes.io~csi.efs-pv.mount.20388"
2019-11-26 20:49:30,768 - INFO - Started TLS tunnel, pid: 8083
2019-11-26 20:49:30,769 - INFO - Executing: "/sbin/mount.nfs4 127.0.0.1:/ /var/lib/kubelet/pods/390d7c5f-108e-11ea-84e4-02e886441bde/volumes/kubernetes.io~csi/efs-pv/mount -o rw,noresvport,nfsvers=4.1,retrans=2,hard,wsize=1048576,timeo=600,rsize=1048576,port=20388"
2019-11-26 20:49:31,089 - INFO - Successfully mounted fs-e8a95a42.efs.us-west-2.amazonaws.com at /var/lib/kubelet/pods/390d7c5f-108e-11ea-84e4-02e886441bde/volumes/kubernetes.io~csi/efs-pv/mount

Originally posted by @leakingtapan in #103 (comment)
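The odd name in the error message is consistent with the mount helper inspecting the comm name of PID 1, which the Linux kernel truncates to 15 bytes (TASK_COMM_LEN minus the trailing NUL): "aws-efs-csi-driver" truncated that way is exactly "aws-efs-csi-dri". A minimal sketch of that truncation, assuming this detection mechanism and that the container's PID 1 is the driver binary:

```go
package main

import "fmt"

// commName mimics the kernel's truncation of a process name to
// TASK_COMM_LEN-1 (15) bytes, as reported in /proc/<pid>/comm.
func commName(processName string) string {
	const maxComm = 15 // TASK_COMM_LEN (16) minus the trailing NUL
	if len(processName) > maxComm {
		return processName[:maxComm]
	}
	return processName
}

func main() {
	// PID 1 inside the driver container is the plugin binary itself, so the
	// mount helper sees a truncated, unrecognized "init system" name.
	fmt.Println(commName("aws-efs-csi-driver")) // aws-efs-csi-dri
}
```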

@leakingtapan
Contributor Author

leakingtapan commented Dec 9, 2019

I did some quick tests by starting amazon-efs-mount-watchdog from the efs mount helper, and there are several challenges with using the existing watchdog. The watchdog is designed for a non-containerized environment, where systemd or initd is required to monitor and restart the process if it crashes, and running systemd in a Docker container is not trivial.

There are two ways I can think of to solve the problem:

  1. Create a new container, efs-watch-dog, in the efs node daemonset pod. The container will start the watchdog. This approach leverages the kubelet as the init system, so the container is restarted once it crashes. However, it requires sharing the PID namespace so that the watchdog can kill stunnel processes that live in the efs-plugin container's namespace, which is different from the watchdog's. Also, because the watchdog works against shared EFS state file(s) under /var/run/efs, those files must be shared between the efs-plugin and watchdog containers. This approach could work, but it is a bit messy and reduces the process isolation that namespaces provide.

  2. Manage the watchdog as a subprocess of the efs-plugin. The efs-plugin will restart the watchdog if it crashes. This approach is as secure as the current container isolation and much cleaner, since it doesn't require sharing the PID namespace or the EFS state files across containers. But it requires more time to implement some process-monitoring facility.

@adammw

adammw commented Jan 31, 2020

@leakingtapan I'm seeing this behaviour still on amazon/aws-efs-csi-driver@sha256:2ebe856c6fa58b63b45f011f562c25a61aca6b54f479f5eb4b636f649ea58fe0, which should have been after your fix was merged. Any ideas?

I0131 03:16:23.060907       1 node.go:50] NodePublishVolume: called with args volume_id:"fs-123456" target_path:"/var/lib/kubelet/pods/746329cb-6e15-4eee-9214-0f133d3837c5/volumes/kubernetes.io~csi/k8s-efs-test-pv/mount" volume_capability:<mount:<mount_flags:"tls" mount_flags:"iam" mount_flags:"accesspoint=fsap-987654321" > access_mode:<mode:MULTI_NODE_MULTI_WRITER > >
I0131 03:16:23.061014       1 node.go:119] NodePublishVolume: creating dir /var/lib/kubelet/pods/746329cb-6e15-4eee-9214-0f133d3837c5/volumes/kubernetes.io~csi/k8s-efs-test-pv/mount
I0131 03:16:23.061043       1 node.go:124] NodePublishVolume: mounting fs-123456:/ at /var/lib/kubelet/pods/746329cb-6e15-4eee-9214-0f133d3837c5/volumes/kubernetes.io~csi/k8s-efs-test-pv/mount with options [tls iam accesspoint=fsap-987654321]
I0131 03:16:23.061063       1 mount_linux.go:135] Mounting cmd (mount) with arguments ([-t efs -o tls,iam,accesspoint=fsap-987654321 fs-123456:/ /var/lib/kubelet/pods/746329cb-6e15-4eee-9214-0f133d3837c5/volumes/kubernetes.io~csi/k8s-efs-test-pv/mount])
I0131 03:16:58.161943       1 reaper.go:61] Waited for child process 0
E0131 03:16:58.161952       1 mount_linux.go:140] Mount failed: exit status 1
Mounting command: mount
Mounting arguments: -t efs -o tls,iam,accesspoint=fsap-987654321 fs-123456:/ /var/lib/kubelet/pods/746329cb-6e15-4eee-9214-0f133d3837c5/volumes/kubernetes.io~csi/k8s-efs-test-pv/mount
Output: Could not start amazon-efs-mount-watchdog, unrecognized init system "aws-efs-csi-dri"
mount.nfs4: an incorrect mount option was specified
Failed to initialize TLS tunnel for fs-123456

E0131 03:16:58.162033       1 driver.go:74] GRPC error: rpc error: code = Internal desc = Could not mount "fs-123456:/" at "/var/lib/kubelet/pods/746329cb-6e15-4eee-9214-0f133d3837c5/volumes/kubernetes.io~csi/k8s-efs-test-pv/mount": mount failed: exit status 1
2020-01-31 03:29:32,359 - INFO - Executing: "/sbin/mount.nfs4 127.0.0.1:/ /var/lib/kubelet/pods/746329cb-6e15-4eee-9214-0f133d3837c5/volumes/kubernetes.io~csi/k8s-efs-test-pv/mount -o iam,rw,noresvport,nfsvers=4.1,accesspoint=fsap-987654321,retrans=2,hard,wsize=1048576,timeo=600,rsize=1048576,port=20236"
2020-01-31 03:29:33,881 - ERROR - Failed to mount fs-123456.efs.us-west-2.amazonaws.com at /var/lib/kubelet/pods/746329cb-6e15-4eee-9214-0f133d3837c5/volumes/kubernetes.io~csi/k8s-efs-test-pv/mount: returncode=32, stderr="mount.nfs4: an incorrect mount option was specified

@allamand

Hello,

I have a similar issue, but trying to mount the EFS volume on Fargate

Mounting arguments: -t efs -o tls fs-4c960478:/ /var/lib/kubelet/pods/c0279054-bebb-4ffe-9432-60dabdc58fcd/volumes/kubernetes.io~csi/test/mount
Output: Could not start amazon-efs-mount-watchdog, unrecognized init system "bash"
mount.nfs4: Connection reset by peer

@AidanWenzel

@allamand I am getting the same error as you - did you find a solution?

@lemmikens

> Hello,
>
> I have a similar issue, but trying to mount the EFS volume on Fargate
>
> Mounting arguments: -t efs -o tls fs-4c960478:/ /var/lib/kubelet/pods/c0279054-bebb-4ffe-9432-60dabdc58fcd/volumes/kubernetes.io~csi/test/mount
> Output: Could not start amazon-efs-mount-watchdog, unrecognized init system "bash"
> mount.nfs4: Connection reset by peer

One of two things fixed this, and I'm not sure which because I did both at the same time. The first (and most likely culprit) was creating an IAM service account for K8s (scroll down to "Create an IAM policy and role" and look at step 2).

The second (and less likely) was changing the SGs on the mount targets. Literally the only change I made was the port... I had the SG open to "All Traffic" before, and when I narrowed it down to the specific NFS port (2049), it seemed to work.

@damiangene

> Hello,
> I have a similar issue, but trying to mount the EFS volume on Fargate
>
> Mounting arguments: -t efs -o tls fs-4c960478:/ /var/lib/kubelet/pods/c0279054-bebb-4ffe-9432-60dabdc58fcd/volumes/kubernetes.io~csi/test/mount
> Output: Could not start amazon-efs-mount-watchdog, unrecognized init system "bash"
> mount.nfs4: Connection reset by peer
>
> One of two things fixed this, and I'm not sure which because I did both at the same time. The first (and most likely culprit) was creating an IAM service account for K8s (scroll down to "Create an IAM policy and role" and look at step 2).
>
> The second (and less likely) was changing the SGs on the mount targets. Literally the only change I made was the port... I had the SG open to "All Traffic" before, and when I narrowed it down to the specific NFS port (2049), it seemed to work.

I was having the same issue, and it was your less likely change that resolved my problem. Although I will say I did the first fix you mentioned initially, then did some testing before implementing the second fix.

Thank you so much for your comment; I would still be pulling my hair out if it weren't for it.

@balbatross

> Hello,
> I have a similar issue, but trying to mount the EFS volume on Fargate
>
> Mounting arguments: -t efs -o tls fs-4c960478:/ /var/lib/kubelet/pods/c0279054-bebb-4ffe-9432-60dabdc58fcd/volumes/kubernetes.io~csi/test/mount
> Output: Could not start amazon-efs-mount-watchdog, unrecognized init system "bash"
> mount.nfs4: Connection reset by peer
>
> One of two things fixed this, and I'm not sure which because I did both at the same time. The first (and most likely culprit) was creating an IAM service account for K8s (scroll down to "Create an IAM policy and role" and look at step 2).
>
> The second (and less likely) was changing the SGs on the mount targets. Literally the only change I made was the port... I had the SG open to "All Traffic" before, and when I narrowed it down to the specific NFS port (2049), it seemed to work.

If you ever start an infrastructure provider please let me know, this answer was more helpful than 8 hours of AWS documentation, Jah Bless

@phyzical

@lemmikens Thanks!!! This was driving me mad.

For me it was the SGs on the mount targets; I didn't need the service account (but I'm not using the CSI driver).
