Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to mount two access points on same EFS volume #167

Closed
2uasimojo opened this issue May 20, 2020 · 7 comments · Fixed by #185
Closed

Unable to mount two access points on same EFS volume #167

2uasimojo opened this issue May 20, 2020 · 7 comments · Fixed by #185
Labels
kind/bug Categorizes issue or PR as related to a bug.
Milestone

Comments

@2uasimojo
Copy link
Contributor

2uasimojo commented May 20, 2020

/kind bug

This is just like #100, but for access points.

What happened?
Pod fails to come up, eventually timing out waiting for mounts. Logs show only one mount was attempted.

What you expected to happen?
All file systems mounted; pod comes up.

How to reproduce it (as minimally and precisely as possible)?

Anything else we need to know?:

  • The document referred to above states that this setup is supported. (That's my bad.) It needs to be fixed until/unless this bug is resolved. I'll put up a PR for that in the next few minutes.
  • If you use access points on different file systems, it works fine.
  • The solution (Update to read subpath from volume handle #102) for Unable to mount two folders on same EFS volume #100 was to glom the subdirectory into the volumeHandle, separated by a colon. It is not out of the question that a similar fix could be implemented here. For example, support any of:
    • volumeHandle: fs-XXX
    • volumeHandle: fs-XXX:/path
    • volumeHandle: fs-XXX:fsap-YYY <== new
      The access point would be fed into the mount options by the driver, rather than having to specify it explicitly under mountOptions in the PV spec. (Note that I don't think fs-XXX:fsap-YYY:/path makes sense because the access point is already subpathed, but it could be looked at.)

Environment

  • Kubernetes version (use kubectl version):
Client Version: openshift-clients-4.3-31-gc25fb9c7
Server Version: 4.3.19
Kubernetes Version: v1.16.2
  • Driver version:
    latest (as of 20200520)
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label May 20, 2020
2uasimojo added a commit to 2uasimojo/aws-efs-csi-driver that referenced this issue May 20, 2020
Per [issue kubernetes-sigs#167](kubernetes-sigs#167),
it turns out that you *can't* use separate access points to the same EFS
volume from the same pod. Correct the documentation pending a fix.
2uasimojo added a commit to 2uasimojo/aws-efs-csi-driver that referenced this issue May 20, 2020
Per [issue kubernetes-sigs#167](kubernetes-sigs#167),
it turns out that you *can't* use separate access points to the same EFS
volume from the same pod. Correct the documentation pending a fix.
@2uasimojo
Copy link
Contributor Author

  • The document referred to above states that this setup is supported. (That's my bad.) It needs to be fixed until/unless this bug is resolved. I'll put up a PR for that in the next few minutes.

#168 ⬆️

2uasimojo added a commit to 2uasimojo/aws-efs-operator that referenced this issue May 20, 2020
Investigation unveiled the bug reported in
[driver issue #167](kubernetes-sigs/aws-efs-csi-driver#167)
whereby you can't use multiple access points backed by the same EFS volume from
within the same pod. This commit updates the README, TESTING, and TODO docs
accordingly.
@leakingtapan
Copy link
Contributor

Yes, the access point ID has to be part of volumeHandle to make kubelet aware of there is a different volume to be mounted instead of the mounting the same volume twice (which will be ignored by kubelet).

We need to define a proper volumeHandle semantic for both use cases:

  • just the filesystem ID
  • filesystem ID + subpath
  • filesystem ID + access point

@wongma7
Copy link
Contributor

wongma7 commented May 26, 2020

(Note that I don't think fs-XXX:fsap-YYY:/path makes sense because the access point is already subpathed, but it could be looked at.)

I think it makes sense. An accesspoint should be treated like its own "root." If I have an accesspoint fsap-AAA at directory /a in my EFS then I want to be able to do mount fs-XXX:/b -o accesspoint=fsap-AAA, equivalent to doing mount fs-asdf:/a/b.

It's equivalent also to mount fs-XXX:b -o accesspoint=fsap-AAA...i.e. the path doesn't need to have a leading slash.

So I think volumeHandle is going to need another colon delimited token/field :/.

(TBH I think the proper long-term fix would be on the CSI/kubernetes side...I just read #100 (comment) and it seems wrong that CSI drivers have lost the ability to implement GetVolumeName compared to in-tree drivers, forcing CSI drivers to stuff fields into volumeHandle rather than build out volumeAttributes.

@2uasimojo
Copy link
Contributor Author

(Note that I don't think fs-XXX:fsap-YYY:/path makes sense because the access point is already subpathed, but it could be looked at.)

I think it makes sense. An accesspoint should be treated like its own "root." If I have an accesspoint fsap-AAA at directory /a in my EFS then I want to be able to do mount fs-XXX:/b -o accesspoint=fsap-AAA, equivalent to doing mount fs-asdf:/a/b.

It's equivalent also to mount fs-XXX:b -o accesspoint=fsap-AAA...i.e. the path doesn't need to have a leading slash.

I was afraid of that :( It's going to make it harder to make this backward compatible with the existing two-field syntax, because when we see a two-field input, we wouldn't know whether the second field is a path or an access point ID.

I suppose this means we have to require all three fields if you want to use access points, even if the path component is empty (as I suspect it would be most of the time).

So how about this for a scratch design:

  • One field is always interpreted as {fsid}. E.g. fs-abc123def.
  • Two fields is always interpreted as {fsid}:{path}. E.g.
    • fs-abc123def:/foo (path has explicit leading slash)
    • fs-abc123def:foo (no leading slash) (is this legitimate?)
    • fs-abc123def: (empty path), treated same as single field fs-abc123def (do we allow this?)
  • Three fields is always interpreted as {fsid}:{apid}:{path}. E.g.
    • fs-abc123def:fsap-001122334455: (common access point case, no subpath)
    • fs-abc123def:fsap-001122334455:/foo (subpath with explicit slash)
    • fs-abc123def:fsap-001122334455:foo (subpath with no slash)
    • fs-abc123def:: (no access point or path, same as single field fs-abc123def)
    • fs-abc123def::/foo (no access point, same as two-field fs-abc123def:/foo, path can have slashes or not)
  • The only case we'll pre-check and fail on is if the {fsid} is empty. That goes for the one, two, or three field syntax:
    • "" (empty string)
    • :/foo
    • ::/foo
    • :fsap-001122334455:
    • :fsap-001122334455:/foo
    • etc.
      Otherwise we'll just let the mount command fail.

(TBH I think the proper long-term fix would be on the CSI/kubernetes side...I just read #100 (comment) and it seems wrong that CSI drivers have lost the ability to implement GetVolumeName compared to in-tree drivers, forcing CSI drivers to stuff fields into volumeHandle rather than build out volumeAttributes.

I agree wholeheartedly.

@wongma7
Copy link
Contributor

wongma7 commented May 28, 2020

I like your design, one simplification could be to keep {path} always the second field, i.e. be super strict about path and apid as "positional args", so whether it's two fields or three we can always assume we will get some variation of {fsid}:{path}:{apid} where everything past {fsid} is optional.

@2uasimojo
Copy link
Contributor Author

2uasimojo commented May 28, 2020

Yup, I like that. I was trying to avoid the common access point case having the empty field in the middle ({fsid}::{apid}) but I think the simplicity of the ordering strictness is worth the tradeoff.

@2uasimojo
Copy link
Contributor Author

I'll start attacking this today.

2uasimojo added a commit to 2uasimojo/aws-efs-csi-driver that referenced this issue Jun 16, 2020
WIP: unit test
WIP: docs

Expands the supported `volumeHandle` formats to include a three-field
version: `{fsid}:{subpath}:{apid}`. This addresses the limitation
originally described in kubernetes-sigs#100 whereby k8s relies solely on the
`volumeHandle` to uniquely distinguish one PV from another.

As part of this fix, specifying `accesspoint=fsap-...` in `mountOptions`
is deprecated.

For more details, see the related issue (kubernetes-sigs#167).

The following scenarios were tested in a live environment:

- `volumeHandle: fs::ap1`
- `mountOptions: ['tls', 'accesspoint=ap2']`
- expect: fail
- actual: fail with `Warning  FailedMount  1s (x4 over 4s)  kubelet, ip-10-0-137-122.ec2.internal  MountVolume.SetUp failed for volume "pv-aptest-1" : kubernetes.io/csi: mounter.SetupAt failed: rpc error: code = InvalidArgument desc = Found conflicting access point IDs in mountOptions (fsap-04d3307beebd04739) and volumeHandle (fsap-057e9a9b209ec9531)`
- result: ✔

- `volumeHandle: fs::ap1`
- `mountOptions: ['tls', 'accesspoint=ap1']`
- expect: success
- result: ✔

Also makes sure we populate tls and accesspoint in mountOptions

- `mountOptions: []` (for both)
- PV1:
  - `volumeHandle: fs1::ap1`
- PV2:
  - `volumeHandle: fs1::ap2`
- expect: success, both mounts accessible and distinct
- result: ✔

- `mountOptions: []` (for all)
- PV1:
  - `volumeHandle: fs1::ap1` (root -- should be able to see /foo and bar)
- PV2:
  - `volumeHandle: fs1:/foo/bar:ap1` (absolute path)
- PV3:
  - `volumeHandle: fs1:foo/bar:ap1` (relative path)
- expect: success
- actual: success (had to create `$absolutemountpoint/foo/bar` in the fs first, as expected)
- result: ✔

Fixes: kubernetes-sigs#167

Signed-off-by: Eric Fried <efried@redhat.com>
2uasimojo added a commit to 2uasimojo/aws-efs-csi-driver that referenced this issue Jun 16, 2020
WIP: unit test
WIP: docs

Expands the supported `volumeHandle` formats to include a three-field
version: `{fsid}:{subpath}:{apid}`. This addresses the limitation
originally described in kubernetes-sigs#100 whereby k8s relies solely on the
`volumeHandle` to uniquely distinguish one PV from another.

As part of this fix, specifying `accesspoint=fsap-...` in `mountOptions`
is deprecated.

For more details, see the related issue (kubernetes-sigs#167).

The following scenarios were tested in a live environment:

**Conflicting access point in volumeHandle and mountOptions**

- `volumeHandle: fs::ap1`
- `mountOptions: ['tls', 'accesspoint=ap2']`
- expect: fail
- actual: fail with `Warning  FailedMount  1s (x4 over 4s)  kubelet, ip-10-0-137-122.ec2.internal  MountVolume.SetUp failed for volume "pv-aptest-1" : kubernetes.io/csi: mounter.SetupAt failed: rpc error: code = InvalidArgument desc = Found conflicting access point IDs in mountOptions (fsap-04d3307beebd04739) and volumeHandle (fsap-057e9a9b209ec9531)`
- result: ✔

**Same access point in volumeHandle and mountOptions**

- `volumeHandle: fs::ap1`
- `mountOptions: ['tls', 'accesspoint=ap1']`
- expect: success
- result: ✔

**Two access points on the same file system**

Also makes sure we populate tls and accesspoint in mountOptions

- `mountOptions: []` (for both)
- PV1:
  - `volumeHandle: fs1::ap1`
- PV2:
  - `volumeHandle: fs1::ap2`
- expect: success, both mounts accessible and distinct
- result: ✔

**Subpaths with access points**

- `mountOptions: []` (for all)
- PV1:
  - `volumeHandle: fs1::ap1` (root -- should be able to see /foo and bar)
- PV2:
  - `volumeHandle: fs1:/foo/bar:ap1` (absolute path)
- PV3:
  - `volumeHandle: fs1:foo/bar:ap1` (relative path)
- expect: success
- actual: success (had to create `$absolutemountpoint/foo/bar` in the fs first, as expected)
- result: ✔

Fixes: kubernetes-sigs#167

Signed-off-by: Eric Fried <efried@redhat.com>
2uasimojo added a commit to 2uasimojo/aws-efs-csi-driver that referenced this issue Jun 16, 2020
WIP: unit test
WIP: docs

Expands the supported `volumeHandle` formats to include a three-field
version: `{fsid}:{subpath}:{apid}`. This addresses the limitation
originally described in kubernetes-sigs#100 whereby k8s relies solely on the
`volumeHandle` to uniquely distinguish one PV from another.

As part of this fix, specifying `accesspoint=fsap-...` in `mountOptions`
is deprecated.

For more details, see the related issue (kubernetes-sigs#167).

The following scenarios were tested in a live environment:

**Conflicting access point in volumeHandle and mountOptions**

- `volumeHandle: fs::ap1`
- `mountOptions: ['tls', 'accesspoint=ap2']`
- expect: fail
- actual: fail with `Warning  FailedMount  1s (x4 over 4s)  kubelet, ip-10-0-137-122.ec2.internal  MountVolume.SetUp failed for volume "pv-aptest-1" : kubernetes.io/csi: mounter.SetupAt failed: rpc error: code = InvalidArgument desc = Found conflicting access point IDs in mountOptions (fsap-04d3307beebd04739) and volumeHandle (fsap-057e9a9b209ec9531)`
- result: ✔

**Same access point in volumeHandle and mountOptions**

- `volumeHandle: fs::ap1`
- `mountOptions: ['tls', 'accesspoint=ap1']`
- expect: success
- result: ✔

**Two access points on the same file system**

Also makes sure we populate tls and accesspoint in mountOptions

- `mountOptions: []` (for both)
- PV1:
  - `volumeHandle: fs1::ap1`
- PV2:
  - `volumeHandle: fs1::ap2`
- expect: success, both mounts accessible and distinct
- result: ✔

**Subpaths with access points**

- `mountOptions: []` (for all)
- PV1:
  - `volumeHandle: fs1::ap1` (root -- should be able to see /foo and bar)
- PV2:
  - `volumeHandle: fs1:/foo/bar:ap1` (absolute path)
- PV3:
  - `volumeHandle: fs1:foo/bar:ap1` (relative path)
- expect: success
- actual: success (had to create `$absolutemountpoint/foo/bar` in the fs first, as expected)
- result: ✔

Signed-off-by: Eric Fried <efried@redhat.com>
2uasimojo added a commit to 2uasimojo/aws-efs-csi-driver that referenced this issue Jun 18, 2020
Expands the supported `volumeHandle` formats to include a three-field
version: `{fsid}:{subpath}:{apid}`. This addresses the limitation
originally described in kubernetes-sigs#100 whereby k8s relies solely on the
`volumeHandle` to uniquely distinguish one PV from another.

As part of this fix, specifying `accesspoint=fsap-...` in `mountOptions`
is deprecated.

For more details, see the related issue (kubernetes-sigs#167).

The following scenarios were tested in a live environment:

**Conflicting access point in volumeHandle and mountOptions**

- `volumeHandle: fs::ap1`
- `mountOptions: ['tls', 'accesspoint=ap2']`
- expect: fail
- actual: fail with `Warning  FailedMount  1s (x4 over 4s)  kubelet, ip-10-0-137-122.ec2.internal  MountVolume.SetUp failed for volume "pv-aptest-1" : kubernetes.io/csi: mounter.SetupAt failed: rpc error: code = InvalidArgument desc = Found conflicting access point IDs in mountOptions (fsap-04d3307beebd04739) and volumeHandle (fsap-057e9a9b209ec9531)`
- result: ✔

**Same access point in volumeHandle and mountOptions**

- `volumeHandle: fs::ap1`
- `mountOptions: ['tls', 'accesspoint=ap1']`
- expect: success
- result: ✔

**Two access points on the same file system**

Also makes sure we populate tls and accesspoint in mountOptions

- `mountOptions: []` (for both)
- PV1:
  - `volumeHandle: fs1::ap1`
- PV2:
  - `volumeHandle: fs1::ap2`
- expect: success, both mounts accessible and distinct
- result: ✔

**Subpaths with access points**

- `mountOptions: []` (for all)
- PV1:
  - `volumeHandle: fs1::ap1` (root -- should be able to see /foo and bar)
- PV2:
  - `volumeHandle: fs1:/foo/bar:ap1` (absolute path)
- PV3:
  - `volumeHandle: fs1:foo/bar:ap1` (relative path)
- expect: success
- actual: success (had to create `$absolutemountpoint/foo/bar` in the fs first, as expected)
- result: ✔

Signed-off-by: Eric Fried <efried@redhat.com>
2uasimojo added a commit to 2uasimojo/aws-efs-csi-driver that referenced this issue Jun 18, 2020
Expands the supported `volumeHandle` formats to include a three-field
version: `{fsid}:{subpath}:{apid}`. This addresses the limitation
originally described in kubernetes-sigs#100 whereby k8s relies solely on the
`volumeHandle` to uniquely distinguish one PV from another.

As part of this fix, specifying `accesspoint=fsap-...` in `mountOptions`
is deprecated.

For more details, see the related issue (kubernetes-sigs#167).

The following scenarios were tested in a live environment:

**Conflicting access point in volumeHandle and mountOptions**

- `volumeHandle: fs::ap1`
- `mountOptions: ['tls', 'accesspoint=ap2']`
- expect: fail
- actual: fail with `Warning  FailedMount  1s (x4 over 4s)  kubelet, ip-10-0-137-122.ec2.internal  MountVolume.SetUp failed for volume "pv-aptest-1" : kubernetes.io/csi: mounter.SetupAt failed: rpc error: code = InvalidArgument desc = Found conflicting access point IDs in mountOptions (fsap-04d3307beebd04739) and volumeHandle (fsap-057e9a9b209ec9531)`
- result: ✔

**Same access point in volumeHandle and mountOptions**

- `volumeHandle: fs::ap1`
- `mountOptions: ['tls', 'accesspoint=ap1']`
- expect: success
- result: ✔

**Two access points on the same file system**

Also makes sure we populate tls and accesspoint in mountOptions

- `mountOptions: []` (for both)
- PV1:
  - `volumeHandle: fs1::ap1`
- PV2:
  - `volumeHandle: fs1::ap2`
- expect: success, both mounts accessible and distinct
- result: ✔

**Subpaths with access points**

- `mountOptions: []` (for all)
- PV1:
  - `volumeHandle: fs1::ap1` (root -- should be able to see /foo and bar)
- PV2:
  - `volumeHandle: fs1:/foo/bar:ap1` (absolute path)
- PV3:
  - `volumeHandle: fs1:foo/bar:ap1` (relative path)
- expect: success
- actual: success (had to create `$absolutemountpoint/foo/bar` in the fs first, as expected)
- result: ✔

Signed-off-by: Eric Fried <efried@redhat.com>
2uasimojo added a commit to 2uasimojo/aws-efs-operator that referenced this issue Jun 23, 2020
Since the [fix](kubernetes-sigs/aws-efs-csi-driver#185) for [issue #167](kubernetes-sigs/aws-efs-csi-driver#167) merged, the AWS EFS CSI driver overwrote the [`latest` image tag](sha256-962619a5deb34e1c4257f2120dd941ab026fc96adde003e27f70b65023af5a07?context=explore) to include it.

For starters, this means we can update this operator to use the [new method of specifying access points](https://github.com/kubernetes-sigs/aws-efs-csi-driver/tree/0ae998c5a95fe6dbee7f43c182997e64872695e6/examples/kubernetes/access_points#edit-persistent-volume-spec) via a colon-delimited `volumeHandle` as opposed to in `mountOptions`.

However, the same update to `latest` also brought in a [commit](kubernetes-sigs/aws-efs-csi-driver@b3baff8) that requires an additional mount in the `efs-plugin` container of the DaemonSet. So we need to update our YAML for that resource at the same time, or everything is broken (this might be upstream [issue #192](kubernetes-sigs/aws-efs-csi-driver#192). This update to the DaemonSet YAML also syncs with [upstream](https://github.com/kubernetes-sigs/aws-efs-csi-driver/blob/0ae998c5a95fe6dbee7f43c182997e64872695e6/deploy/kubernetes/base/node.yaml) by bumping the image versions for the other two containers (csi-node-driver-registrar: v1.1.0 => v1.3.0; and livenessprobe: v1.1.0 => livenessprobe:v2.0.0).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants