chown/chgrp on dynamic provisioned pvc fails #300
Comments
Similar to the option here: travelaudience/docker-nexus#33. This may be required to address this issue: kubernetes-sigs/aws-efs-csi-driver#300.
I'm also seeing this. This means that things like postgres, which fails to start if its data directory is not owned by the From https://docs.aws.amazon.com/efs/latest/ug/accessing-fs-nfs-permissions.html,
I can reproduce this without a file system policy, and with a file system policy that grants |
Thanks for reporting this. We're investigating possible fixes, but in the meantime let me explain why this is happening.

Dynamic provisioning shares a single file system among multiple PVs by using EFS Access Points. Access Points allow server-side overwrites of user/group information, overriding whatever the user/group of the app/container is. When we create an AP with dynamic provisioning, we allocate a unique UID/GID (for instance 50000:50000) that all operations are overwritten to, and create a unique directory (e.g. /ap50000) that is owned by that user/group. This ensures that no matter how the container is configured, it has read/write access to its root directory.

What is happening in this case is that the application is trying to take its own steps to make its root directory writeable. For instance, if the container has an application user with UID/GID 100:100, when it does an ls -la on its FS root directory it sees that the directory is owned by 50000:50000, not 100:100, so it assumes it needs to do a chown/chmod for it to work. However, even if we allowed this command to go through, the application would lose access to its own volume.

This is why the original issue above was resolved by disabling the chown/chgrp checks. This method can be used as a workaround for any application, since you can trust the PVs to be writeable out of the box. |
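To make the explanation above concrete, here is a toy Python model of the identity override. It is only a sketch: the class and function names are made up for illustration, the 0o700 directory mode is an assumption, and the real enforcement happens on the EFS server, not in client code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Directory:
    """Ownership and permission bits of a directory (toy model)."""
    owner_uid: int
    owner_gid: int
    mode: int  # POSIX permission bits, e.g. 0o700 (assumed value)


@dataclass(frozen=True)
class AccessPoint:
    """Toy stand-in for an access point with an enforced POSIX identity."""
    enforced_uid: int
    enforced_gid: int

    def effective_identity(self, client_uid: int, client_gid: int) -> Tuple[int, int]:
        # The IDs presented by the container are ignored; every operation
        # is performed as the access point's enforced user and group.
        return (self.enforced_uid, self.enforced_gid)


def can_write(directory: Directory, uid: int, gid: int) -> bool:
    """Classic owner/group/other write check, ignoring root and ACLs."""
    if uid == directory.owner_uid:
        return bool(directory.mode & 0o200)
    if gid == directory.owner_gid:
        return bool(directory.mode & 0o020)
    return bool(directory.mode & 0o002)


if __name__ == "__main__":
    ap = AccessPoint(enforced_uid=50000, enforced_gid=50000)
    uid, gid = ap.effective_identity(client_uid=100, client_gid=100)

    # Root directory as created for the access point: owned by 50000:50000.
    as_provisioned = Directory(owner_uid=50000, owner_gid=50000, mode=0o700)
    print("writable as provisioned:", can_write(as_provisioned, uid, gid))

    # Hypothetical state if the application's chown to 100:100 were allowed.
    after_chown = Directory(owner_uid=100, owner_gid=100, mode=0o700)
    print("writable after chown:", can_write(after_chown, uid, gid))
```

In this model the container keeps write access only as long as the directory stays owned by the enforced identity, which is the reason given above for why a successful chown would lock the application out of its own volume.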
Some applications don't support disabling ownership checks. E.g. I'm not aware of any way to disable it in Postgres. In such cases, the only workaround I've found is to create a user for the UID assigned by the driver (something like |
Part of the reason efs-provisioner worked seamlessly is that it relied on a beta annotation "pv.beta.kubernetes.io/gid" https://github.com/kubernetes-retired/external-storage/blob/201f40d78a9d3fd57d8a441cfc326988d88f35ec/nfs/pkg/volume/provision.go#L62 that silently does basically what your workaround does: it ensures that the Pod using the PV has the annotated group in its supplemental groups (i.e. if the annotation says '100' and you execute This feature is very old and predates the rigorous KEP/feature tracking system that exists today, and I think it's been forgotten by sig-storage. Certainly I am culpable for relying on it but doing nothing to make it more than a beta annotation. I'll try to bring it up in sig-storage and see if we can rely on it as an alternative solution. |
Does EFS still reject the chown call when using efs-provisioner, or is it that applications are expected not to call If I'm reading the Postgres code correctly, it seems to check the UID of the directory. So I'm not sure supplemental groups would work for all use cases. |
It's the latter, so you are right, the supplemental group approach won't work in the Postgres case. Does a subdirectory of the access point have the same restriction on chown'ing? If not, I suppose one could create the subdirectory before starting the application, maybe with an initContainer, and then let the application chown it. Still not a very elegant workaround tho. |
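The difference between the two checks discussed above can be sketched in Python. This is an illustration only: `owner_matches_process` mimics the kind of owner-UID check that Postgres appears to perform according to the comments above, and `writable_via_group` mimics access granted through a (supplemental) group. Both function names are hypothetical.

```python
import os
import stat
from typing import Iterable, Optional


def owner_matches_process(path: str) -> bool:
    """True if the directory's owner UID equals this process's effective UID."""
    return os.stat(path).st_uid == os.geteuid()


def writable_via_group(path: str, groups: Optional[Iterable[int]] = None) -> bool:
    """True if the directory's group is one of `groups` and the group write bit is set.

    When `groups` is None, the current process's effective GID and its
    supplemental groups are used.
    """
    st = os.stat(path)
    if groups is None:
        groups = set(os.getgroups()) | {os.getegid()}
    return st.st_gid in set(groups) and bool(st.st_mode & stat.S_IWGRP)
```

A volume can pass the second check and still fail the first, which is why supplemental groups alone do not help an application that insists on owning its data directory.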
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-contributor-experience at kubernetes/community. |
/remove-lifecycle stale |
Can someone please suggest a workaround for the postgres chown issue when using EFS with dynamic provisioning via access points? (I honestly hope I don't have to fall back to static provisioning to make postgres work on EKS!) |
@gabegorelick - if I setup runAsUser and runAsGroup in pod security context then postgres pod fails with a FATAL error
Here is the kubernetes manifest I am using
|
This issue probably needs to be addressed, but I'm not sure about using EFS for postgres. I have seen less demanding storage use cases where EFS struggles. |
I used #434. That's still not merged though, so you'll have to run a fork if you want it. But that is the only way that I'm aware of to specify a UID for the provisioned volumes.
One workaround is to not do this. Instead, use |
@gabegorelick can you please tell me what I need to change the container's entrypoint command from cmd [postgres] to? |
I see the PR fixing this is still not merged. I'm not running postgres but trying to run freshclam with EFS storage for the signatures database. In the entrypoint they |
I would love to see this PR merged in. Currently hitting this issue while trying to run docker:dind with a pv |
I am also having the same issue with some of my applications |
Ran into this issue. As gabegorelick mentioned, I only got it to work by chowning the mount path with the UID and GID of the dynamic volume and then setting the postgres user to the same IDs. Not ideal, but it works... Helm code:
|
How can this be closed? I understand there are not enough resources to work on it now but it seems quite severe so it ought to be kept in the backlog rather than just be closed. |
/reopen |
@RyanStan: You can't reopen an issue/PR unless you authored it or you are a collaborator. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/reopen |
@mskanth972: Reopened this issue. |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close not-planned |
@k8s-triage-robot: Closing this issue, marking it as "Not Planned". |
I hate this bot. /reopen |
So? Any solution? I can't install rancher monitoring, which uses grafana and a persistence stack on EFS, because of this error. The Grafana deployment that is supplied by rancher tries to execute chown and fails:
|
I am also experiencing the same issue while using EKS and the EFS CSI Driver.... |
/reopen |
Please reopen this, as I'm still facing the same issue with EFS. |
/reopen |
Did you find a solution? I am also running into this issue. |
For anybody else facing this issue, please see this comment to understand why this is happening. The EFS CSI driver does implement the storage interface itself quite well. It's just that some applications expect to be able to then run commands like |
Using a fixed groupid works for me: |
As a workaround, this works for me:
command: ["bash", "-c"]
args: ["usermod -u $(stat -c '%u' '/var/lib/postgresql/data') postgres && \
groupmod -g $(stat -c '%g' '/var/lib/postgresql/data') postgres && \
chown -R postgres:postgres /var/lib/postgresql && \
/usr/local/bin/docker-entrypoint.sh postgres"] |
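The same idea, sketched in Python for readability: read the owner UID and GID from the mounted data directory and build the `usermod`/`groupmod` commands that remap an existing user and group to them. The function name and the default user/group names are assumptions for illustration; the commands are only constructed here, not executed, since running them needs root inside the container.

```python
import os
from typing import List


def build_remap_commands(
    data_dir: str, user: str = "postgres", group: str = "postgres"
) -> List[List[str]]:
    """Build commands that align `user`/`group` with the owner of `data_dir`.

    The UID and GID are read separately from the directory, so the result
    stays correct even when the volume's UID and GID differ.
    """
    st = os.stat(data_dir)
    return [
        ["usermod", "-u", str(st.st_uid), user],
        ["groupmod", "-g", str(st.st_gid), group],
    ]
```

In an entrypoint script running as root, each returned list could be passed to `subprocess.run(cmd, check=True)` before starting the application.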
/kind bug
What happened?
Using the dynamic provisioning feature introduced by #274 with applications that try to chown their pv fails.
What you expected to happen?
With the old efs-provisioner, this caused no issues.
But with dynamic provisioning in this csi driver, the chown command fails. I must admit, I don't understand how the uid/gid thing works with EFS access points. The pod user does not seem to have any association with the uid/gid on the access point, however pods can read & write mounted pv just fine.
How to reproduce it (as minimally and precisely as possible)?
For the first chart, we saw the initContainer failing. We just tried disabling the initContainer on the grafana chart with these chart overrides:
And the application worked fine.
For the nexus chart, there's a whole bunch of errors from the logs,
chgrp: changing group of '/nexus-data/elasticsearch/nexus': Operation not permitted
Perhaps these charts don't really need that step with dynamic provisioning. Another nexus chart seems to have an option to skip the chown step: https://github.com/travelaudience/docker-nexus/blob/a86261e35734ae514c0236e8f371402e2ea0feec/run#L3-L6
Anything else we need to know?:
Environment