
Deleting devworkspace failed - Cannot remove folder projects/plugins: Permission denied #20965

Closed
nils-mosbach opened this issue Dec 22, 2021 · 9 comments
Labels
area/devworkspace-operator · kind/bug (Outline of a bug - must adhere to the bug report template.) · severity/P1 (Has a major impact to usage or development of the system.)

Comments

nils-mosbach commented Dec 22, 2021

Describe the bug

The DevWorkspace operator seems to fail to delete devworkspaces due to volume permissions.


$ kubectl get devworkspaces

NAME               DEVWORKSPACE ID             PHASE     INFO
nodejs-rtdc-mzpk   workspace1ead20c9b44d4bac   Error     DevWorkspace PVC cleanup job failed: see logs for job "cleanup-workspace1ead20c9b44d4bac" for details
$ kubectl logs job.batch/cleanup-workspace1ead20c9b44d4bac

Found 4 pods, using pod/cleanup-workspace1ead20c9b44d4bac-wxsnw
rm: cannot remove '/tmp/devworkspaces/workspace1ead20c9b44d4bac/projects': Permission denied
rm: cannot remove '/tmp/devworkspaces/workspace1ead20c9b44d4bac/plugins': Permission denied

Che version

7.40@latest

Steps to reproduce

1. Start a new devfile 2 workspace with devworkspaces enabled.
2. Delete the workspace using the dashboard.

Expected behavior

Workspace gets deleted.

Runtime

Kubernetes (vanilla)

Screenshots

No response

Installation method

chectl/latest

Environment

Linux

Eclipse Che Logs

No response

Additional context

I had to remove the finalizer from the DevWorkspace CR in order to get everything deleted.
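
One way to clear the finalizer, for anyone hitting the same thing -- a rough sketch using kubectl patch on the workspace named above; note that skipping the finalizer also skips the operator's PVC cleanup, so the workspace folder may be left behind in the PVC:

$ kubectl patch devworkspace nodejs-rtdc-mzpk --type merge -p '{"metadata":{"finalizers":[]}}'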

nils-mosbach added the kind/bug label on Dec 22, 2021
che-bot added the status/need-triage label on Dec 22, 2021
mmorhun added the area/devworkspace-operator and severity/P1 labels and removed status/need-triage on Dec 22, 2021

amisevsk commented Dec 22, 2021

Hi @nils-mosbach, thanks for your report! From your comment in #20963, it sounds like this is for a bare-metal cluster -- is that correct?

I know bare-metal clusters have caused issues in the past for Che. If you still have the reproducer running, could you inspect the filesystem in the workspace PVC? One easy way to do this (in the namespace where the failed workspace was running) is:

  1. Stop all running workspaces (if any)

  2. Create a pod that mounts the PVC used as storage for workspaces:

    cat <<EOF | kubectl apply -f -
    apiVersion: v1        
    kind: Pod
    metadata:
      name: "cleanup"
      labels:
        app: cleanup
    spec:
      containers:
        - image: registry.access.redhat.com/ubi8-micro:8.4-81
          name: "cleanup"
          volumeMounts:
            - mountPath: /tmp/claim-devworkspace
              name: claim-devworkspace
          command: ["tail"]
          args: ["-f", "/dev/null"]
          resources:
            limits:
              memory: 100Mi
      volumes:
        - name: claim-devworkspace
          persistentVolumeClaim:
            claimName: claim-devworkspace
    EOF

    (feel free to switch the image, as long as we can get a bash shell. The image above is the one used in the cleanup job that is failing)

  3. Once the pod is running, exec into it and poke around: kubectl exec -it cleanup -- /bin/bash. The PVC storing workspace files is mounted at /tmp/claim-devworkspace

  4. Look into the permissions on the folders in question: ls -al /tmp/claim-devworkspace/workspace1ead20c9b44d4bac (you might need to switch the workspace<letters-numbers> part depending on the namespace -- each workspace gets a unique folder).

Edit to add: could you also list the user info when running the cleanup pod? From within the bash shell in step 3, run id and share the output


nils-mosbach commented Dec 22, 2021

Sure. Yes, that's correct; we are running a 3-node cluster provisioned using Rancher. Storage also gets tricky if you run everything on your own. At the beginning we experimented quite a lot with several storage solutions. What worked best from a performance and ease-of-use standpoint was LINBIT LINSTOR, which is a distributed block storage implementation on top of DRBD. What we have now are 27 SSDs (9 per node) and a dedicated network for asynchronous storage replication.

$ id

uid=0(root) gid=0(root) groups=0(root)
$ ls -al /tmp/claim-devworkspace/workspace1ead20c9b44d4bac/

total 16
drwxr-xr-x 4 root root 4096 Dec 22 13:50 .
drwxr-xr-x 8 root root 4096 Dec 22 13:51 ..
drwxr-xr-x 2 root root 4096 Dec 22 13:50 plugins
drwxr-xr-x 2 root root 4096 Dec 22 13:50 projects

I've also tried deleting the two folders from within the pod (rm -rf /tmp/claim-devworkspace/workspace1ead20c9b44d4bac/plugins), which worked with no permission-denied error. I know there can be issues, especially when using host-mounted volumes. Luckily we haven't had issues before using LINSTOR as a distributed block storage solution.

@amisevsk

Interesting -- those directories are automatically created by Kubernetes when they're mounted. For example, the projects directory comes from

volumeMounts:
- mountPath: /projects
  name: claim-devworkspace
  subPath: workspace30ea739d376b4f71/projects

To process this volumeMount, Kubernetes creates the path workspace30ea739d376b4f71/projects/ in the PVC and mounts that into the pod. In most cases, at least the leaf directory gets created with 777 permissions to ensure whatever container is running can use the volume.

From the cleanup pod, are there any files/directories inside /tmp/claim-devworkspace/workspace1ead20c9b44d4bac/projects? On Kubernetes (with DevWorkspaces enabled) we run everything as uid 1234 and gid 0, so projects may not have been cloned in the first place (you can also check the logs of the project-clone container in the workspace1ead20c9b44d4bac-xxxxx-xxxxx pod before the workspace is deleted).
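
For example, a rough sketch of checking those logs (assuming the workspace pod still exists; the exact pod name suffix will differ):

$ kubectl get pods | grep workspace1ead20c9b44d4bac
$ kubectl logs <workspace-pod-name> -c project-clone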

The reason this issue isn't present when DevWorkspaces are disabled is that Che has a workaround for a different issue (see this ancient discussion). With DevWorkspaces enabled, we work around it in a different way.

I'm not familiar with Linstor, but is it possible to configure how these paths are created?

The documentation also mentions that it supports fsGroup, which may work around the problem (but I'd need to consider impacts in other configurations). I've created issue devfile/devworkspace-operator#718, which may help the situation, but I can't guarantee the suggestion there would resolve all problems here.


nils-mosbach commented Dec 23, 2021

You're right. The projects and plugins directories are empty. I guess setting fsGroup to 1234 will solve the issue in our case.


Devfile 1 workspaces that are provisioned by the Che server contain a security context that sets fsGroup in our case.

apiVersion: apps/v1
kind: Deployment
metadata:
  ...
  labels:
    che.original_name: workspace
    che.workspace_id: workspaceqhfe30ptcyxjzhm2
  name: workspaceqhfe30ptcyxjzhm2.workspace
  namespace: dev-studio-workspace-mosbachn
  resourceVersion: "293281914"
  uid: 67ac129b-feb1-41d0-804a-2ea332bf967e
spec:
  template:
    spec:
      securityContext:
        fsGroup: 1724
        runAsUser: 1724
...

I tried to validate this by manually setting the value using kubectl. I stopped the DevWorkspace controller and manually set fsGroup to 1234 on the workspace pod. Theia loads as expected and project sources can be cloned.

...
      securityContext:
        fsGroup: 1234
        runAsGroup: 0
        runAsNonRoot: true
        runAsUser: 1234

I can't check whether that solves deleting workspaces, since the DevWorkspace operator will "correct" my change during reconciliation, but Theia loads as expected. (So this might also solve #20963, as you expected.)
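
For anyone wanting to try the same experiment: one way to stop the controller temporarily is to scale its deployment to zero -- a rough sketch, assuming the default install namespace and deployment name, which may differ per setup:

$ kubectl scale deployment devworkspace-controller-manager -n devworkspace-controller --replicas=0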


amisevsk commented Dec 24, 2021

@nils-mosbach That's great to hear! I can look into doing the same as we do for Devfile 1 workspaces in the new year -- I'll be on holiday for a bit.

The change to make would happen here, in case you need a fix urgently.

@amisevsk

PR devfile/devworkspace-operator#748 is open now, which sets fsGroup to match runAsUser by default. This should resolve this issue once it works its way into a release.


nils-mosbach commented Jan 25, 2022

The next branch of the DevWorkspace operator solves the issue in my case. Thanks!

@martinelli-francesco

I have the same issue with Eclipse Che installed on ews.
When will this fix be applied to the version of Eclipse Che installed with chectl (stable or next channel)?


amisevsk commented Jan 28, 2022

@martinelli-francesco I'm not sure which catalog source chectl is using, but the fix should be released to all platforms in a week or two (depending on the release pipeline). If it's needed sooner, it's already available in the upstream DevWorkspace Operator catalog (quay.io/devfile/devworkspace-operator-index:release), but using this catalog enters less-supported territory.
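
For example, a rough sketch of adding that catalog via OLM (this assumes an OLM-based install; the catalog source name here is arbitrary and the namespace depends on the platform):

cat <<EOF | kubectl apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: devworkspace-operator-catalog   # arbitrary name
  namespace: openshift-marketplace      # or "olm" on vanilla Kubernetes with OLM
spec:
  sourceType: grpc
  image: quay.io/devfile/devworkspace-operator-index:release
EOF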

On Kubernetes I'm not sure of the exact timing, but you're looking for DevWorkspace Operator v0.12.0 or higher.
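
If you want to confirm which DevWorkspace Operator version is actually running, one rough way is to check the controller image tag -- the deployment name and namespace below are the usual defaults but depend on how the operator was installed:

$ kubectl get deployments -A | grep devworkspace
$ kubectl get deployment devworkspace-controller-manager -n devworkspace-controller -o jsonpath='{.spec.template.spec.containers[*].image}'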
