[Bug]: k8sgpt fails to start with "mkdir /.config: permission denied" #440

Closed
3 of 4 tasks
jkleinlercher opened this issue May 18, 2023 · 13 comments · Fixed by #454

Comments

@jkleinlercher
Contributor

jkleinlercher commented May 18, 2023

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've included steps to reproduce the behavior

Affected Components

K8sGPT Version

v0.3.0

Kubernetes Version

v1.24.0

Host OS and its Version

No response

Steps to reproduce

Environment: OpenShift 4.11 with SCC enabled

Steps to reproduce:

  1. Install k8sgpt on a K8s cluster with the operator, as described in https://github.com/k8sgpt-ai/k8sgpt-operator
  2. Provide the secret and the K8sGPT CR as described in https://github.com/k8sgpt-ai/k8sgpt-operator#run-the-example
  3. k8sgpt fails to start. The logs show the following:
kubectl logs k8sgpt-deployment-877c6ddd9-2cx5z
Error: mkdir /.config: permission denied

The pod runs by default with "restricted-v2" SCC (https://docs.openshift.com/container-platform/4.11/authentication/managing-security-context-constraints.html). I guess this could be the reason.

securityContext of k8sgpt pod:

  securityContext:
    fsGroup: 1000980000
    seLinuxOptions:
      level: s0:c31,c25
    seccompProfile:
      type: RuntimeDefault

and securityContext of the container:

    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1000980000

Expected behaviour

K8sGPT should also run on OpenShift with SCC enabled.

Actual behaviour

K8sGPT fails to start on OpenShift 4.11 with SCC enabled.

Additional Information

No response

@arbreezy
Member

Hey @jkleinlercher, thanks for the detailed description of the issue.

The fsGroup: 1000980000 and runAsUser: 1000980000 are different from the non-root user we are setting in our Dockerfile.

OpenShift draws attention to the following use case:

In certain configurations, your workloads may comply with the restricted Kubernetes definition but will not be accepted under the SCC restricted-v2 in OCP. In Kubernetes you can specify the runAsUser and get the Pod/container running in a restricted namespace. However, for Openshift’s restricted/restricted-v2 SCC you MUST leave the runAsUser field empty, or provide a value that falls within the specific user range for the namespace. Otherwise, you will only be able to run the Pod if it has access to the SCC nonroot-v2 or anyuid. If the image used requires a user, the best option is to ensure that the userID is properly defined in the image and not via the security context.
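
As a rough illustration of the guidance quoted above, a container securityContext intended for restricted-v2 would leave runAsUser unset so that OpenShift can assign a UID from the namespace range. This is only a sketch assembled from the fields already shown in this issue; it is not taken from the k8sgpt chart or the operator.

```yaml
# Container-level securityContext sketch (illustrative, not the project's actual manifest).
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
  runAsNonRoot: true
  seccompProfile:
    type: RuntimeDefault
  # runAsUser is intentionally omitted: under restricted-v2 the platform
  # picks a UID from the namespace's allowed range (1000980000 in this issue).
```

With runAsUser left out, the image has to tolerate running as an arbitrary UID, which is exactly where the write to /.config fails.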

@jkleinlercher
Contributor Author

jkleinlercher commented May 20, 2023

Thanks for your help! I'm wondering what needs to be done so that k8sgpt can run with the restricted-v2 SCC, since it seems to be the most restrictive and secure SCC. Also, each service account is automatically assigned restricted-v2 but not the anyuid SCC, so the k8sgpt service account would need special treatment on OpenShift. I am not an expert in this field, but I'm eager to learn the best way to get this working on OpenShift. Either we change the behavior in the Dockerfile, or we add documentation stating that the k8sgpt SA needs a binding to the anyuid clusterrole/SCC, and then we probably also need to set runAsUser in the securityContext so that the anyuid SCC is assigned to this pod…

Also looking into https://cloud.redhat.com/blog/a-guide-to-openshift-and-uids

Where they say:

NOTE: Using a hardcoded UID is NOT recommended. Among issues with this approach is that it is prone to UID collisions with system UIDs or with UIDs of the same or different application running in a different Namespace expecting to use the same UID.

@jkleinlercher
Contributor Author

Also found https://sysdig.com/blog/dockerfile-best-practices/

#1.2 Don’t bind to a specific UID

@jkleinlercher
Contributor Author

I think I was able to fix the problem by creating a separate emptyDir volume and pointing the env vars XDG_CONFIG_HOME and XDG_CACHE_HOME to this emptyDir volume's mountPath. With this, the container doesn't need to run with UID 65532.
The container can then probably also run with readOnlyRootFilesystem enabled, which would be a further security enhancement. I will do some final tests and then let you know.
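
For readers following along, the workaround described in this comment could look roughly like the fragment below. This is a hedged sketch, not the content of the eventual fix: the volume name k8sgpt-vol, the mount path /k8sgpt-config and the container name are hypothetical, and readOnlyRootFilesystem is shown only because the comment suggests it should become possible.

```yaml
# Pod template fragment of a Deployment (illustrative; names and paths are assumptions).
spec:
  template:
    spec:
      containers:
        - name: k8sgpt                      # hypothetical container name
          env:
            # Redirect config and cache writes away from /.config
            # into the writable emptyDir mount.
            - name: XDG_CONFIG_HOME
              value: /k8sgpt-config/.config
            - name: XDG_CACHE_HOME
              value: /k8sgpt-config/.cache
          volumeMounts:
            - name: k8sgpt-vol
              mountPath: /k8sgpt-config
          securityContext:
            readOnlyRootFilesystem: true    # untested assumption, see comment above
      volumes:
        - name: k8sgpt-vol                  # hypothetical volume name
          emptyDir: {}
```

Because all writes land in the emptyDir volume, the container no longer depends on a specific UID owning a directory inside the image.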

@arbreezy
Member

arbreezy commented May 22, 2023

that's great @jkleinlercher.

There are three potential solutions, apart from workarounds, that we have to investigate, but I am really glad you got it working in your environment.

a) create the config path, including the file, in the Dockerfile and give permissions to any UID
b) in a similar vein, keep the existing Dockerfile and have an init container to chown -R the .config / XDG_* paths based on the metadata.uid value
c) when in server mode, k8sgpt shouldn't write or read any config file but work with in-memory data -- this will require the most effort

For the time being, I suggest including your workaround in the docs in case anyone else bumps into the same issue.

@arbreezy
Member

arbreezy commented May 22, 2023

I don't believe dropping the UID in the container is ideal but what you are describing @jkleinlercher is a valid alternative to me.

@AlexsJones any thoughts on that subject?

@jkleinlercher
Contributor Author

Dropping the UID in the Dockerfile is not needed. The Dockerfile can be used without modification. The only difference is that the container can now also be started with a different UID, because files are written to an emptyDir volume with the approach in #440 (comment). This is good practice anyway, because it lets you enable readOnlyRootFilesystem.

@jkleinlercher
Contributor Author

However, because the K8sGPT operator is responsible for creating the deployment resource, the operator needs to be extended to create an emptyDir volume and set the XDG_ path env variables.

@arbreezy
Member

Yes agreed,
I assumed you were suggesting to remove the USER from Dockerfile -- scratch that :)

@AlexsJones
Member

However, because the K8sGPT operator is responsible for creating the deployment resource, the operator needs to be extended to create an emptyDir volume and set the XDG_ path env variables.

Joining this conversation a little late but as I understand it the tasks are

  • Add an emptyDirVolume mount under $WHOAMI/.config/
  • set the XDG_CONFIG_HOME env to point to the $WHOAMI/ volume?

@arbreezy
Member

Yes, I think that's a better approach than chowning directories, and it requires zero code changes in k8sgpt. I can't think of any drawback to having this in the operator's K8sGPT Deployment spec.

@jkleinlercher
Contributor Author

jkleinlercher commented May 23, 2023

However, because the K8sGPT operator is responsible for creating the deployment resource, the operator needs to be extended to create an emptyDir volume and set the XDG_ path env variables.

Joining this conversation a little late but as I understand it the tasks are

  • Add an emptyDirVolume mount under $WHOAMI/.config/
  • set the XDG_CONFIG_HOME env to point to the $WHOAMI/ volume?

I created #454 to show you what I changed in the deployment. This PR contains only the changes to the Helm chart in this repo. If you are fine with these changes, I would be happy to also create a PR in the operator repo.

@AlexsJones
Member

However, because the K8sGPT operator is responsible for creating the deployment resource, the operator needs to be extended to create an emptyDir volume and set the XDG_ path env variables.

Joining this conversation a little late but as I understand it the tasks are

  • Add an emptyDirVolume mount under $WHOAMI/.config/
  • set the XDG_CONFIG_HOME env to point to the $WHOAMI/ volume?

I created #454 to show you what I changed in the deployment. This PR contains only the changes to the Helm chart in this repo. If you are fine with these changes, I would be happy to also create a PR in the operator repo.

Sounds good to me, thanks!

Labels
None yet
Projects
Status: Done

3 participants