
chroot missing /dev/stdout; permission failures trying to write during nginx.conf validation #8572

Closed
gcolten opened this issue May 10, 2022 · 12 comments
Labels
lifecycle/rotten, needs-kind, needs-priority, needs-triage

Comments

@gcolten

gcolten commented May 10, 2022

NGINX Ingress controller version:

/ $ ./nginx-ingress-controller --version
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.2.0
  Build:         git-6d9a39eda
  Repository:    https://github.com/kubernetes/ingress-nginx.git
  nginx version: nginx/1.19.10

-------------------------------------------------------------------------------

Kubernetes version (use kubectl version):

❯ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.6", GitCommit:"f59f5c2fda36e4036b49ec027e556a15456108f0", GitTreeState:"clean", BuildDate:"2022-01-19T17:33:06Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.7", GitCommit:"a326522ffdc578d1ac5c14cf8d0160feda9b13fc", GitTreeState:"clean", BuildDate:"2022-04-21T07:40:45Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: Azure (AKS)
  • OS (e.g. from /etc/os-release): Ubuntu 18.04.6 LTS
  • Kernel (e.g. uname -a): Linux edge-router-mgmt-859668c9b7-94h9j 5.4.0-1074-azure #77~18.04.1-Ubuntu SMP Wed Mar 30 15:36:02 UTC 2022 x86_64 Linux
  • Basic cluster related info:
    • kubectl version
    • kubectl get nodes -o wide
❯ kubectl get nodes -o wide
NAME             STATUS   ROLES   AGE     VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
aks-vmss000000   Ready    agent   6h29m   v1.21.7   10.240.0.4     <none>        Ubuntu 18.04.6 LTS   5.4.0-1074-azure   containerd://1.4.13+azure-2
aks-vmss000001   Ready    agent   6h29m   v1.21.7   10.240.0.255   <none>        Ubuntu 18.04.6 LTS   5.4.0-1074-azure   containerd://1.4.13+azure-2
  • How was the ingress-nginx-controller installed: manually via kubectl apply ...


What happened:
The configuration I'm trying to apply is not significant; the real issue is that any configuration problem is exceptionally challenging to troubleshoot.

After upgrading images from k8s.gcr.io/ingress-nginx/controller:v1.1.3 to k8s.gcr.io/ingress-nginx/controller-chroot:v1.2.0, I was unable to get ingress-nginx to accept the configuration and begin running.

Kubernetes logs showed this output:

I0510 21:47:14.972579       7 controller.go:166] "Configuration changes detected, backend reload required"
I0510 21:47:14.981763       7 template.go:1021] empty byte size, hence it will not be set
I0510 21:47:14.982109       7 template.go:1021] empty byte size, hence it will not be set
I0510 21:47:14.982362       7 template.go:1021] empty byte size, hence it will not be set
E0510 21:47:15.053461       7 controller.go:178] Unexpected failure reloading the backend:

-------------------------------------------------------------------------------
Error: exit status 1
nginx: [warn] the "http2_max_field_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /tmp/nginx/nginx-cfg707585121:143
nginx: [warn] the "http2_max_header_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /tmp/nginx/nginx-cfg707585121:144
nginx: [warn] the "http2_max_requests" directive is obsolete, use the "keepalive_requests" directive instead in /tmp/nginx/nginx-cfg707585121:145
nginx: the configuration file /tmp/nginx/nginx-cfg707585121 syntax is ok
nginx: [emerg] open() "/dev/stdout" failed (13: Permission denied)
nginx: configuration file /tmp/nginx/nginx-cfg707585121 test failed

-------------------------------------------------------------------------------

What you expected to happen:

I expected the configuration to load successfully.

Suspected root cause:

Inside the chroot(), the /dev directory (mounted inside the Docker image at /chroot/dev) does not include anything at /dev/stdout. (My OS [CentOS 8] has a symlink from /dev/stdout -> /proc/self/fd/1.)

As a result, anything inside the chroot() environment that attempts to write to stdout must first create a new file under /dev (really, under /chroot/dev). Because processes run as the www-data user (and /chroot/dev is, naturally, owned by root), those processes are unable to create a new file there and write to it.
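
For anyone who wants to confirm this on a running pod, a rough check (the namespace and pod name are illustrative, and the exact device nodes present may vary by image build):

# List the chroot's /dev from inside the controller container; on the
# controller-chroot:v1.2.0 image there is no stdout or stderr entry, and the
# directory itself is owned by root.
kubectl -n ingress-nginx exec <controller-pod> -- ls -l /chroot/dev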

I was able to resolve this by adding a line to the end of chroot.sh:

ln -s /var/log/nginx/access.log /chroot/dev/stdout

How to reproduce it:
I don't have minimal steps to reproduce this.

In general, it should suffice to find anything inside the chroot() environment that attempts to write to /dev/stdout directly (it could come from one of the many third-party packages loaded in); it will fail with a "Permission denied" error.

Anything else we need to know:
If you know where you want stdout from the chroot() to be written, this is a simple change. I'd recommend making the analogous change for /dev/stderr as well, as sketched below.
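
For example, the end of chroot.sh could look like this with both symlinks (the log-file targets here are only illustrative; the maintainers may prefer different destinations):

# illustrative targets -- point these wherever stdout/stderr should land inside the chroot
ln -s /var/log/nginx/access.log /chroot/dev/stdout
ln -s /var/log/nginx/error.log /chroot/dev/stderr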

@gcolten gcolten added the kind/bug Categorizes issue or PR as related to a bug. label May 10, 2022
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label May 10, 2022
@k8s-ci-robot
Contributor

@gcolten: This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@longwuyuan
Contributor

/remove-kind bug

I am using v1.2.0 of the controller and I am not able to reproduce this problem. Any developer working on this will need some kind of a step-by-step guide to reproduce the problem.

Several Azure users deploy using https://kubernetes.github.io/ingress-nginx/deploy/#azure. There is no other GitHub issue with commands and outputs, or Slack chat message, that resembles this use case on Azure. So can you help out by providing more information? It's not even certain that /dev/stdout in a chrooted environment is what all users of the ingress-nginx controller are asking for.

@k8s-ci-robot k8s-ci-robot added needs-kind Indicates a PR lacks a `kind/foo` label and requires one. and removed kind/bug Categorizes issue or PR as related to a bug. labels May 10, 2022
@gcolten
Author

gcolten commented May 12, 2022

Thanks for looking at this, @longwuyuan!

I now realize this may be an enhancement request, but it seems reasonable, generalizable, and simple.

Request: Add symlinks inside the chroot() environment to allow nginx to write to:

  • /dev/stdout
  • /dev/stderr

Reproducing the error

Here's a simple way to reproduce the error. While it is driven by custom configuration, I do not suspect I am alone in expecting to have access to /dev/stdout.

  1. Download the deployment yaml (I'm running on AKS with k8s 1.22): wget https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/cloud/1.22/deploy.yaml
  2. Apply this patch to the deploy yaml:
cat << 'EOF' | patch
--- a/deploy.yaml
+++ b/deploy.yaml
@@ -402,6 +402,7 @@ spec:
         - --validating-webhook=:8443
         - --validating-webhook-certificate=/usr/local/certificates/cert
         - --validating-webhook-key=/usr/local/certificates/key
+        - --configmap=$(POD_NAMESPACE)/http-snippet-configmap
         env:
         - name: POD_NAME
           valueFrom:
@@ -413,7 +414,7 @@ spec:
               fieldPath: metadata.namespace
         - name: LD_PRELOAD
           value: /usr/local/lib/libmimalloc.so
-        image: k8s.gcr.io/ingress-nginx/controller:v1.2.0@sha256:d8196e3bc1e72547c5dec66d6556c0ff92a23f6d0919b206be170bc90d5f9185
+        image: k8s.gcr.io/ingress-nginx/controller-chroot:v1.2.0
         imagePullPolicy: IfNotPresent
         lifecycle:
           preStop:
@@ -460,6 +461,7 @@ spec:
           capabilities:
             add:
             - NET_BIND_SERVICE
+            - SYS_CHROOT
             drop:
             - ALL
           runAsUser: 101
@@ -618,3 +620,18 @@ webhooks:
     resources:
     - ingresses
   sideEffects: None
+---
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: http-snippet-configmap
+  namespace: ingress-nginx
+data:
+  server-tokens: "false"
+  # create a variable 'iserrorstatus' which is set to '0' if
+  # the status is 2xx or 3xx, otherwise '1'
+  http-snippet: |
+    map $status $iserrorstatus {
+      ~^[23]   0;
+      default  1;
+    }
EOF
  3. Deploy the ingress et al: kubectl apply -f deploy.yaml
  4. Attempt to deploy any service using an ingress containing this annotation:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello-world
  annotations:
    # disable access logging for successful requests
    nginx.ingress.kubernetes.io/configuration-snippet: |
      access_log /dev/stdout upstreaminfo  if=$iserrorstatus;

spec:
  ingressClassName: "nginx"
  rules:
  - host: example.com
    http:
      paths:
      - path: /hello
        pathType: ImplementationSpecific
        backend:
          service:
            name: hello-world
            port:
              name: http
  tls:
  - hosts:
    - example.com
    secretName: example-https-cert

You will fail to deploy that ingress because it tries to write to /dev/stdout.

We have many ingress definitions that attempt to write to /dev/stdout, which I believe is a generally reasonable thing to do. It's also very simple to resolve this error by adding a symlink from /dev/stdout (inside the chroot()) to a reasonable location.

Update (2022-05-12)

I now see that stdout actually propagates over the syslog channel syslog:server=127.0.0.1:11514.

It may be worth noting this as a possible breaking change with the upgrade. Any custom configuration that attempts to write to stdout must be updated to write over syslog to the configured address (specified via --internal-logger-address, denoted here).
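
For instance, the reproduction snippet above could target the syslog endpoint instead of /dev/stdout (access_log syslog:server=127.0.0.1:11514 upstreaminfo if=$iserrorstatus;). A rough way to confirm where the chroot image sends its logs (the pod name is illustrative, and the rendered config path may differ between image variants):

# Look for syslog log targets in the rendered nginx.conf inside the controller pod;
# both candidate paths are checked since the chroot image keeps its tree under /chroot.
kubectl -n ingress-nginx exec <controller-pod> -- \
  sh -c 'grep -n "syslog:server" /etc/nginx/nginx.conf /chroot/etc/nginx/nginx.conf 2>/dev/null'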

@longwuyuan
Contributor

@gcolten thank you for the updates.

A lot of the specifics like /dev/stdout etc. were informative from your earlier post as well. But this whole discussion is related to a chrooted process. So may I ask: what is broken, with regard to what an ingress controller should be doing, as per the specs of an ingress controller described at Kubernetes.io? What do you want to achieve with an Ingress object but are unable to achieve with the ingress-nginx controller?

@gcolten
Author

gcolten commented May 12, 2022

Fair question, @longwuyuan.

At the end of the day, I think the behavior of the controller-chroot:v1.2.0 image is unintuitive and prone to causing breaking changes for various consumers (such as myself).

Users generally expect to have access to /dev/stdout for logging purposes. For example, the controller:v1.2.0 image even creates a symlink from /var/log/nginx/access.log to /dev/stdout, which illustrates how users generally assume such a file is available.
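
A quick way to see that from the non-chroot image itself, assuming you can pull and run it locally (the docker invocation is only illustrative):

# List the nginx log directory in the non-chroot controller image; access.log is a
# symlink to /dev/stdout there (and error.log typically points at /dev/stderr).
docker run --rm --entrypoint ls k8s.gcr.io/ingress-nginx/controller:v1.2.0 -l /var/log/nginx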

I don't know that there is a technical solution worth pursuing here.

That said, it may be valuable to call more attention to the scope of changes introduced with this -chroot image:

  • Mounts that must be accessible to nginx must be mounted under /chroot
  • Generally, the filesystem layout seen by the nginx executable has changed, including the removal of some files users may expect. These "removals" include /dev/std{out,err}, /proc/*, etc.

@longwuyuan
Contributor

Ok. Maybe a much deeper dive by a developer is needed. But just to confirm: if you run kubectl -n <ingresscontrollernamespace> logs <ingresscontrollerpodname>, are you able to get logs? I installed the controller with chroot enabled and I could see logs, so I don't understand what problem needs to be fixed.

If you are not pursuing a solution, then maybe the content here is a good reference for anyone who searches for similar info in the future. So consider closing the issue. Otherwise, you can wait and see if other comments come in.

Thanks.

@rikatz
Contributor

rikatz commented May 14, 2022

The chroot image sends the log to an internal syslog, as the chroot process cannot write to std(err/out) (this is an Nginx limitation right now).

We can indeed link stdout and stderr to /var/log/nginx/access.log, but this is going to be a no-op unless you point logs there, and if you do so you are probably not getting the logs via kubectl logs.

Makes sense? :)

@byrneo

byrneo commented May 26, 2022

FWIW, for reference, I've also hit a similar issue. I specify this on some ingresses (as I'm sure others might too):

nginx.ingress.kubernetes.io/modsecurity-snippet: |
  SecRuleEngine On
  SecDebugLog /dev/stdout

When I try to update to the chroot image, it fails due to:

nginx: [emerg] "modsecurity_rules" directive Rules error. File: <<reference missing or not informed>>. Line: 2. Column: 19. Failed to start DebugLog: Failed to open file: /dev/stdout  in /tmp/nginx/nginx-cfg1775600301:1718

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 24, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 23, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned Oct 23, 2022
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
