Nginx master process killed thus no further reloads #1439

Closed
cu12 opened this issue Sep 28, 2017 · 2 comments · Fixed by #1440

cu12 commented Sep 28, 2017

Today we hit an issue where the kernel killed the NGINX master process due to OOM while running quay.io/aledbf/nginx-ingress-controller:0.232:

NGINX master process died (-1): signal: killed

This should make /healthz fail, but because the controller does not monitor the master process, the health check keeps passing: connections are still served, but no further reloads are possible.

Validating this:

root@nginx-ingress-controller-3307942697-8z7hb:/# ps fax
  PID TTY      STAT   TIME COMMAND
30733 ?        Ss     0:00 bash
30930 ?        R+     0:00  \_ ps fax
    1 ?        Ss     0:00 /usr/bin/dumb-init /nginx-ingress-controller --default-backend-service=default/default-http-backend --default-ssl-certificate=default/easywp-ssl-certificates --configmap=default/nginx-ingress-controller --sort-b
    7 ?        Ssl    1:49 /nginx-ingress-controller --default-backend-service=default/default-http-backend --default-ssl-certificate=default/easywp-ssl-certificates --configmap=default/nginx-ingress-controller --sort-backends=true --upda
   21 ?        S      0:03  \_ nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
23275 ?        Sl     0:00      \_ nginx: worker process is shutting down
23276 ?        Sl     0:00      \_ nginx: worker process is shutting down
30529 ?        Sl     0:00      \_ nginx: worker process
30530 ?        Sl     0:00      \_ nginx: worker process
30531 ?        Sl     0:00      \_ nginx: worker process
30532 ?        Sl     0:00      \_ nginx: worker process
30533 ?        Sl     0:00      \_ nginx: worker process
30534 ?        Sl     0:00      \_ nginx: worker process

root@nginx-ingress-controller-3307942697-8z7hb:/# kill -9 21

root@nginx-ingress-controller-3307942697-8z7hb:/# ps fax
  PID TTY      STAT   TIME COMMAND
30733 ?        Ss     0:00 bash
30931 ?        R+     0:00  \_ ps fax
    1 ?        Ss     0:00 /usr/bin/dumb-init /nginx-ingress-controller --default-backend-service=default/default-http-backend --default-ssl-certificate=default/easywp-ssl-certificates --configmap=default/nginx-ingress-controller --sort-b
    7 ?        Ssl    1:49 /nginx-ingress-controller --default-backend-service=default/default-http-backend --default-ssl-certificate=default/easywp-ssl-certificates --configmap=default/nginx-ingress-controller --sort-backends=true --upda
23275 ?        Sl     0:00 nginx: worker process is shutting down
23276 ?        Sl     0:00 nginx: worker process is shutting down
30529 ?        Sl     0:00 nginx: worker process
30530 ?        Sl     0:00 nginx: worker process
30531 ?        Sl     0:00 nginx: worker process
30532 ?        Sl     0:00 nginx: worker process
30533 ?        Sl     0:00 nginx: worker process
30534 ?        Sl     0:00 nginx: worker process

root@nginx-ingress-controller-3307942697-8z7hb:/# nginx -s reload
2017/09/28 13:33:33 [notice] 30938#30938: signal process started
2017/09/28 13:33:33 [alert] 30938#30938: kill(21, 1) failed (3: No such process)
nginx: [alert] kill(21, 1) failed (3: No such process)

root@nginx-ingress-controller-3307942697-8z7hb:/# curl -I http://localhost:10254/healthz
HTTP/1.1 200 OK
Date: Thu, 28 Sep 2017 13:39:45 GMT
Content-Length: 2
Content-Type: text/plain; charset=utf-8

The options I see at the moment:

  • /healthz should take the state of the master process into consideration
  • nginx-ingress-controller should monitor the state of the master process and restart if needed; this might be well out of scope, of course
  • nginx itself could be run under supervisord, so the master process is restarted upon failure
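
A minimal sketch of the first option, assuming the master's PID can be read from a PID file. The path `/run/nginx.pid` and the names `pidAlive`, `checkMaster`, and `healthz` are hypothetical and not taken from the controller's code. Signal 0 only asks the kernel whether the process exists; nothing is delivered.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"os"
	"strconv"
	"strings"
	"syscall"
)

// pidAlive reports whether a process with the given PID currently exists.
// Signal 0 runs the existence and permission checks without delivering a
// signal. EPERM means the process exists but belongs to another user.
// Caveat: a zombie that has not been reaped still counts as existing.
func pidAlive(pid int) bool {
	if pid <= 0 {
		return false
	}
	err := syscall.Kill(pid, 0)
	return err == nil || errors.Is(err, syscall.EPERM)
}

// checkMaster reads a PID file and returns an error if the file cannot be
// read or parsed, or if the PID it names no longer exists.
func checkMaster(pidFile string) error {
	data, err := os.ReadFile(pidFile)
	if err != nil {
		return fmt.Errorf("reading pid file: %w", err)
	}
	pid, err := strconv.Atoi(strings.TrimSpace(string(data)))
	if err != nil {
		return fmt.Errorf("parsing pid file: %w", err)
	}
	if !pidAlive(pid) {
		return fmt.Errorf("nginx master process %d is not running", pid)
	}
	return nil
}

// healthz returns a handler that answers 500 when the master is gone
// and 200 with the body "ok" otherwise.
func healthz(pidFile string) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		if err := checkMaster(pidFile); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		fmt.Fprint(w, "ok")
	}
}

func main() {
	const pidFile = "/run/nginx.pid" // hypothetical path, adjust to the image

	// A real server would also call http.ListenAndServe here.
	http.Handle("/healthz", healthz(pidFile))

	// One-off check so the sketch can be run without starting a server.
	if err := checkMaster(pidFile); err != nil {
		fmt.Println("unhealthy:", err)
		return
	}
	fmt.Println("ok")
}
```

With a check like this, the `kill -9 21` experiment above would turn the 200 from /healthz into a 500, since PID 21 no longer exists.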

aledbf (Member) commented Sep 28, 2017

> /healthz should take the state of the master process into consideration

done

> nginx-ingress-controller should monitor the state of the master process and restart if needed; this might be well out of scope, of course

done

> nginx itself could be run under supervisord, so the master process is restarted upon failure

This is not a solution: the worker processes are not killed, and while they are alive a new master process cannot start, because the ports it needs to bind are already in use.
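
The port conflict can be reproduced in isolation. The sketch below is only an illustration of the "address already in use" failure, not nginx code; `secondBindFails` is a hypothetical helper. The first listener plays the role of the orphaned workers, the second plays the new master.

```go
package main

import (
	"fmt"
	"net"
)

// secondBindFails opens a TCP listener on an ephemeral loopback port and,
// while it is still open, tries to listen on the same address again.
// It returns true when the second attempt is rejected. The error result is
// non-nil only if the first listener could not be created at all.
func secondBindFails() (bool, error) {
	first, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, fmt.Errorf("first listen failed: %w", err)
	}
	defer first.Close()

	second, err := net.Listen("tcp", first.Addr().String())
	if err != nil {
		// Expected path: the address is still held by the first listener.
		return true, nil
	}
	second.Close()
	return false, nil
}

func main() {
	failed, err := secondBindFails()
	if err != nil {
		fmt.Println("setup error:", err)
		return
	}
	fmt.Println("second bind rejected:", failed)
}
```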

Please check #1440
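
For the monitoring point, a rough sketch of how a parent process can notice that its child was killed. This is an assumption about the general technique using Go's os/exec, not a description of what #1440 does; `waitChild` is a hypothetical name, and a self-killing shell stands in for the nginx master.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// waitChild starts a command, blocks until it exits, and returns its exit
// code plus a description of how it ended. For a child terminated by a
// signal, ExitCode reports -1 and the description names the signal.
func waitChild(name string, args ...string) (int, string) {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return -1, "start failed: " + err.Error()
	}
	err := cmd.Wait()
	if err == nil {
		return 0, "exited cleanly"
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return exitErr.ExitCode(), exitErr.Error()
	}
	return -1, err.Error()
}

func main() {
	// Stand-in for the nginx master: a shell that sends SIGKILL to itself.
	code, desc := waitChild("sh", "-c", "kill -9 $$")
	fmt.Printf("child died (%d): %s\n", code, desc)
	// A supervisor could now mark itself unhealthy or exit, so that the
	// whole pod is restarted and the orphaned workers go away with it.
}
```

The `(-1): signal: killed` part of the output has the same shape as the log line quoted at the top of this issue.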

cu12 (Author) commented Oct 3, 2017

@aledbf After a couple of days of testing in the wild, I can confirm that this is working as expected.
