Nginx master process killed thus no further reloads #1439

cu12 · 2017-09-28T13:46:32Z

Today we hit an issue where the kernel killed the NGINX master process due to OOM while running quay.io/aledbf/nginx-ingress-controller:0.232:

NGINX master process died (-1): signal: killed

This should cause /healthz to fail, but because the controller does not monitor the master process, connections are still served while no further reloads are possible.

Validating this:

root@nginx-ingress-controller-3307942697-8z7hb:/# ps fax
  PID TTY      STAT   TIME COMMAND
30733 ?        Ss     0:00 bash
30930 ?        R+     0:00  \_ ps fax
    1 ?        Ss     0:00 /usr/bin/dumb-init /nginx-ingress-controller --default-backend-service=default/default-http-backend --default-ssl-certificate=default/easywp-ssl-certificates --configmap=default/nginx-ingress-controller --sort-b
    7 ?        Ssl    1:49 /nginx-ingress-controller --default-backend-service=default/default-http-backend --default-ssl-certificate=default/easywp-ssl-certificates --configmap=default/nginx-ingress-controller --sort-backends=true --upda
   21 ?        S      0:03  \_ nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
23275 ?        Sl     0:00      \_ nginx: worker process is shutting down
23276 ?        Sl     0:00      \_ nginx: worker process is shutting down
30529 ?        Sl     0:00      \_ nginx: worker process
30530 ?        Sl     0:00      \_ nginx: worker process
30531 ?        Sl     0:00      \_ nginx: worker process
30532 ?        Sl     0:00      \_ nginx: worker process
30533 ?        Sl     0:00      \_ nginx: worker process
30534 ?        Sl     0:00      \_ nginx: worker process

root@nginx-ingress-controller-3307942697-8z7hb:/# kill -9 21

root@nginx-ingress-controller-3307942697-8z7hb:/# ps fax
  PID TTY      STAT   TIME COMMAND
30733 ?        Ss     0:00 bash
30931 ?        R+     0:00  \_ ps fax
    1 ?        Ss     0:00 /usr/bin/dumb-init /nginx-ingress-controller --default-backend-service=default/default-http-backend --default-ssl-certificate=default/easywp-ssl-certificates --configmap=default/nginx-ingress-controller --sort-b
    7 ?        Ssl    1:49 /nginx-ingress-controller --default-backend-service=default/default-http-backend --default-ssl-certificate=default/easywp-ssl-certificates --configmap=default/nginx-ingress-controller --sort-backends=true --upda
23275 ?        Sl     0:00 nginx: worker process is shutting down
23276 ?        Sl     0:00 nginx: worker process is shutting down
30529 ?        Sl     0:00 nginx: worker process
30530 ?        Sl     0:00 nginx: worker process
30531 ?        Sl     0:00 nginx: worker process
30532 ?        Sl     0:00 nginx: worker process
30533 ?        Sl     0:00 nginx: worker process
30534 ?        Sl     0:00 nginx: worker process

root@nginx-ingress-controller-3307942697-8z7hb:/# nginx -s reload
2017/09/28 13:33:33 [notice] 30938#30938: signal process started
2017/09/28 13:33:33 [alert] 30938#30938: kill(21, 1) failed (3: No such process)
nginx: [alert] kill(21, 1) failed (3: No such process)

root@nginx-ingress-controller-3307942697-8z7hb:/# curl -I http://localhost:10254/healthz
HTTP/1.1 200 OK
Date: Thu, 28 Sep 2017 13:39:45 GMT
Content-Length: 2
Content-Type: text/plain; charset=utf-8

The options I see at the moment:

/healthz should take the state of the master process into consideration
nginx-ingress-controller should monitor the state of the master process and restart if needed, this might be way out of the scope of course...
nginx itself could be run with supervisord, so master process is restarted upon failure
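
A minimal sketch of what the first option could look like from a shell probe, assuming nginx writes its master PID to a pid file. The function name `nginx_master_alive` is hypothetical and this is not the controller's actual /healthz code; it only illustrates checking that the recorded master PID still refers to a live process.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the PID recorded in the given
# pid file refers to a process that currently exists.
nginx_master_alive() {
    pid_file="$1"
    # A missing or unreadable pid file counts as "not alive".
    [ -r "$pid_file" ] || return 1
    pid="$(cat "$pid_file")"
    # Reject empty or non-numeric content.
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Signal 0 only checks that the process exists and can be signalled;
    # nothing is actually delivered to it.
    kill -0 "$pid" 2>/dev/null
}
```

A probe could then call something like `nginx_master_alive /run/nginx.pid || exit 1`, where the pid file path is an assumption and depends on the nginx configuration. Note that a PID check alone cannot tell a reused PID from the original master.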


aledbf · 2017-09-28T14:55:28Z

/healthz should take the state of the master process into consideration

done

nginx-ingress-controller should monitor the state of the master process and restart if needed, this might be way out of the scope of course...

done

nginx itself could be run with supervisord, so master process is restarted upon failure

This is not a solution because the worker processes are not killed, and they prevent a new master process from starting (it cannot bind to ports that are already in use).

Please check #1440
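
To illustrate the point about leftover workers, here is a rough shell sketch that finds processes by their command-line prefix and kills them, so that a new master could bind its ports. It is not the code from #1440; the function names `pids_with_prefix` and `kill_by_prefix` are hypothetical, and the only detail taken from the thread is the `nginx: worker process` title shown in the `ps fax` output.

```shell
#!/bin/sh
# List PIDs of processes whose command line starts with the given prefix.
pids_with_prefix() {
    ps -e -o pid= -o args= | awk -v prefix="$1" '{
        pid = $1
        $1 = ""            # drop the PID column, keep the command line
        sub(/^ +/, "")
        if (index($0, prefix) == 1) print pid
    }'
}

# Send SIGKILL to every process matched by pids_with_prefix.
kill_by_prefix() {
    for pid in $(pids_with_prefix "$1"); do
        kill -KILL "$pid" 2>/dev/null || true
    done
}
```

Running `kill_by_prefix "nginx: worker process"` would also match workers reported as `nginx: worker process is shutting down`, since both share the prefix. SIGKILL drops in-flight connections, so a gentler variant might try SIGTERM first.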

cu12 · 2017-10-03T07:21:30Z

@aledbf After a couple of days of testing in the wild, I can confirm that this is working as expected.

aledbf mentioned this issue Sep 28, 2017

Kill worker processes to allow the restart of nginx #1440

Merged

aledbf closed this as completed in #1440 Sep 28, 2017
