'I/O operation on closed file' on SIGHUP #863
Interestingly, I had the same issue, but at a random point in time:
gunicorn==19.1.1
I committed a fix in branch #865. Please review.
Hey @benoitc et al., any idea when we will get a release that has this fix in it?
I will make a minor release tomorrow :)
Thanks @benoitc, that would be awesome. 👍
Hey @benoitc, I don't see it on PyPI quite yet. Just a friendly reminder.
@benoitc, looking forward to the new release. This bug is very severe for high-volume apps that aggressively recycle workers. In the interim, I'm working around it by applying a monkey patch, but that doesn't feel good.
Sorry, I was too busy to prepare the release, but I will tomorrow.
@benoitc, I've been testing the patch (applied via monkey patch) and it doesn't seem to completely fix the problem. My app still managed to crash with a stack trace like:
I think the except block in sleep() needs to ignore OSError in addition to ValueError. I spent some time trying to figure out how this could happen, and my best guess is that there is still a race condition caused by the signal handler for SIGCHLD, which calls worker.tmp.close() in reap_workers(). If the signal handler is called before/during sleep(), then WorkerTmp.last_update() will throw "OSError: [Errno 9] Bad file descriptor" when it calls os.fstat() on a file descriptor that doesn't exist. But this seems like it would be the same race condition that caused the ValueError reported in this bug. In the OS, closing a file descriptor is atomic, but in Python there may be two steps involved, so you get a different exception depending on your exact timing.
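The two-exception behavior described above can be reproduced with a minimal sketch. This is not gunicorn's actual code; `WorkerTmp` and `safe_last_update` here are simplified stand-ins for illustration. The key point: `fileno()` on a closed Python file object raises ValueError ("I/O operation on closed file"), while `os.fstat()` on a descriptor already closed at the OS level raises OSError (EBADF), so both must be caught.

```python
import os
import tempfile


class WorkerTmp:
    """Simplified stand-in for gunicorn's WorkerTmp (names assumed)."""

    def __init__(self):
        self._tmp = tempfile.TemporaryFile()

    def last_update(self):
        # fileno() on a closed file object raises ValueError
        # ("I/O operation on closed file"); os.fstat() on a stale
        # OS-level descriptor raises OSError (errno EBADF).
        return os.fstat(self._tmp.fileno()).st_ctime

    def close(self):
        self._tmp.close()


def safe_last_update(tmp):
    """Tolerate a SIGCHLD handler closing the tempfile mid-call."""
    try:
        return tmp.last_update()
    except (OSError, ValueError):
        return None  # worker already reaped; skip it
```

Which of the two exceptions fires depends on whether the file was closed at the Python level or the descriptor was invalidated at the OS level, matching the timing-dependent behavior reported above.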
@nbaggott, are you on Python 3?
@benoitc Python 2.6
I need this fix; when is the next release? Thank you.
Ping.
The release is planned for tomorrow; sorry for the delay.
Thank you @benoitc.
@benoitc Hi! I'm waiting for this release too. Are there any plans to release soon? Thank you!
Sorry, I may have been holding this up with review of an issue with the
I can confirm this on python3.4/gunicorn 19.1.1
If there's anything I can do to help @tilgovi, let me know.
If you can find the time, please roll a minor release for this issue.
@minusf Working on it. Should be done tomorrow.
I ran into this problem too. Why does it happen? (Python 2.7.5, gunicorn 19.1.1)
@olivetree123, the issue has been fixed in the latest stable release. You should upgrade to the latest one.
When the worker has exited, the tempfile is no longer available, so we can't get the last update to calculate the dynamic timeout introduced in d76bab4. This change fixes it by catching the I/O error. Fixes benoitc#863
The fix for issue benoitc#863 was incomplete, as either OSError or ValueError can be raised by os.fstat(). Thus we need to catch both types of exceptions.