Skip to content
This repository has been archived by the owner on Feb 12, 2022. It is now read-only.

the worker process died mysteriously #19

Open
hancush opened this issue Dec 1, 2018 · 1 comment
Open

the worker process died mysteriously #19

hancush opened this issue Dec 1, 2018 · 1 comment
Assignees

Comments

@hancush
Copy link
Member

hancush commented Dec 1, 2018

metro reported packets not being generated, so i shelled into the metro pdf merger server, and had a look at the logs. i found this:

ubuntu@ip-10-0-0-208:~$ tail -n 200 /tmp/metro-pdf-worker-d-XSTH3EHWO-err.log
...
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/datamade/metro-pdf-merger-d-XSTH3EHWO/tasks.py", line 146, in run
    self.doWork()
  File "/home/datamade/metro-pdf-merger-d-XSTH3EHWO/tasks.py", line 149, in doWork
    msg = redis.blpop(REDIS_QUEUE_KEY)
  File "/home/datamade/.virtualenvs/metro-pdf-merger-d-XSTH3EHWO/lib/python3.5/site-packages/redis/client.py", line 1163, in blpop
    return self.execute_command('BLPOP', *keys)
  File "/home/datamade/.virtualenvs/metro-pdf-merger-d-XSTH3EHWO/lib/python3.5/site-packages/redis/client.py", line 578, in execute_command
    connection.send_command(*args)
  File "/home/datamade/.virtualenvs/metro-pdf-merger-d-XSTH3EHWO/lib/python3.5/site-packages/redis/connection.py", line 563, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/home/datamade/.virtualenvs/metro-pdf-merger-d-XSTH3EHWO/lib/python3.5/site-packages/redis/connection.py", line 538, in send_packed_command
    self.connect()
  File "/home/datamade/.virtualenvs/metro-pdf-merger-d-XSTH3EHWO/lib/python3.5/site-packages/redis/connection.py", line 442, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 111 connecting to localhost:6379. Connection refused.

sure enough, the worker process had exited at some point the previous day.

ubuntu@ip-10-0-0-208:~$ sudo supervisorctl status
...
metro-pdf-merger-d-XSTH3EHWO:metro-pdf-merger   RUNNING   pid 1974, uptime 3 days, 6:34:30
metro-pdf-worker-d-XSTH3EHWO:metro-pdf-worker   EXITED    Nov 29 06:48 AM
...

there were no alerts in sentry or semaphor that our app had gone down. let's investigate why this happened, and consider putting alerts in place so we can handle this proactively if it happens again.

@reginafcompton
Copy link
Contributor

reginafcompton commented Dec 3, 2018

@evz and I looked in to why the worker died in the first place. We did not find a logical explanation, though we did find a pretty unlikely one: Redis exited on November 29, and restarted itself. Could this brief milisecond, when redis was down, have collided with the same brief milisecond when the Metro PDF worker executed msg = redis.blpop(REDIS_QUEUE_KEY)? Possibly, but that seems improbable.

[1947 | signal handler] (1543474090) Received SIGTERM, scheduling shutdown...
[1947] 29 Nov 06:48:10.278 # User requested shutdown...
[1947] 29 Nov 06:48:10.278 * Saving the final RDB snapshot before exiting.
[1947] 29 Nov 06:48:10.281 * DB saved on disk
[1947] 29 Nov 06:48:10.282 * Removing the pid file.
[1947] 29 Nov 06:48:10.282 # Redis is now ready to exit, bye bye...
[4352] 29 Nov 06:48:11.874 # Unable to set the max number of files limit to 10032 (Operation not permitted), setting the max clients configuration to 3984
.
                _._
           _.-``__ ''-._
      _.-``    `.  `_.  ''-._           Redis 2.8.4 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._
 (    '      ,       .-`  | `,    )     Running in stand alone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 4352
  `-._    `-._  `-./  _.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |           http://redis.io
  `-._    `-._`-.__.-'_.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |
  `-._    `-._`-.__.-'_.-'    _.-'
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'

[4352] 29 Nov 06:48:11.875 # Server started, Redis version 2.8.4

Moving forward, let's be better about logging (as the issue suggests!), and if this occurs again, we might have a better sense of the reason why.

It seems like a reasonable place to add additional logging is in the run function of the ChildProcessor (i.e., the processor that "does the work"processing messages from Redis).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants