
Remove dead workers immediately after killing. #1084

Closed
rwe wants to merge 8 commits from the cleanup-async-reaping branch

Conversation

rwe
Contributor

@rwe rwe commented Jul 13, 2015

This also organizes the shared "worker cleanup" operations like closing worker.tmp and calling self.cfg.worker_exit.

This means that self.WORKERS will always be updated after spawning/killing, and may only be out of sync due to unknown dead processes (handled in reap_workers).

This was done based on conversation in #1078. The original suggestion was to maintain a dead-worker list; I initially felt this approach was a little simpler and accomplished the same thing, but I eventually came to the same conclusion.

if not worker:
    continue
worker.tmp.close()
self.reap_dead_worker(wpid)
Contributor Author

Note that this did not call self.cfg.worker_exit(self, worker) before, but does now. I believe that was an oversight and this is a correct change, but it could use a double-check in case it was intentional.

@rwe rwe force-pushed the cleanup-async-reaping branch from dc3f522 to de31a19 on July 13, 2015 at 21:51
extra={"metric": "gunicorn.workers",
"value": len(workers),
"value": len(self.WORKERS),
Contributor Author

Before, workers and self.WORKERS could be out of sync here if workers were killed, because they would be reaped asynchronously. Because both their insertion (via spawn_worker) and removal (via kill_worker) are now synchronous, self.WORKERS is the correct value here.

@tilgovi
Collaborator

tilgovi commented Jul 13, 2015

I think reap_dead_worker can just be inlined in cleanup_worker and everything can be done when reap_workers happens on SIGCHLD. I think it's confusing to have several places where cleanup is called.

@tilgovi
Collaborator

tilgovi commented Jul 13, 2015

Oh, I see. But if we do that, then we can't pop synchronously, because by the time we get the SIGCHLD we don't have the worker object.

@tilgovi
Collaborator

tilgovi commented Jul 13, 2015

I'll wait for others to comment and take a look again tomorrow.

@tilgovi
Collaborator

tilgovi commented Jul 13, 2015

Thanks for doing this.

@rwe
Contributor Author

rwe commented Jul 13, 2015

No problem @tilgovi. If I read your comments correctly, you arrived at the same conclusion I did here, in terms of cleaning up immediately versus needing to keep the worker object around in a separate non-WORKERS list. Is that right?

@rwe
Contributor Author

rwe commented Jul 14, 2015

As I look more into this, I'm starting to think that the original suggestion of keeping a 'killed worker' list is correct. Otherwise killing and waiting for ECHILD must be done synchronously to avoid prematurely signaling the child's exit before it's handled the kill signal.
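
For illustration, a minimal sketch of what the fully synchronous alternative would look like, with a hypothetical helper name (this is not what the PR ends up doing):

import errno
import os
import signal

def kill_and_reap(pid, sig=signal.SIGTERM):
    # Send the signal, then block until the child has actually exited, so
    # that its death is never announced before it has handled the signal.
    os.kill(pid, sig)
    try:
        os.waitpid(pid, 0)
    except OSError as e:
        # ECHILD: the child was already reaped elsewhere; nothing to wait for.
        if e.errno != errno.ECHILD:
            raise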

rwe added 3 commits July 13, 2015 20:08
This also organizes the shared "worker cleanup" operations like closing
`worker.tmp` and calling `self.cfg.worker_exit`.

This means that `self.WORKERS` will always be updated after
spawning/killing, and may only be out of sync due to unknown dead
processes (handled in `reap_workers`).
@rwe rwe force-pushed the cleanup-async-reaping branch from de31a19 to 15c43f0 on July 14, 2015 at 00:12
    worker.tmp.close()
    self.cfg.worker_exit(self, worker)
except:
    pass
Contributor Author

Removed because this portion of spawn_worker executes in the worker process itself, and we let the parent handle its exit in reap_workers.

@rwe
Contributor Author

rwe commented Jul 14, 2015

OK, I believe this is correct now. It removes them from self.WORKERS immediately but still cleans them up properly when SIGCHLD comes in.

@rwe
Contributor Author

rwe commented Jul 16, 2015

This is ready for review, if anyone's available to take a look. My main interest in this PR is to unblock #1078

continue
worker.tmp.close()

worker = (self.WORKERS.pop(wpid, None)
Collaborator

I would add a comment here to explain that this function might run before the SIGCHLD handler for this pid.

Collaborator

I assume that's why this is like this.

Contributor Author

No, reap_workers is only called in response to SIGCHLD, as before.

self.DYING_WORKERS exists so that kill_worker can remove pids from self.WORKERS while we're waiting for the SIGCHLD to come in and be processed.

Contributor Author

That means that a SIGCHLD can be caused either by the child exiting unexpectedly or by us killing it intentionally. In the first case, the worker would still have been in self.WORKERS; in the second, we would have moved it to self.DYING_WORKERS.
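
For illustration, a condensed sketch of that flow as arbiter methods (simplified; the names follow this discussion, but the exact code in the diff differs):

import errno
import os

def kill_worker(self, pid, sig):
    # Intentional kill: remove the pid from WORKERS right away so worker
    # counts stay accurate, but keep a reference in DYING_WORKERS until the
    # SIGCHLD for it has been processed.
    worker = self.WORKERS.pop(pid, None)
    if worker is not None:
        self.DYING_WORKERS.setdefault(pid, worker)
    os.kill(pid, sig)

def reap_workers(self):
    # Runs in response to SIGCHLD. The pid is either still in WORKERS (the
    # child died unexpectedly) or in DYING_WORKERS (we killed it on purpose).
    try:
        while True:
            wpid, status = os.waitpid(-1, os.WNOHANG)
            if not wpid:
                break
            worker = (self.WORKERS.pop(wpid, None) or
                      self.DYING_WORKERS.pop(wpid, None))
            if worker is None:
                continue
            worker.tmp.close()
            self.cfg.worker_exit(self, worker)
    except OSError as e:
        if e.errno != errno.ECHILD:
            raise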

@tilgovi
Collaborator

tilgovi commented Jul 16, 2015

Two small comments but otherwise I think it looks fine.

@tilgovi
Collaborator

tilgovi commented Jul 16, 2015

I'd like @berkerpeksag or @benoitc to read this over, but I feel pretty good about it.

@rwe
Contributor Author

rwe commented Jul 16, 2015

Awesome. I made one additional commit to make sure that dying workers are included in stop and murder_workers to preserve that original behavior. Pending further comments, that's all I have.

@@ -424,7 +425,7 @@ def murder_workers(self):
"""
if not self.timeout:
return
workers = list(self.WORKERS.items())
workers = list(self.WORKERS.items() + self.DYING_WORKERS.items())
Collaborator

Unfortunately, this will raise a TypeError in Python 3. I'd write it like:

workers = list(self.WORKERS.items()) + list(self.DYING_WORKERS.items())
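
For context, a quick illustration of the difference (not part of the diff):

a = {"w1": 1}
b = {"w2": 2}

# Python 2: dict.items() returns plain lists, so "+" happens to work.
# Python 3: dict.items() returns view objects, and "a.items() + b.items()"
# raises TypeError: unsupported operand type(s) for +.
workers = list(a.items()) + list(b.items())  # works on both 2 and 3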

Contributor Author

Fixed, thank you!

@berkerpeksag
Collaborator

Looks good to me (except my comment about Python 3). Could you also squash the commits into one? Thanks!

(Should we get this in for the next release or wait for 20.0?)

try:
    worker = self.WORKERS.pop(pid)
    worker.tmp.close()
    self.cfg.worker_exit(self, worker)
Owner

This hook disappeared in the change. It should probably come back.

Collaborator

It's in reap_workers on line 469.

@benoitc
Owner

benoitc commented Jul 17, 2015

I will take the time to read that patch over the weekend. To be honest, I am quite uncomfortable maintaining two lists there, and I am not sure that all race conditions are solved right now. For example, what happens if I try to kill a worker that already died but whose signal we haven't received yet?

We should probably start to add some tests there (but that's outside the scope of this PR).

Note: I still don't understand why moving the debug log to the end of the if block, as suggested, wasn't enough to solve #1078.

@tilgovi
Collaborator

tilgovi commented Jul 17, 2015

I reviewed it, and we discussed the two places where races can occur. That is why there are two places where we pop from both lists.

Is there a reason you are uncomfortable with two lists? In my opinion, it makes reasoning about the concurrency easier.

@benoitc
Owner

benoitc commented Jul 22, 2015

@tilgovi I am worried about a race condition when you remove a socket from SOCKETS and pour it into DYING_WORKERS. I may just be too cautious there.

Anyway, I like the idea of marking a socket as dead ASAP and removing it from the count. I wonder if we can then just keep a reference to them in DYING_WORKERS and, when it's time, garbage-collect them and finally remove them from WORKERS in one place. That way the change would be really atomic and not lock the VM too much. Thoughts?

@tilgovi
Collaborator

tilgovi commented Jul 22, 2015

I hear you. Signals are tricky places for concurrency problems.

I'll double-check this and see if there's anything I'd change.

@rwe
Contributor Author

rwe commented Jul 22, 2015

@benoitc @tilgovi Are you referring to WORKERS when you say SOCKETS? I'm not finding that var anywhere.

Sounds like you might be referring to this?

worker = self.WORKERS.pop(pid, None)
# <-- If `reap_workers` were run here, we would not clean up the worker
#     until it later timed out and got collected in `murder_workers()`.
#     That would be annoying (though probably not problematic).
#
#     However, that can't happen: the signal handling just adds to SIG_QUEUE,
#     which is processed in the main run loop. `reap_workers` will never
#     ever be run behind our backs.
if worker is not None:
    self.DYING_WORKERS.setdefault(pid, worker)

@rwe
Contributor Author

rwe commented Jul 22, 2015

Nope, I'm wrong about that, and I'm not sure how I overlooked it: self.handle_chld is registered directly for SIGCHLD, unlike all the other signals, which go to self.signal, which queues them for handling in the main loop.

So that is a potential problem. Signal handlers are supposed to do very little, so I'm not sure what the rationale is for calling reap_workers() there versus adding the signal to be handled in the main run loop.

My guess is that it was done that way to keep self.WORKERS matching the workers' actual liveness as closely as possible, a behaviour which this PR obsoletes. If so, the right move would be to remove that specialized handling and queue the signal like normal.

Does that make sense?
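
For illustration, a minimal sketch of the queue-then-handle pattern being described here, with hypothetical names (this is not gunicorn's actual handler):

import signal
import time

SIG_QUEUE = []

def queue_signal(sig, frame):
    # The handler does as little as possible: record the signal and return.
    SIG_QUEUE.append(sig)

def reap_workers():
    # Placeholder for the actual reaping (waitpid + bookkeeping), which now
    # runs in the main loop rather than inside the signal handler.
    pass

signal.signal(signal.SIGCHLD, queue_signal)

def main_loop():
    while True:
        while SIG_QUEUE:
            sig = SIG_QUEUE.pop(0)
            if sig == signal.SIGCHLD:
                reap_workers()
        # ... manage workers, handle other queued signals, etc. ...
        time.sleep(1.0)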

@rwe
Contributor Author

rwe commented Jul 22, 2015

It seems that the original code calling reap_workers() directly in a signal handler was likely a mistake to begin with: Calling os.waitpid(), calling self.cfg.worker_exit(), or modifying self.WORKERS in that signal handler were all potentially unsafe. It could also raise HaltServer there, which wouldn't be caught and turned into self.halt in the main loop as intended.

@tilgovi
Collaborator

tilgovi commented Jul 22, 2015

Yes, let's queue the call and then only modify the lists outside the signal frame, and I think this will be wonderful.

@rwe rwe force-pushed the cleanup-async-reaping branch from 4e349e4 to 22662c9 on July 22, 2015 at 16:30
The new `DYING_WORKERS` list means that we don't need to run
`reap_workers` immediately on SIG_CHLD; and in fact, we shouldn't.

The original code calling `reap_workers()` directly in a signal handler
at any time was likely unsafe to begin with: `reap_workers` calls
`os.waitpid()`, calls `self.cfg.worker_exit()`, and modifies
`self.WORKERS`. All of those are potentially unsafe in a signal handler.
It can also raise `HaltServer`, which wouldn't be caught and turned into `self.halt` in the main loop as intended.
@rwe rwe force-pushed the cleanup-async-reaping branch from 22662c9 to 180fa0e on July 22, 2015 at 16:32
After going through git blame, there was no rationale recorded for the
very low limit: it was present in the first commit introducing the
signal queue. Since SIG_CHLD gets queued now, it would be entirely
possible for the previous limit of 5 queued signals to be exceeded
during `manage_workers`.
@rwe rwe force-pushed the cleanup-async-reaping branch from 110654b to 989a423 on July 22, 2015 at 16:42
@benoitc
Owner

benoitc commented Jul 23, 2015

No, please don't. The reason we are handling SIGCHLD differently is portability.

Also, it is handled outside of the queue because you really want to handle it ASAP, not somewhere in the signal queue. The signal queue is here mostly to reduce signal noise, so you don't want it too large, since we put the priority on the monitoring. I would like to keep that behaviour for now.

@erydo yes I was speaking about WORKERS.

There are really good points in this PR that we should keep anyway. In kill_worker and spawn_worker we really should not be removing the worker from WORKERS there. But I think we could do better than juggling between WORKERS and DYING_WORKERS. I would instead just keep a reference in DYING_WORKERS without touching WORKERS, keep the logic of lazy garbage collection in reap_workers, and have another routine in the main loop that would do the garbage collection.

Thoughts?

@rwe
Contributor Author

rwe commented Jul 23, 2015

@benoitc

  1. I'm having trouble understanding the portability element you're referring to in that link. I see that SIGCHLD cannot be set to SIG_IGN, but that isn't done here. Is there another issue I'm not parsing out?
  2. The short signal queue means that signals can be dropped silently, even if they're important, and by default the arbiter handles a bunch of signal types. For example, something sending signals to increment/decrement the worker count could easily drown out HUP/QUIT/INT etc. signals.
  3. I believe there's a semantic distinction between "workers that we expect to be alive" and "workers that we expect to be dying" that was ignored before and that this preserves. Previously, manage_workers, spawn_workers, etc. would include dying workers when verifying whether there were enough active ones. They would also potentially send duplicate SIGTERMs to the same worker over subsequent calls of manage_workers, which could be interleaved with SIGKILL from murder_workers (see the sketch below).
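
A simplified sketch of that last point, assuming the attributes discussed in this thread (WORKERS, DYING_WORKERS, num_workers) and `signal` imported at module level; this approximates the arbiter method, the exact diff differs:

def manage_workers(self):
    # Only workers expected to be alive are counted; anything already moved
    # to DYING_WORKERS is neither counted as active nor signaled again.
    if len(self.WORKERS) < self.num_workers:
        self.spawn_workers()

    workers = sorted(self.WORKERS.items(), key=lambda w: w[1].age)
    while len(workers) > self.num_workers:
        (pid, _) = workers.pop(0)
        self.kill_worker(pid, signal.SIGTERM)  # moves pid into DYING_WORKERS,
                                               # so the next pass won't re-send
                                               # SIGTERM to the same worker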

@benoitc
Owner

benoitc commented Jul 24, 2015

@erydo there are different concerns there, and I would like to separate them.

  1. About handling the reaping of dead workers separately: this is a design choice. Dying workers are likely to happen, and you don't want to handle them only on each pass of the main loop; you want to handle them ASAP to keep the number of workers up.
  2. Queue size: you're right, there is a possibility of discarding some signals there, and we should fix it.
  3. Marking a worker as inactive or dying should indeed be done ASAP. But I don't think that juggling between two structures is OK (the `or` thing and moving from one to the other). Instead we should find a way to mark them as dying more atomically, or simply maintain a counter, since we don't really need to know which one is about to die; the system is doing that for us anyway. Just knowing that the number of active processes has changed should already improve the loop a lot.

About 1 and 2: I reread the code and looked at your changes, and I think we could reconcile both our concerns by implementing a priority queue. All signals would be queued there and we would assign a priority to them; the queue would then be handled in the main loop. As for the waitpid, I would like to keep it the way it is. This is a common pattern, and we can discuss changes to it later. Would that work for you?
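
For illustration, a rough sketch of that kind of prioritized signal queue (purely hypothetical names and priorities, not an actual gunicorn API):

import signal

# Lower number = higher priority; SIGCHLD (worker monitoring) goes first,
# everything else defaults to normal priority.
SIG_PRIORITY = {signal.SIGCHLD: 0}

SIG_QUEUE = []

def queue_signal(sig, frame=None):
    # The signal handler stays tiny: just record the signal.
    SIG_QUEUE.append(sig)

# The arbiter would register this for every signal it cares about, e.g.:
signal.signal(signal.SIGCHLD, queue_signal)

def drain_signals():
    # Called from the main loop: return queued signals highest-priority
    # first, preserving arrival order within the same priority.
    pending = sorted(enumerate(SIG_QUEUE),
                     key=lambda item: (SIG_PRIORITY.get(item[1], 1), item[0]))
    del SIG_QUEUE[:]
    return [sig for _, sig in pending]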

About 3, I would also note that I am working on a patch implementing the IMSG framework from the OpenBSD project to improve communication with the workers and remove the temporary-file trick. I will have a working patch over the weekend, and this is also why I want to keep the changes simple for now, though I agree we are doing too much work to handle race conditions.

@benoitc
Owner

benoitc commented Feb 1, 2017

Closing the issue. I'm uncomfortable with this change, as noted above. Thanks for the patch anyway.

@benoitc benoitc closed this Feb 1, 2017