SIGTERM on multiple workers fails #671

Closed
euri10 wants to merge 19 commits from the 668_workers_shutdown branch

Conversation

euri10 (Member) commented May 9, 2020

Fix for #668

It essentially reverts #620 and applies what the Python documentation recommends, i.e. terminate or close a process before joining it.

To summarize: before #620, processes in Docker containers were not killed gracefully. #620 fixed that, but it introduced the bug described in #668, where for non-Docker setups the shutdown was too abrupt and the children's lifespan events were skipped.

I think this PR deals nicely with both issues. It is rather tedious to test things both inside and outside a container, so I hope this is OK.
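
For illustration, a minimal sketch of the terminate-before-join pattern described above, using a hypothetical WorkerProcess wrapper (illustrative names only, not uvicorn's actual classes):

import multiprocessing
import time


def run_worker():
    # placeholder worker body; a real worker would run the server loop
    while True:
        time.sleep(1)


class WorkerProcess:
    # hypothetical wrapper around a single child process
    def __init__(self, target=run_worker):
        self.process = multiprocessing.Process(target=target)

    def start(self):
        self.process.start()

    def shutdown(self):
        # terminate (SIGTERM on POSIX) before join, so that join() cannot
        # block forever on a child that never exits on its own
        self.process.terminate()
        self.process.join()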

@euri10 euri10 requested review from rafalp and florimondmanca May 9, 2020 05:07
@@ -71,6 +64,7 @@ def restart(self):
        self.process.start()

    def shutdown(self):
        self.process.terminate()

A reviewer (Member) commented:

Does this mean my worker can shut down in the middle of a transaction happening in, say, a Starlette background task?

euri10 (Member Author) replied:

Not sure on this one.
One thing also is that I misread the docs: they say to terminate before join in the case of a pool, and we are not in that case (https://docs.python.org/3/library/multiprocessing.html?highlight=join#multiprocessing.pool.Pool.join).

Kind of stuck, not sure how to solve this; it's annoying to hit the 20s timeout in a container.

A Contributor commented:

@rafalp I think it won't shut down in the middle of a transaction.
Uvicorn handles SIGTERM and sends the shutdown event to Starlette.
If it does, it's a Starlette issue (maybe on shutdown Starlette should wait for all background tasks...).
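
For reference, this is the kind of lifespan shutdown hook that #668 reported being skipped when workers are killed abruptly; a minimal, illustrative Starlette app (not code from this PR):

from starlette.applications import Starlette


async def on_shutdown():
    # runs when the server forwards the lifespan "shutdown" event,
    # i.e. only when the worker is stopped gracefully
    print("cleaning up before exit")


app = Starlette(on_shutdown=[on_shutdown])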

euri10 (Member Author) commented May 10, 2020

OK, after putting loggers on basically every line, it appears to me that the multiprocess supervisor can hang indefinitely not only in Docker but also normally.

Docker sends a SIGTERM signal, not a SIGINT.

Now, if you're NOT in a container and you kill -15 the parent PID, the supervisor will correctly capture the signal, enter def signal_handler(self, sig, frame): which sets the event, and then enter the shutdown method:

def shutdown(self):
    for process in self.processes:
        process.join()

Now here is the deal: if the signal is a SIGTERM, it will hang indefinitely on process.join().

So I think that in that case only we need to terminate() the process; maybe there is a nicer way to handle that case.

In the case of a single worker, a SIGTERM is handled nicely by

def handle_exit(self, sig, frame):
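
A minimal sketch of that idea, using hypothetical names (Supervisor, received_sigterm, install_handlers): remember whether the signal was a SIGTERM and, if so, terminate each child before joining it so shutdown cannot hang.

import signal


class Supervisor:
    # illustrative sketch only, not uvicorn's actual supervisor class
    def __init__(self, processes):
        self.processes = processes
        self.received_sigterm = False

    def install_handlers(self):
        signal.signal(signal.SIGTERM, self.signal_handler)
        signal.signal(signal.SIGINT, self.signal_handler)

    def signal_handler(self, sig, frame):
        if sig == signal.SIGTERM:
            self.received_sigterm = True

    def shutdown(self):
        for process in self.processes:
            if self.received_sigterm:
                # join() alone hangs after a SIGTERM, so forward the
                # termination to the child first
                process.terminate()
            process.join()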

tomchristie (Member) commented:

Similar comment w.r.t. the pull request name on this one as in #636 😄

@euri10 euri10 changed the title from "668 workers shutdown" to "SIGTERM on multiple workers fails" May 19, 2020
@euri10 euri10 closed this Oct 6, 2020
@euri10 euri10 deleted the 668_workers_shutdown branch October 6, 2020 07:21