-
Notifications
You must be signed in to change notification settings - Fork 467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bin/environmentd: auto-restart on halt #27852
Conversation
misc/python/materialize/cli/run.py
Outdated
# Kill the child PID, which is the group into which any descendent processes | ||
# spawned by the child process will exist in. | ||
try: | ||
os.killpg(child_pid, signal.SIGTERM) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach is what I had as well, but doesn't that make it so we leak child processes once we go through a restart. The restarted processes will have a different child_pid, and it's child processes will be in a different group. So we don't kill them here.
I didn't test your PR locally, but that's what I got with a very similar (if not the same) approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True! But those child processes should be clusterd
processes that are managed by the process orchestrator, and thus readopted by the next environmentd
incarnation. Though of course they'd be abandoned and leaked when the last environmentd
incarnation exited. I was thinking we could live with that glitch but ...
...after thinking about this for a while I felt like the process group and session management here was overly complex. I've pushed up a new revision that vastly simplifies the management. I think it is nearly 100% robust and avoids the glitch you called out here. The only ways it can fail are if 1) if run.py itself crashes (unavoidable) or 2) if environmentd
intentionally moves its subprocesses out into their own process group, which would be actively adversarial and something we don't need to worry about.
20ee337
to
ae34954
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
excellent! I verified this locally with my 0dt demo/testing setup: works as expected! 👌
Bring the behavior of bin/environmentd more in line with Kubernetes and auto-restart envd after it halts. This facilitates demos and local testing of the new 0dt deployment flow, which involves a halt-and-restart when the new generation takes over. Kubernetes will also restart envd after a panic, but we choose not to do so here, to avoid papering over panics during local development. This commit involves a substantial refactoring and simplification of the process group and session management in the run.py script. Moving the subprocess into its own process group and session appears to be wholly unnecessary; instead, we can simply create a new process group for the parent, thus ensuring that all descendent processes are created in the the process group for the run.py script, then send a SIGTERM to all processes in that group when run.py exits.
ae34954
to
035b7dd
Compare
Amazing! |
Bring the behavior of bin/environmentd more in line with Kubernetes and auto-restart envd after it halts. This facilitates demos and local testing of the new 0dt deployment flow, which involves a halt-and-restart when the new generation takes over.
Kubernetes will also restart envd after a panic, but we choose not to do so here, to avoid papering over panics during local development.
Motivation
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.