Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shutdown cluster cleanly when SIGTERM or SIGINT is received #1705

Merged
merged 1 commit into from
Nov 7, 2017

Conversation

SeanTAllen
Copy link
Contributor

Previously, if a developer working with Wallaroo sent a SIGINT via
ctrl-c in the terminal, Wallaroo would exit but wouldn't gracefully
shutdown. This could lead to errors on the next application startup as
recovery files weren't dispose of properly.

With this commit, sending SIGTERM or SIGINT will shut down Wallaroo
gracefully in the same fashion that the external "shutdown cluster"
command work.

Once we have "shrink to fit" in place, the plan would be to have SIGTERM
and SIGINT remove that individual worker from the cluster rathter than
shut the entire cluster down. Once we have shrink to fit in place, we
might change our mind and go in a different direction and leave this
functionality as is.

Closes #1701

@SeanTAllen
Copy link
Contributor Author

To test, use the Python alphabet partitioned and alphabet examples.

Start up each, verify recovery files exist in /tmp

Do ctrl-c to shutdown

verify that files no longer exist.

In the case of alphabet partitioned, you should test by doing ctrl-c on different workers.

@@ -634,3 +639,30 @@ actor Startup
"the Wallaroo Community License at https://github.com/WallarooLabs/" +
"wallaroo/blob/master/LICENSE.md for details, and also please visit " +
"the page at http://www.wallaroolabs.com/pricing****\n").cstring())

fun ref _setup_shutdown_handler(c: Connections, f: Startup,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name f is unclear in this context

@@ -33,7 +33,7 @@ actor Main
Machida.start_python()

try
SignalHandler(ShutdownHandler, Sig.int())
SignalHandler(MachidaShutdownHandler, Sig.int())

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should Sig.term() be added here also?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@slfritchie it wasn't needed previously to get Python to exit correctly, but SIGINT was. So I veer towards, no. But, let's get @aturley's thoughts as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like @SeanTAllen says, this was a special case that we added to fix a specific problem. Unless there's a problem with the behavior there no reason to handle Sig.term().

@SeanTAllen SeanTAllen force-pushed the issue-1701 branch 4 times, most recently from 419bccd to 1ddda2f Compare November 7, 2017 17:35
Previously, if a developer working with Wallaroo sent a SIGINT via
ctrl-c in the terminal, Wallaroo would exit but wouldn't gracefully
shutdown. This could lead to errors on the next application startup as
recovery files weren't dispose of properly.

With this commit, sending SIGTERM or SIGINT will shut down Wallaroo
gracefully in the same fashion that the external "shutdown cluster"
command work.

Once we have "shrink to fit" in place, the plan would be to have SIGTERM
and SIGINT remove that individual worker from the cluster rathter than
shut the entire cluster down. Once we have shrink to fit in place, we
might change our mind and go in a different direction and leave this
functionality as is.

Closes #1701
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants