Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ProcessExitGroup to ensure children are killed on exit #5198

Closed
wants to merge 5 commits into from

Conversation

fasmat
Copy link
Member

@fasmat fasmat commented Oct 25, 2023

Motivation

If the node exits unexpectedly or is killed with SIGKILL instead of SIGTERM it will not kill post services that it spawned.

This PR fixes this by adding a job object that kills processes assigned to it on close. For information on how this works see the official Microsoft documentation on job objects: https://learn.microsoft.com/en-us/windows/win32/api/jobapi2/nf-jobapi2-assignprocesstojobobject and https://learn.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-jobobject_basic_limit_information

Changes

  • On windows ProcessExitGroup adds the new process to a job object (itself) and ensures that they are closed when itself is closed
  • On other systems the ProcessExitGroup only ensures that new processes are in the same process group as the main process.

For windows this ensures that if the main process crashes / is killed all commands started through the ProcessExitGroup will be killed as well. On other systems this is only true for crashes. Killing the node with kill -9 will not cause the child processes to exit automatically. For this they have to listen to signals about their parent and close themselves.

Test Plan

  • n/a

TODO

  • Explain motivation or link existing issue(s)
  • Test changes and document test plan
  • Update documentation as needed
  • Update changelog as needed

@fasmat fasmat requested review from dshulyak and poszu as code owners October 25, 2023 13:48
@fasmat fasmat force-pushed the fix-service-not-killed branch from b2c9d07 to 1b0ed7b Compare October 25, 2023 13:53
Copy link
Contributor

@poszu poszu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add a unit test that would:

  1. spawn process A that spawns process B
  2. kill process A
  3. observe that process B died

Comment on lines 15 to 34
type ProcessExitGroup struct{}

// NewProcessExitGroup returns a new ProcessExitGroup.
func NewProcessExitGroup() (ProcessExitGroup, error) {
return ProcessExitGroup{}, nil
}

// Dispose closes the ProcessExitGroup.
func (g ProcessExitGroup) Dispose() error {
return nil
}

// StartCommand starts the given command and adds it to the ProcessExitGroup.
func (g ProcessExitGroup) StartCommand(cmd *exec.Cmd) error {
cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true, Pgid: os.Getpid()}
if err := cmd.Start(); err != nil {
return err
}
return nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about making it all private?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdym? These methods need to be called from outside the object?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it's used in the same package, isn't it?

@fasmat
Copy link
Member Author

fasmat commented Oct 25, 2023

Would it make sense to add a unit test that would:

  1. spawn process A that spawns process B
  2. kill process A
  3. observe that process B died

wouldn't work on linux (because the code actually doesn't work on non-windows systems)

@poszu
Copy link
Contributor

poszu commented Oct 25, 2023

Would it make sense to add a unit test that would:

  1. spawn process A that spawns process B
  2. kill process A
  3. observe that process B died

wouldn't work on linux (because the code actually doesn't work on non-windows systems)

I think it's fine to test windows-specific code only on windows :) Just skip/don't compile the test on a different OS.

I realize that the test would be much longer than the implementation, but on the other hand, the code for it is cryptic and prone to regressions :(

@poszu
Copy link
Contributor

poszu commented Oct 26, 2023

This shouldn't be required with spacemeshos/post-rs#141.

@fasmat
Copy link
Member Author

fasmat commented Oct 26, 2023

Superseded by #5201

@fasmat fasmat closed this Oct 26, 2023
@fasmat fasmat deleted the fix-service-not-killed branch October 26, 2023 13:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants