Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Prevent calling the main process exit without WithOnClose() #108

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

attlee-wang
Copy link

@attlee-wang attlee-wang commented Sep 10, 2024

My service is a multi-goroutine daemon service, and nri-plugin is one of the modules of my service.
In nri-plugin I did not use WithOnClose() to set the exit callback function, I would rebuild a new stub when exiting. I found that when the stup exception occurred, the daemon service main was actually exited. I felt that it was very unfriendly to exit the deamon service due to a goroutine exception.
After checking, it is found that there is a problem with the following code:

// Handle a lost connection.
func (stub *stub) connClosed() {
	stub.Lock()
	stub.close()
	stub.Unlock()
	if stub.onClose != nil {
		stub.onClose()
		return
	}

	os.Exit(0) // exits the process
}

os.Exit(0) exits the process,for the nri module package, using runtime.Goexit() is more elegant. runtime.Goexit() will only exit the current goroutine, not the daemon process.

Signed-off-by: attlee-wang <wangdong102425@gmail.com>
@attlee-wang
Copy link
Author

@klihub @mikebrow @fuweid can anyone help to review this ? thanks.

@klihub
Copy link
Member

klihub commented Sep 10, 2024

My service is a multi-goroutine daemon service, and nri-plugin is one of the modules of my service. In nri-plugin I did not use WithOnClose() to set the exit callback function, I would rebuild a new stub when exiting.

@attlee-wang Not directly related to this PR, but it might be useful for you: ever since we have merged PR #91, it became possible to restart a stub by using a Stop() followed by a Start().

I found that when the stup exception occurred, the daemon service main was actually exited. I felt that it was very unfriendly to exit the deamon service due to a goroutine exception. After checking, it is found that there is a problem with the following code:
...

This behavior has been discussed and criticized earlier a few times, and it is fair to say that not without a reason. I have a few question/comments related to this.

  1. This original, and admittedly a bit harshly/hastily implemented behavior, was partly in place to make pre-connected/runtime-launched plugins exit once they lose their pre-connected transport (a socketpair() which cannot be externally re-established). It would be nice and also fairly easily possible to retain this behavior. We could either save the fact in the original connection setup/adoption code that we took a socketpair and then use that in the connection lost handler, or we could simply re-check the same env. var and use its existence for the same.
  2. Cannot you work around this immediate problem in your current code by simply creating your stub with a WithOnClose() handler to prevent the stub internally doing a os.Exit(0) ? Your handler could either do nothing or, for instance using a channel, signal your upper layer logic to recreate (or Stop()/Start()) the stub, if you are neither running it with stub.Run() nor waiting for its termination with stub.Wait() ?

@@ -543,7 +544,7 @@ func (stub *stub) connClosed() {
return
}

os.Exit(0)
runtime.Goexit() //The exit code can be determined when the coroutine exits
Copy link
Member

@klihub klihub Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why a runtime.Goexit() would be better here than not doing anything (other than maybe recording the fact that we lost connection) and simply returning instead? If the stub has been started by stub.Start() or stub.Run(), it should be sitting in ttrpc.Server.Serve() which then should (soon) return once we return from here. That in turn should cause stub.Run() to return, if that is what the caller used to start the stub, or otherwise stub.Wait() to return if that is what the caller uses to wait for the stub to stop.

I think what we could do here is only call os.Exit() if this is a pre-connected plugin. So,

  • either record in connect() the fact that we've taken a socketpair from the env., and
  • use that to decide to os.Exit() here, or check here for the presence of the same env. var and simply use that fact to exit

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or I think something like this could work, too:

diff --git a/pkg/stub/stub.go b/pkg/stub/stub.go
index 88fd2e1..b68344f 100644
--- a/pkg/stub/stub.go
+++ b/pkg/stub/stub.go
@@ -502,6 +502,15 @@ func (stub *stub) connect() error {
                        return fmt.Errorf("invalid socket (%d) in environment: %w", fd, err)
                }
 
+               userOnClose := stub.onClose
+               stub.onClose = func() {
+                       if userOnClose != nil {
+                               userOnClose()
+                       }
+                       log.Infof(noCtx, "Connection from environment lost, exiting...")
+                       os.Exit(0)
+               }
+
                return nil
        } 

@klihub
Copy link
Member

klihub commented Sep 11, 2024

ping @attlee-wang

@attlee-wang
Copy link
Author

attlee-wang commented Sep 24, 2024

  1. Cannot you work around this immediate problem in your current code by simply creating your stub with a WithOnClose() handler to prevent the stub internally doing a os.Exit(0) ? Your handler could either do nothing or, for instance using a channel, signal your upper layer logic to recreate (or Stop()/Start()) the stub, if you are neither running it with stub.Run() nor waiting for its termination with stub.Wait() ?

@klihub It's not that I don't use WithOnClose(). I looked at the stub.go package and exist stub.Stop(). I subjectively thought that stub.Stop() could be used. After all, WithOnClose() is just an optional function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants