Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

addworker notification for preventing goroutines leak #894

Merged
merged 18 commits into from
Dec 11, 2024

Conversation

manc88
Copy link
Contributor

@manc88 manc88 commented Oct 18, 2024

Hi, came across the fact that many of our applications have started to notice goroutine leaks. Found out that
the reason is that we initialize the HTTPSyncTransport object many times in so-called handlers:

github.com/getsentry/sentry-go.(*HTTPTransport).worker(0xc0000000)

goroutine t.worker leaks in our case.
Using blocking HTTPSyncTransport is not beneficial for us and spoils latency.

You can take a look at the solution, thanks to which we managed to reduce the number of running goroutines by orders of magnitude!

Seems like it could be a solution for 731 issue

Thanks for your product anyway.

Mikhail Zvonov and others added 2 commits October 18, 2024 21:17
Copy link

codecov bot commented Oct 28, 2024

Codecov Report

Attention: Patch coverage is 80.55556% with 7 lines in your changes missing coverage. Please review.

Project coverage is 83.87%. Comparing base (71717ce) to head (a598eec).
Report is 5 commits behind head on master.

Files with missing lines Patch % Lines
transport.go 85.29% 4 Missing and 1 partial ⚠️
client.go 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #894      +/-   ##
==========================================
+ Coverage   83.81%   83.87%   +0.06%     
==========================================
  Files          49       49              
  Lines        5121     5135      +14     
==========================================
+ Hits         4292     4307      +15     
+ Misses        676      674       -2     
- Partials      153      154       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ribice
Copy link
Collaborator

ribice commented Nov 22, 2024

I'm testing this locally and tried a couple of my own solutions as well, but there's still a leaked goroutine when HTTPTransport is flushed.

func TestHTTPTransportDoesntLeakGoroutines(t *testing.T) {
	defer goleak.VerifyNone(t)

	fmt.Println("NumGoroutine before sending events:", runtime.NumGoroutine())

	transport := NewHTTPTransport()
	transport.Configure(ClientOptions{
		Dsn:        "https://test@foobar/1",
		HTTPClient: http.DefaultClient,
	})

	// Add logging to monitor worker termination
	fmt.Println("Starting Flush")
	if !transport.Flush(testutils.FlushTimeout()) {
		t.Error("Flush failed")
	}
	fmt.Println("Flush completed")

	fmt.Println("NumGoroutine after sending events:", runtime.NumGoroutine())
}

Both goleak detect this (worker remains running) and the number of goroutines (2 before and 3 after Flush).

@manc88
Copy link
Contributor Author

manc88 commented Nov 22, 2024

I'm testing this locally and tried a couple of my own solutions as well, but there's still a leaked goroutine when HTTPTransport is flushed.

...

Both goleak detect this (worker remains running) and the number of goroutines (2 before and 3 after Flush).

Yes, I assumed that when calling the Flush() method we should not interrupt the worker, because it should continue to do its own things. And we can give an opportunity to notify transport from outside the package that we don't need it anymore. something like that:

func TestHTTPTransportDoesntLeakGoroutines(t *testing.T) {
	defer goleak.VerifyNone(t)

	fmt.Println("NumGoroutine before sending events:", runtime.NumGoroutine())

	doneCh := make(chan struct{})
	transport := NewHTTPTransport()
	transport.Configure(ClientOptions{
		Dsn:        "https://test@foobar/1",
		HTTPClient: http.DefaultClient,
		Done: doneCh,
	})

	doneCh <- struct{}{} //comment this line to catch a leak

	fmt.Println("NumGoroutine after sending events:", runtime.NumGoroutine())
}

Another option is to hide this logic behind a new method HTTPTransport.Close() but it seems unnecessary.

@ribice
Copy link
Collaborator

ribice commented Nov 26, 2024

This makes sense. The only thing I'm unsure about is whether we should pre-set the done channel and expose a Close() or End() method on HTTPTransport. That seems more readable than users having to know that they need to create, pass and write to channel to stop the worker?

@manc88
Copy link
Contributor Author

manc88 commented Nov 26, 2024

That seems more readable than users having to know that they need to create, pass and write to channel to stop the worker?

yeah, looks a robust and easy to understand with Close method. thats ill gonna do.

@ribice
Copy link
Collaborator

ribice commented Nov 28, 2024

I added changes to changelog and the leak detection test. Locally passes fine, here it reports there is a leak. Will further debug why.

@cleptric
Copy link
Member

I think we should also add a method to the client, similar to other SDKs that provide a close method.

@manc88
Copy link
Contributor Author

manc88 commented Nov 28, 2024

I added changes to changelog and the leak detection test. Locally passes fine, here it reports there is a leak. Will further debug why.

upd. oh i see not pased tests

@manc88
Copy link
Contributor Author

manc88 commented Nov 28, 2024

Add Close method for client and fix tests, there are problems in other tests where transport was not closed properly. And there was not only worker leaks btw.

@ribice
Copy link
Collaborator

ribice commented Dec 9, 2024

@manc88 Anything else you'd like to add? If not ,we can merge this.

@manc88
Copy link
Contributor Author

manc88 commented Dec 9, 2024

@manc88 Anything else you'd like to add? If not ,we can merge this.

@ribice nothig else, thanks

@cleptric
Copy link
Member

I'm merging this now and will follow up adding support for a timeout similar to flush. Thanks for your work @manc88!

@cleptric cleptric merged commit 3e6309a into getsentry:master Dec 11, 2024
15 of 16 checks passed
@manc88
Copy link
Contributor Author

manc88 commented Dec 11, 2024

@cleptric @ribice thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants