Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible leak of net.Conn #4632

Closed
aaronbee opened this issue Jul 30, 2021 · 4 comments · Fixed by #4633
Closed

Possible leak of net.Conn #4632

aaronbee opened this issue Jul 30, 2021 · 4 comments · Fixed by #4633

Comments

@aaronbee
Copy link

What version of gRPC are you using?

Go Server using v1.39.0 also tried #4627 in case that fixed the issue.

v1.38.0 does not exhibit this issue.

What version of Go are you using (go version)?

go1.16.6

What operating system (Linux, Windows, …) and version?

GOOS=linux GOARCH=386
Linux 4.19.134

What did you do?

If possible, provide a recipe for reproducing the error.

I haven't quite narrowed this down yet. We have a test that involves a long running python gRPC client using an grpc-core v1.21.3 that makes several requests.

Importantly, our server is using https://pkg.go.dev/golang.org/x/net/netutil#LimitListener to limit to 100 concurrent connections.

What did you expect to see?

Clients able to connect and make requests. Networks connections are cleaned up.

What did you see instead?

After some time running with the python client described above, it and other grpc clients can no longer connect and make RPCs. The stack trace shows that the gRPC Server is stuck in LimitListener acquire when calling Accept. This function is intended to block when over the configured number of concurrent connections allowed. However, there is only one client and the limit is set to 100.

goroutine 1478 [select, 9 minutes]:
golang.org/x/net/netutil.(*limitListener).acquire(0x118549a0, 0x8085bad)
        golang.org/x/net@v0.0.0-20210716203947-853a461950ff/netutil/listen.go:35 +0x84
golang.org/x/net/netutil.(*limitListener).Accept(0x118549a0, 0x90817cc, 0x10c2be60, 0x91fd100, 0x118549a0)
        golang.org/x/net@v0.0.0-20210716203947-853a461950ff/netutil/listen.go:45 +0x25
google.golang.org/grpc.(*Server).Serve(0x10c2be60, 0x91fd100, 0x118549a0, 0x0, 0x0)
        google.golang.org/grpc@v1.39.0/server.go:786 +0x1bb

This suggests to me that connections that are returned from Accept() are not always having their Close() function called, ie. a leak. That would explain why the LimitListener thinks it is at max connections, but really there is only 1 outstanding connection. I can confirm this with output from ss that shows only 1 connection open to the grpc server's port.

I added some logging to the limit listener when a connection is Accepted and when it is Closed, along with a count of how many outstanding connections there are:

I0730 12:17:02.280037     679 listen.go:57] limit accepted connection count:1
I0730 12:17:03.384298     679 listen.go:57] limit accepted connection count:2
I0730 12:17:04.284599     679 listen.go:57] limit accepted connection count:3
I0730 12:17:07.026731     679 listen.go:76] closed connection
I0730 12:17:07.026823     679 listen.go:76] closed connection
I0730 12:17:09.962779     679 listen.go:57] limit accepted connection count:3
I0730 12:17:09.968442     679 listen.go:76] closed connection
I0730 12:17:09.968483     679 listen.go:76] closed connection
I0730 12:17:10.483822     679 listen.go:57] limit accepted connection count:3
I0730 12:17:11.835116     679 listen.go:76] closed connection
I0730 12:17:11.835129     679 listen.go:76] closed connection
I0730 12:17:13.598692     679 listen.go:57] limit accepted connection count:3
I0730 12:17:13.604379     679 listen.go:76] closed connection
I0730 12:17:13.604408     679 listen.go:76] closed connection
I0730 12:17:14.084776     679 listen.go:57] limit accepted connection count:3
I0730 12:17:15.516137     679 listen.go:76] closed connection
I0730 12:17:15.516203     679 listen.go:76] closed connection
I0730 12:17:17.299756     679 listen.go:57] limit accepted connection count:3
I0730 12:17:17.300481     679 listen.go:76] closed connection
I0730 12:17:17.300505     679 listen.go:76] closed connection
I0730 12:17:17.806370     679 listen.go:57] limit accepted connection count:3
I0730 12:17:19.272452     679 listen.go:76] closed connection
I0730 12:17:19.272460     679 listen.go:76] closed connection
I0730 12:17:21.094221     679 listen.go:57] limit accepted connection count:3
I0730 12:17:21.576549     679 listen.go:57] limit accepted connection count:4
I0730 12:17:23.042133     679 listen.go:76] closed connection
I0730 12:17:23.042191     679 listen.go:76] closed connection
...
I0730 12:31:28.155942     679 listen.go:57] limit accepted connection count:98
I0730 12:31:29.612739     679 listen.go:76] closed connection
I0730 12:31:29.612765     679 listen.go:76] closed connection
I0730 12:31:32.161436     679 listen.go:57] limit accepted connection count:98
I0730 12:31:32.660091     679 listen.go:57] limit accepted connection count:99
I0730 12:31:34.178509     679 listen.go:76] closed connection
I0730 12:31:34.178520     679 listen.go:76] closed connection
I0730 12:31:36.137405     679 listen.go:57] limit accepted connection count:99
I0730 12:31:36.137789     679 listen.go:76] closed connection
I0730 12:31:36.137814     679 listen.go:76] closed connection
I0730 12:31:36.657023     679 listen.go:57] limit accepted connection count:99
I0730 12:31:38.164485     679 listen.go:76] closed connection
I0730 12:31:38.164521     679 listen.go:76] closed connection
I0730 12:31:40.039548     679 listen.go:57] limit accepted connection count:99
I0730 12:31:40.558325     679 listen.go:57] limit accepted connection count:100
I0730 12:31:42.082656     679 listen.go:76] closed connection
I0730 12:31:42.082710     679 listen.go:76] closed connection
I0730 12:31:43.979680     679 listen.go:57] limit accepted connection count:100
I0730 12:31:43.980000     679 listen.go:76] closed connection
I0730 12:31:43.980025     679 listen.go:76] closed connection
I0730 12:31:44.549125     679 listen.go:57] limit accepted connection count:100
I0730 12:31:45.975878     679 listen.go:76] closed connection
I0730 12:31:45.975915     679 listen.go:76] closed connection
I0730 12:31:47.935959     679 listen.go:57] limit accepted connection count:100

The increasing count shows a pattern of an occasional leak of a connection. (When the connection is closed, it appears that Close is called twice.)

@aaronbee
Copy link
Author

I tried an experiment of adding a runtime.SetFinalizer call for the conns returned by LimitListener.Accept and this appears to resolve the issue.

This strongly suggests to me that there is a code path in the grpc server where a net.Conn returned from Accept is dropped without ever calling Close. This issue can be worked around if there is a Finalizer installed on the net.Conn, which appears to be the case for the conns returned from the standard net.Listen.

@menghanl
Copy link
Contributor

Could be caused by #4458.
Can you try #4633?

@aaronbee
Copy link
Author

It is working with #4633. Thanks!

@menghanl
Copy link
Contributor

menghanl commented Aug 5, 2021

https://github.com/grpc/grpc-go/releases/tag/v1.39.1 published with this fixed

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 2, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants