Enhance go leak check #7782

Open
HuSharp opened this issue Feb 1, 2024 · 0 comments · May be fixed by #7777
Labels
type/enhancement The issue or PR belongs to an enhancement.


HuSharp commented Feb 1, 2024

Enhancement Task

For example, we hit a goroutine leak whose top stack frame is runtime_pollWait, and which is caused by the dashboard:

Goroutine 1362 in state IO wait, with internal/poll.runtime_pollWait on top of the stack:
goroutine 1362 [IO wait]:
internal/poll.runtime_pollWait(0x14dc55908, 0x72)
	/opt/homebrew/opt/go/libexec/src/runtime/netpoll.go:343 +0xa0
internal/poll.(*pollDesc).wait(0x14008d8e580?, 0x0?, 0x0)
	/opt/homebrew/opt/go/libexec/src/internal/poll/fd_poll_runtime.go:84 +0x28
internal/poll.(*pollDesc).waitRead(...)
	/opt/homebrew/opt/go/libexec/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x14008d8e580)
	/opt/homebrew/opt/go/libexec/src/internal/poll/fd_unix.go:611 +0x250
net.(*netFD).accept(0x14008d8e580)
	/opt/homebrew/opt/go/libexec/src/net/fd_unix.go:172 +0x28
net.(*TCPListener).accept(0x140088d1840)
	/opt/homebrew/opt/go/libexec/src/net/tcpsock_posix.go:152 +0x28
net.(*TCPListener).Accept(0x140088d1840)
	/opt/homebrew/opt/go/libexec/src/net/tcpsock.go:315 +0x2c
github.com/pingcap/tidb-dashboard/pkg/tidb.(*proxy).run(0x14008d9e1e0, {0x104f79778?, 0x14007fd5c20})
	/Users/pingcap/go/pkg/mod/github.com/pingcap/tidb-dashboard@v0.0.0-20240111062855-41f7c8011953/pkg/tidb/proxy.go:227 +0x37c
created by github.com/pingcap/tidb-dashboard/pkg/tidb.(*Forwarder).Start in goroutine 1296
	/Users/pingcap/go/pkg/mod/github.com/pingcap/tidb-dashboard@v0.0.0-20240111062855-41f7c8011953/pkg/tidb/forwarder.go:57 +0x1d4

We also hit another goroutine leak whose top stack frame is runtime_pollWait, but its root cause is go.etcd.io/etcd/pkg/transport.timeoutConn.Read, which is different from the dashboard case:

internal/poll.runtime_pollWait(0x10f8cc2a0, 0x72)
	/opt/homebrew/opt/go/libexec/src/runtime/netpoll.go:343 +0xa0
internal/poll.(*pollDesc).wait(0x140016ac400?, 0x1400159a000?, 0x0)
	/opt/homebrew/opt/go/libexec/src/internal/poll/fd_poll_runtime.go:84 +0x28
internal/poll.(*pollDesc).waitRead(...)
	/opt/homebrew/opt/go/libexec/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x140016ac400, {0x1400159a000, 0x1000, 0x1000})
	/opt/homebrew/opt/go/libexec/src/internal/poll/fd_unix.go:164 +0x200
net.(*netFD).Read(0x140016ac400, {0x1400159a000?, 0x14000436160?, 0x140013dcc60?})
	/opt/homebrew/opt/go/libexec/src/net/fd_posix.go:55 +0x28
net.(*conn).Read(0x140016b81c0, {0x1400159a000?, 0x14001141bd8?, 0x1?})
	/opt/homebrew/opt/go/libexec/src/net/net.go:179 +0x34
go.etcd.io/etcd/pkg/transport.timeoutConn.Read({{0x106cf7120?, 0x140016b81c0?}, 0x14001141c78?, 0x1041334cc?}, {0x1400159a000?, 0x104133074?, 0x1400039b068?})
	/Users/pingcap/go/pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20220915004622-85b640cee793/pkg/transport/timeout_conn.go:43 +0xa8
net/http.(*persistConn).Read(0x140013dcc60, {0x1400159a000?, 0x104133570?, 0x140013caf00?})
	/opt/homebrew/opt/go/libexec/src/net/http/transport.go:1954 +0x50
bufio.(*Reader).fill(0x14001073d40)
	/opt/homebrew/opt/go/libexec/src/bufio/bufio.go:113 +0xf8
bufio.(*Reader).Peek(0x14001073d40, 0x1)
	/opt/homebrew/opt/go/libexec/src/bufio/bufio.go:151 +0x60
net/http.(*persistConn).readLoop(0x140013dcc60)
	/opt/homebrew/opt/go/libexec/src/net/http/transport.go:2118 +0x14c
created by net/http.(*Transport).dialConn in goroutine 316
	/opt/homebrew/opt/go/libexec/src/net/http/transport.go:1776 +0x1144

If we ignore leaked goroutines by their top stack frame alone, the dashboard leak is hidden as well.
A better way to handle these two cases is to:

  • wait for the etcd timeout (the goroutine is still in the process of exiting)
  • troubleshoot the dashboard leak

So we need a more fine-grained check.
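
To illustrate the difference, here is a minimal standard-library sketch (not the actual check used in this repository) that compares matching on the top frame only with matching on any frame in the stack. The helper names `frames`, `topIs` and `anyIs` are hypothetical, and the two stack constants are abbreviated copies of the dumps above.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	pollWait = "internal/poll.runtime_pollWait"
	etcdRead = "go.etcd.io/etcd/pkg/transport.timeoutConn.Read"
)

// Abbreviated copies of the two stacks from this issue.
const dashboardStack = `goroutine 1362 [IO wait]:
internal/poll.runtime_pollWait(0x14dc55908, 0x72)
	/opt/homebrew/opt/go/libexec/src/runtime/netpoll.go:343 +0xa0
net.(*TCPListener).Accept(0x140088d1840)
	/opt/homebrew/opt/go/libexec/src/net/tcpsock.go:315 +0x2c
github.com/pingcap/tidb-dashboard/pkg/tidb.(*proxy).run(0x14008d9e1e0, {0x104f79778?, 0x14007fd5c20})`

const etcdStack = `internal/poll.runtime_pollWait(0x10f8cc2a0, 0x72)
	/opt/homebrew/opt/go/libexec/src/runtime/netpoll.go:343 +0xa0
net.(*conn).Read(0x140016b81c0, {0x1400159a000?, 0x14001141bd8?, 0x1?})
	/opt/homebrew/opt/go/libexec/src/net/net.go:179 +0x34
go.etcd.io/etcd/pkg/transport.timeoutConn.Read({{0x106cf7120?, 0x140016b81c0?}, 0x14001141c78?, 0x1041334cc?}, {0x1400159a000?, 0x104133074?, 0x1400039b068?})
net/http.(*persistConn).readLoop(0x140013dcc60)`

// frames returns the function-call lines of a goroutine dump. It skips the
// "goroutine N [state]:" header, "created by" lines, and source locations
// (assumed here to be Unix-style absolute paths, as in the dumps above).
func frames(stack string) []string {
	var out []string
	for _, line := range strings.Split(stack, "\n") {
		trimmed := strings.TrimSpace(line)
		if trimmed == "" ||
			strings.HasPrefix(trimmed, "goroutine ") ||
			strings.HasPrefix(trimmed, "created by ") ||
			strings.HasPrefix(trimmed, "/") {
			continue
		}
		out = append(out, trimmed)
	}
	return out
}

// topIs reports whether the top frame of the stack is a call to fn.
func topIs(stack, fn string) bool {
	fs := frames(stack)
	return len(fs) > 0 && strings.HasPrefix(fs[0], fn+"(")
}

// anyIs reports whether any frame of the stack is a call to fn.
func anyIs(stack, fn string) bool {
	for _, f := range frames(stack) {
		if strings.HasPrefix(f, fn+"(") {
			return true
		}
	}
	return false
}

func main() {
	// Matching on the top frame cannot tell the two leaks apart.
	fmt.Println("top-only, dashboard:", topIs(dashboardStack, pollWait))
	fmt.Println("top-only, etcd:     ", topIs(etcdStack, pollWait))
	// Matching on the etcd frame anywhere in the stack skips only the etcd case.
	fmt.Println("any-frame, dashboard:", anyIs(dashboardStack, etcdRead))
	fmt.Println("any-frame, etcd:     ", anyIs(etcdStack, etcdRead))
}
```

With a rule like this, only the goroutine that passes through timeoutConn.Read is skipped, and the dashboard goroutine is still reported. If the leak check is built on go.uber.org/goleak (an assumption here; this issue does not name the library), `goleak.IgnoreTopFunction` corresponds to the top-only match, and recent releases also provide `goleak.IgnoreAnyFunction` for matching a function anywhere in the stack.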
@HuSharp HuSharp added the type/enhancement The issue or PR belongs to an enhancement. label Feb 1, 2024
ti-chi-bot bot pushed a commit that referenced this issue Feb 6, 2024
ref #7782

etcdutil_test: close goroutine leak when block in accept

Signed-off-by: husharp <jinhao.hu@pingcap.com>

Co-authored-by: JmPotato <ghzpotato@gmail.com>
ti-chi-bot bot added a commit that referenced this issue Feb 6, 2024
ref #7782

server: close grpc conn when close server

Signed-off-by: husharp <jinhao.hu@pingcap.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>