Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nsqd: benchmark panics #550

Merged
merged 1 commit into from
May 1, 2015
Merged

nsqd: benchmark panics #550

merged 1 commit into from
May 1, 2015

Conversation

mreiferson
Copy link
Member

In particular, BenchmarkProtocolV2Pub32k:

$ go test -bench=ProtocolV2Pub32k -run=XXX                                                                                                                                                                                      !
PASS
BenchmarkProtocolV2Pub32k       panic: write tcp 127.0.0.1:55383: connection reset by peer

goroutine 26 [running]:
github.com/bitly/nsq/nsqd.func·051()
        /home/caleb/p/go/src/github.com/bitly/nsq/nsqd/protocol_v2_test.go:1427 +0x36e
created by github.com/bitly/nsq/nsqd.benchmarkProtocolV2Pub
        /home/caleb/p/go/src/github.com/bitly/nsq/nsqd/protocol_v2_test.go:1443 +0x68d

goroutine 1 [chan receive]:
testing.(*B).run(0xc208070400, 0x0, 0x0, 0x0, 0x0, 0x0)
        /home/caleb/apps/go/src/testing/benchmark.go:180 +0x67
testing.RunBenchmarks(0x9544a8, 0xaa0b80, 0x2a, 0x2a)
        /home/caleb/apps/go/src/testing/benchmark.go:312 +0x649
testing.(*M).Run(0xc20804e410, 0xaad140)
        /home/caleb/apps/go/src/testing/testing.go:494 +0x1a5
main.main()
        github.com/bitly/nsq/nsqd/_test/_testmain.go:264 +0x1d5

goroutine 22 [select]:
github.com/bitly/nsq/nsqd.(*NSQD).idPump(0xc208000ea0)
        /home/caleb/p/go/src/github.com/bitly/nsq/nsqd/nsqd.go:497 +0x388
github.com/bitly/nsq/nsqd.func·030()
        /home/caleb/p/go/src/github.com/bitly/nsq/nsqd/nsqd.go:132 +0x2a
github.com/bitly/nsq/util.func·003()
        /home/caleb/p/go/src/github.com/bitly/nsq/util/wait_group_wrapper.go:14 +0x2f
created by github.com/bitly/nsq/util.(*WaitGroupWrapper).Wrap
        /home/caleb/p/go/src/github.com/bitly/nsq/util/wait_group_wrapper.go:16 +0xf7

goroutine 6 [semacquire]:
sync.(*WaitGroup).Wait(0xc20801c000)
        /home/caleb/apps/go/src/sync/waitgroup.go:132 +0x169
github.com/bitly/nsq/nsqd.benchmarkProtocolV2Pub(0xc208070400, 0x8000)
        /home/caleb/p/go/src/github.com/bitly/nsq/nsqd/protocol_v2_test.go:1446 +0x6ec
github.com/bitly/nsq/nsqd.BenchmarkProtocolV2Pub32k(0xc208070400)
        /home/caleb/p/go/src/github.com/bitly/nsq/nsqd/protocol_v2_test.go:1459 +0x31
testing.(*B).runN(0xc208070400, 0x2710)
        /home/caleb/apps/go/src/testing/benchmark.go:124 +0x95
testing.(*B).launch(0xc208070400)
        /home/caleb/apps/go/src/testing/benchmark.go:216 +0x159
created by testing.(*B).run
        /home/caleb/apps/go/src/testing/benchmark.go:179 +0x3e

goroutine 23 [select]:
github.com/bitly/nsq/nsqd.(*NSQD).lookupLoop(0xc208000ea0)
        /home/caleb/p/go/src/github.com/bitly/nsq/nsqd/lookup.go:64 +0x3008
github.com/bitly/nsq/nsqd.func·031()
        /home/caleb/p/go/src/github.com/bitly/nsq/nsqd/nsqd.go:185 +0x2a
github.com/bitly/nsq/util.func·003()
        /home/caleb/p/go/src/github.com/bitly/nsq/util/wait_group_wrapper.go:14 +0x2f
created by github.com/bitly/nsq/util.(*WaitGroupWrapper).Wrap
        /home/caleb/p/go/src/github.com/bitly/nsq/util/wait_group_wrapper.go:16 +0xf7

goroutine 25 [IO wait]:
net.(*pollDesc).Wait(0xc208012530, 0x72, 0x0, 0x0)
        /home/caleb/apps/go/src/net/fd_poll_runtime.go:84 +0x47
net.(*pollDesc).WaitRead(0xc208012530, 0x0, 0x0)
        /home/caleb/apps/go/src/net/fd_poll_runtime.go:89 +0x43
net.(*netFD).accept(0xc2080124d0, 0x0, 0x7fcbec61ec08, 0xc20808c828)
        /home/caleb/apps/go/src/net/fd_unix.go:419 +0x40b
net.(*TCPListener).AcceptTCP(0xc20804c0d8, 0xc20801e380, 0x0, 0x0)
        /home/caleb/apps/go/src/net/tcpsock_posix.go:234 +0x4e
net.(*TCPListener).Accept(0xc20804c0d8, 0x0, 0x0, 0x0, 0x0)
        /home/caleb/apps/go/src/net/tcpsock_posix.go:244 +0x4c
net/http.(*Server).Serve(0xc2080404e0, 0x7fcbec61f6c8, 0xc20804c0d8, 0x0, 0x0)
        /home/caleb/apps/go/src/net/http/server.go:1728 +0x92
github.com/bitly/nsq/util.HTTPServer(0x7fcbec61f6c8, 0xc20804c0d8, 0x7fcbec620800, 0xc20808c660, 0x7fcbec620760, 0xc20808c300, 0x86be30, 0x4)
        /home/caleb/p/go/src/github.com/bitly/nsq/util/http_server.go:16 +0x26b
github.com/bitly/nsq/nsqd.func·034()
        /home/caleb/p/go/src/github.com/bitly/nsq/nsqd/nsqd.go:226 +0xf6
github.com/bitly/nsq/util.func·003()
        /home/caleb/p/go/src/github.com/bitly/nsq/util/wait_group_wrapper.go:14 +0x2f
created by github.com/bitly/nsq/util.(*WaitGroupWrapper).Wrap
        /home/caleb/p/go/src/github.com/bitly/nsq/util/wait_group_wrapper.go:16 +0xf7

goroutine 24 [IO wait]:
net.(*pollDesc).Wait(0xc208012450, 0x72, 0x0, 0x0)
        /home/caleb/apps/go/src/net/fd_poll_runtime.go:84 +0x47
net.(*pollDesc).WaitRead(0xc208012450, 0x0, 0x0)
        /home/caleb/apps/go/src/net/fd_poll_runtime.go:89 +0x43
net.(*netFD).accept(0xc2080123f0, 0x0, 0x7fcbec61ec08, 0xc20808d238)
        /home/caleb/apps/go/src/net/fd_unix.go:419 +0x40b
net.(*TCPListener).AcceptTCP(0xc20804c0a0, 0xc208017668, 0x0, 0x0)
        /home/caleb/apps/go/src/net/tcpsock_posix.go:234 +0x4e
net.(*TCPListener).Accept(0xc20804c0a0, 0x0, 0x0, 0x0, 0x0)
        /home/caleb/apps/go/src/net/tcpsock_posix.go:244 +0x4c
github.com/bitly/nsq/util.TCPServer(0x7fcbec61f6c8, 0xc20804c0a0, 0x7fcbec620738, 0xc20804c0b8, 0x7fcbec620760, 0xc20808c300)
        /home/caleb/p/go/src/github.com/bitly/nsq/util/tcp_server.go:18 +0x184
github.com/bitly/nsq/nsqd.func·032()
        /home/caleb/p/go/src/github.com/bitly/nsq/nsqd/nsqd.go:195 +0xe0
github.com/bitly/nsq/util.func·003()
        /home/caleb/p/go/src/github.com/bitly/nsq/util/wait_group_wrapper.go:14 +0x2f
created by github.com/bitly/nsq/util.(*WaitGroupWrapper).Wrap
        /home/caleb/p/go/src/github.com/bitly/nsq/util/wait_group_wrapper.go:16 +0xf7
exit status 2
FAIL    github.com/bitly/nsq/nsqd       0.023s

@mreiferson
Copy link
Member

that's pretty funny, I get this when I run that benchmark:

panic: write tcp 127.0.0.1:60776: protocol wrong type for socket

I don't even know what that means...

This is going to be a fun one!

@mreiferson
Copy link
Member

@cespare relatedly, I suggest you use bench.sh (local) or bench/bench.py (distributed) to run benchmarks

@mreiferson
Copy link
Member

(but I'd like to understand this so I'm leaving this open)

@cespare
Copy link
Contributor Author

cespare commented Feb 25, 2015

Yes, it works with bench.sh. Fine by me to close whenever.

@mreiferson
Copy link
Member

In this case bench.sh isn't running that benchmark, it's standing up a local nsqd and running some benchmarking utility apps (under /bench/bench_{reader,writer}).

So there is some underlying problem here - I just want to confirm that it isn't a larger issue in the core with bigger message sizes.

@mreiferson
Copy link
Member

I did a little more digging and we appear to be experiencing this bug:

http://erickt.github.io/blog/2014/11/19/adventures-in-debugging-a-potential-osx-kernel-bug/

I assume you're running on OSX Yosemite as well?

@cespare
Copy link
Contributor Author

cespare commented Feb 28, 2015

Nope, Linux.

@mreiferson
Copy link
Member

OK - re-reading your paste you're getting a different error than I am.

@mreiferson
Copy link
Member

@cespare - finally setup a VM to take a look at this, it's far more benign than I had originally feared:

$ go test -v -run None -bench BenchmarkProtocolV2Pub32k
PASS
BenchmarkProtocolV2Pub32k   2015/05/01 16:47:40.946475 nsqd v0.3.6-alpha (built w/go1.2.1)
2015/05/01 16:47:40.946593 ID: 339
2015/05/01 16:47:40.946777 TCP: listening on 127.0.0.1:51675
2015/05/01 16:47:40.946927 HTTP: listening on 127.0.0.1:60993
2015/05/01 16:47:40.960602 TCP: new client(127.0.0.1:45178)
2015/05/01 16:47:40.961222 NSQ: persisting topic/channel metadata to /tmp/nsqd.339.dat
2015/05/01 16:47:40.963125 NSQ: closing topics
2015/05/01 16:47:40.963323 CLIENT(127.0.0.1:45178): desired protocol magic '  V2'
2015/05/01 16:47:40.963407 TCP: closing 127.0.0.1:51675
2015/05/01 16:47:40.964008 HTTP: closing 127.0.0.1:60993
2015/05/01 16:47:40.964194 LOOKUP: closing
2015/05/01 16:47:40.964322 ID: closing
2015/05/01 16:47:40.965824 nsqd v0.3.6-alpha (built w/go1.2.1)
2015/05/01 16:47:40.965954 ID: 339
2015/05/01 16:47:40.966325 TCP: listening on 127.0.0.1:50222
2015/05/01 16:47:40.966377 HTTP: listening on 127.0.0.1:52548
2015/05/01 16:47:40.978045 TCP: new client(127.0.0.1:38814)
2015/05/01 16:47:40.978282 NSQ: persisting topic/channel metadata to /tmp/nsqd.339.dat
2015/05/01 16:47:40.980014 NSQ: closing topics
2015/05/01 16:47:40.980124 CLIENT(127.0.0.1:38814): desired protocol magic '  V2'
2015/05/01 16:47:40.980319 TCP: closing 127.0.0.1:50222
2015/05/01 16:47:40.980396 HTTP: closing 127.0.0.1:52548
2015/05/01 16:47:40.980590 ID: closing
2015/05/01 16:47:40.980649 LOOKUP: closing
2015/05/01 16:47:40.981488 nsqd v0.3.6-alpha (built w/go1.2.1)
2015/05/01 16:47:40.981925 ID: 339
2015/05/01 16:47:40.982137 TCP: listening on 127.0.0.1:59991
2015/05/01 16:47:40.982256 HTTP: listening on 127.0.0.1:45002
2015/05/01 16:47:40.995376 TCP: new client(127.0.0.1:38776)
2015/05/01 16:47:40.996186 CLIENT(127.0.0.1:38776): desired protocol magic '  V2'
2015/05/01 16:47:41.000544 ERROR: [127.0.0.1:38776] - E_BAD_BODY MPUB body too big 6554404 > 5123840
2015/05/01 16:47:41.000790 PROTOCOL(V2): [127.0.0.1:38776] exiting ioloop
2015/05/01 16:47:41.000904 PROTOCOL(V2): [127.0.0.1:38776] exiting messagePump
panic: write tcp 127.0.0.1:59991: connection reset by peer

We're sending too large a body in these tests. I'll fix!

@mreiferson
Copy link
Member

RFR @jehiah

jehiah added a commit that referenced this pull request May 1, 2015
@jehiah jehiah merged commit a2a5938 into nsqio:master May 1, 2015
@mreiferson mreiferson deleted the bench_panic_550 branch May 1, 2015 17:34
@mreiferson mreiferson mentioned this pull request Jul 13, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants