Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nsqd: BenchmarkDiskQueuePut prints errors #619

Merged
merged 1 commit into from
Aug 7, 2015

Conversation

jehiah
Copy link
Member

@jehiah jehiah commented Aug 7, 2015

current master (fd62604)

go test -bench=QueuePut
2015/08/06 22:54:51 http: TLS handshake error from 127.0.0.1:58649: tls: client didn't provide a certificate
2015/08/06 22:54:51 LOOKUPD: querying http://127.0.0.1:36858/channels?topic=cluster_test1438894491
PASS
BenchmarkDiskQueuePut    1000000          1890 ns/op     541.51 MB/s
--- BENCH: BenchmarkDiskQueuePut
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put11438894496): writeOne() opened /tmp/nsq-test-1438894496364087961502926977/bench_disk_queue_put11438894496.diskqueue.000000.dat
    nsqd_test.go:57: ERROR: reading from diskqueue(bench_disk_queue_put11438894496) at 0 of /tmp/nsq-test-1438894496364087961502926977/bench_disk_queue_put11438894496.diskqueue.000000.dat - open /tmp/nsq-test-1438894496364087961502926977/bench_disk_queue_put11438894496.diskqueue.000000.dat: no such file or directory
    nsqd_test.go:57: NOTICE: diskqueue(bench_disk_queue_put11438894496) jump to next file and saving bad file as /tmp/nsq-test-1438894496364087961502926977/bench_disk_queue_put11438894496.diskqueue.000000.dat.bad
    nsqd_test.go:57: ERROR: diskqueue(bench_disk_queue_put11438894496) failed to rename bad diskqueue file /tmp/nsq-test-1438894496364087961502926977/bench_disk_queue_put11438894496.diskqueue.000000.dat to /tmp/nsq-test-1438894496364087961502926977/bench_disk_queue_put11438894496.diskqueue.000000.dat.bad
    nsqd_test.go:57: ERROR: diskqueue(bench_disk_queue_put11438894496) failed to sync - open /tmp/nsq-test-1438894496364087961502926977/bench_disk_queue_put11438894496.diskqueue.meta.dat.9207838268526566190.tmp: no such file or directory
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put1001438894496): writeOne() opened /tmp/nsq-test-1438894496365062063838665708/bench_disk_queue_put1001438894496.diskqueue.000000.dat
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put1001438894496): readOne() opened /tmp/nsq-test-1438894496365062063838665708/bench_disk_queue_put1001438894496.diskqueue.000000.dat
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put100001438894496): writeOne() opened /tmp/nsq-test-1438894496366220971090229851/bench_disk_queue_put100001438894496.diskqueue.000000.dat
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put100001438894496): readOne() opened /tmp/nsq-test-1438894496366220971090229851/bench_disk_queue_put100001438894496.diskqueue.000000.dat
    nsqd_test.go:57: ERROR: diskqueue(bench_disk_queue_put100001438894496) failed to sync - open /tmp/nsq-test-1438894496366220971090229851/bench_disk_queue_put100001438894496.diskqueue.meta.dat.8974393837878367304.tmp: no such file or directory
    ... [output truncated]
ok      github.com/bitly/nsq/nsqd   9.703s

@mreiferson
Copy link
Member

I can't reproduce this, are you sure there's nothing funky about the environment you're running this on?

@mreiferson mreiferson changed the title BenchmarkDiskQueuePut prints errors nsqd: BenchmarkDiskQueuePut prints errors Aug 7, 2015
@Dieterbe
Copy link
Contributor Author

Dieterbe commented Aug 7, 2015

hmm, so i thought maybe /tmp was filing up but nope watch -n 0.1 df -h /tmp/ revealed usage went up to about 1GB and then the test breaks like above. (my /tmp is a tmpfs with 16GB available)

i also tried running journalctl -f during the test, nothing came up.

@Dieterbe
Copy link
Contributor Author

Dieterbe commented Aug 7, 2015

just did another run which looked like so:

dieter@dieter-m6800 nsqd go test -bench=QueuePut
2015/08/07 09:20:24 http: TLS handshake error from 127.0.0.1:45343: tls: client didn't provide a certificate
2015/08/07 09:20:24 LOOKUPD: querying http://127.0.0.1:45389/channels?topic=cluster_test1438932024
PASS
BenchmarkDiskQueuePut    1000000          1883 ns/op     543.81 MB/s
--- BENCH: BenchmarkDiskQueuePut
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put11438932029): writeOne() opened /tmp/nsq-test-1438932029313220882000012988/bench_disk_queue_put11438932029.diskqueue.000000.dat
    nsqd_test.go:57: ERROR: reading from diskqueue(bench_disk_queue_put11438932029) at 0 of /tmp/nsq-test-1438932029313220882000012988/bench_disk_queue_put11438932029.diskqueue.000000.dat - open /tmp/nsq-test-1438932029313220882000012988/bench_disk_queue_put11438932029.diskqueue.000000.dat: no such file or directory
    nsqd_test.go:57: NOTICE: diskqueue(bench_disk_queue_put11438932029) jump to next file and saving bad file as /tmp/nsq-test-1438932029313220882000012988/bench_disk_queue_put11438932029.diskqueue.000000.dat.bad
    nsqd_test.go:57: ERROR: diskqueue(bench_disk_queue_put11438932029) failed to rename bad diskqueue file /tmp/nsq-test-1438932029313220882000012988/bench_disk_queue_put11438932029.diskqueue.000000.dat to /tmp/nsq-test-1438932029313220882000012988/bench_disk_queue_put11438932029.diskqueue.000000.dat.bad
    nsqd_test.go:57: ERROR: diskqueue(bench_disk_queue_put11438932029) failed to sync - open /tmp/nsq-test-1438932029313220882000012988/bench_disk_queue_put11438932029.diskqueue.meta.dat.3238838949533235380.tmp: no such file or directory
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put1001438932029): writeOne() opened /tmp/nsq-test-1438932029314142315200953579/bench_disk_queue_put1001438932029.diskqueue.000000.dat
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put1001438932029): readOne() opened /tmp/nsq-test-1438932029314142315200953579/bench_disk_queue_put1001438932029.diskqueue.000000.dat
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put100001438932029): writeOne() opened /tmp/nsq-test-1438932029315307917260011854/bench_disk_queue_put100001438932029.diskqueue.000000.dat
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put100001438932029): readOne() opened /tmp/nsq-test-1438932029315307917260011854/bench_disk_queue_put100001438932029.diskqueue.000000.dat
    nsqd_test.go:57: ERROR: diskqueue(bench_disk_queue_put100001438932029) failed to sync - open /tmp/nsq-test-1438932029315307917260011854/bench_disk_queue_put100001438932029.diskqueue.meta.dat.2988853604260672186.tmp: no such file or directory
    ... [output truncated]
ok      github.com/bitly/nsq/nsqd   9.669s

while showing an ls of all files in the most recent directory in /tmp via
while true; do ls -alhtr $(ls -tr | tail -n 1); sleep 0.1; done
(files sorted by time)

drwxrwxrwt 17 root   root   420 Aug  7 09:20 ..
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000000.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000001.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000002.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000003.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000004.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000005.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000006.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000007.dat
-rw-------  1 dieter dieter  22 Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.meta.dat
drwx------  2 dieter dieter 240 Aug  7 09:20 .
-rw-------  1 dieter dieter 14M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000008.dat
total 851M
drwxrwxrwt 17 root   root   420 Aug  7 09:20 ..
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000000.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000001.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000002.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000003.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000004.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000005.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000006.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000007.dat
-rw-------  1 dieter dieter  22 Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.meta.dat
drwx------  2 dieter dieter 240 Aug  7 09:20 .
-rw-------  1 dieter dieter 69M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000008.dat
total 906M
drwxrwxrwt 17 root   root   420 Aug  7 09:20 ..
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000000.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000001.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000002.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000003.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000004.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000005.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000006.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000007.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000008.dat
-rw-------  1 dieter dieter  22 Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.meta.dat
drwx------  2 dieter dieter 260 Aug  7 09:20 .
-rw-------  1 dieter dieter 27M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000009.dat
total 961M
drwxrwxrwt 17 root   root   420 Aug  7 09:20 ..
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000000.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000001.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000002.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000003.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000004.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000005.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000006.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000007.dat
-rw-------  1 dieter dieter 98M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000008.dat
-rw-------  1 dieter dieter  22 Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.meta.dat
-rw-------  1 dieter dieter 82M Aug  7 09:20 bench_disk_queue_put10000001438932029.diskqueue.000009.dat
drwx------  2 dieter dieter 260 Aug  7 09:20 .
total 4.0K
-rw-r--r--  1 dieter dieter   0 Aug  7 09:15 server_49052_stdout.log
-rw-r--r--  1 dieter dieter 396 Aug  7 09:15 server_49052_stderr.log
drwxrwxrwx  2 dieter dieter  80 Aug  7 09:16 .
drwxrwxrwt 15 root   root   380 Aug  7 09:20 ..

so the files do seem to exist.
of course the timing is hard to align. perhaps we should add some instrumentation (directory listing etc) to the code

@jehiah
Copy link
Member

jehiah commented Aug 7, 2015

I can reproduce getting errors from this benchmark if i set GOMAXPROCS=2 on Go 1.4.2 @Dieterbe are you by chance running a 1.5 RC?

$ GOMAXPROCS=2 go test -bench=QueuePut
2015/08/07 07:36:56 http: TLS handshake error from 127.0.0.1:52142: tls: client didn't provide a certificate
PASS
BenchmarkDiskQueuePut-2   100000         12483 ns/op      82.03 MB/s
--- BENCH: BenchmarkDiskQueuePut-2
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put11438947421): writeOne() opened /var/folders/m9/b198ly_n0278jdm7fr8gjr6w0000gn/T/nsq-test-1438947421865214761306943289/bench_disk_queue_put11438947421.diskqueue.000000.dat
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put11438947421): readOne() opened /var/folders/m9/b198ly_n0278jdm7fr8gjr6w0000gn/T/nsq-test-1438947421865214761306943289/bench_disk_queue_put11438947421.diskqueue.000000.dat
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put1001438947421): writeOne() opened /var/folders/m9/b198ly_n0278jdm7fr8gjr6w0000gn/T/nsq-test-1438947421867235370662363972/bench_disk_queue_put1001438947421.diskqueue.000000.dat
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put1001438947421): readOne() opened /var/folders/m9/b198ly_n0278jdm7fr8gjr6w0000gn/T/nsq-test-1438947421867235370662363972/bench_disk_queue_put1001438947421.diskqueue.000000.dat
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put100001438947421): writeOne() opened /var/folders/m9/b198ly_n0278jdm7fr8gjr6w0000gn/T/nsq-test-1438947421870434077213081555/bench_disk_queue_put100001438947421.diskqueue.000000.dat
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put100001438947421): readOne() opened /var/folders/m9/b198ly_n0278jdm7fr8gjr6w0000gn/T/nsq-test-1438947421870434077213081555/bench_disk_queue_put100001438947421.diskqueue.000000.dat
    nsqd_test.go:57: ERROR: diskqueue(bench_disk_queue_put100001438947421) failed to sync - open /var/folders/m9/b198ly_n0278jdm7fr8gjr6w0000gn/T/nsq-test-1438947421870434077213081555/bench_disk_queue_put100001438947421.diskqueue.meta.dat.4165036997436823033.tmp: no such file or directory
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put1000001438947421): writeOne() opened /var/folders/m9/b198ly_n0278jdm7fr8gjr6w0000gn/T/nsq-test-1438947421995575201024437526/bench_disk_queue_put1000001438947421.diskqueue.000000.dat
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put1000001438947421): readOne() opened /var/folders/m9/b198ly_n0278jdm7fr8gjr6w0000gn/T/nsq-test-1438947421995575201024437526/bench_disk_queue_put1000001438947421.diskqueue.000000.dat
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put1000001438947421): writeOne() opened /var/folders/m9/b198ly_n0278jdm7fr8gjr6w0000gn/T/nsq-test-1438947421995575201024437526/bench_disk_queue_put1000001438947421.diskqueue.000001.dat
    ... [output truncated]
ok      github.com/bitly/nsq/nsqd   10.019s

@Dieterbe
Copy link
Contributor Author

Dieterbe commented Aug 7, 2015

dieter@dieter-m6800 nsq go version
go version go1.4.2 linux/amd64
dieter@dieter-m6800 nsq env | grep -i '^go'
GOPATH=/home/dieter/go

@mreiferson
Copy link
Member

I think this could be because we defer os.RemoveAll(...) without cleanly closing the DQ beforehand? It's exacerbated in GOMAXPROCS > 1 situations because it might end up clearing the tmp path the next iteration is using.

I'll patch when I'm in front of a computer.

@Dieterbe
Copy link
Contributor Author

Dieterbe commented Aug 7, 2015

running with GOMAXPROCS=1 go test -bench=QueuePut looks the same though :(

@jehiah
Copy link
Member

jehiah commented Aug 7, 2015

@Dieterbe can you give this branch a try?

@mreiferson
Copy link
Member

running with GOMAXPROCS=1 go test -bench=QueuePut looks the same though :(

exacerbated 😁

@mreiferson
Copy link
Member

@jehiah reeeeeeeeeeeeebase plz

@jehiah jehiah force-pushed the diskqueue_bench_cleanup_619 branch from 10e5a26 to cea5995 Compare August 7, 2015 13:55
@Dieterbe
Copy link
Contributor Author

Dieterbe commented Aug 7, 2015

no errors when i run go test -bench=QueuePut in jehiah/diskqueue_bench_cleanup_619
but i'm also getting suspiciously fast performance

BenchmarkDiskQueuePut262144    10000          1202 ns/op    218040.76 MB/s
--- BENCH: BenchmarkDiskQueuePut262144
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put11438956572): writeOne() opened /tmp/nsq-test-1438956572698538959368068656/bench_disk_queue_put11438956572.diskqueue.000000.dat
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put11438956572): closing
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put11438956572): closing ... ioLoop
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put1001438956572): writeOne() opened /tmp/nsq-test-1438956572699759452797161935/bench_disk_queue_put1001438956572.diskqueue.000000.dat
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put1001438956572): closing
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put1001438956572): closing ... ioLoop
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put100001438956572): writeOne() opened /tmp/nsq-test-1438956572700833763031899874/bench_disk_queue_put100001438956572.diskqueue.000000.dat
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put100001438956572): closing
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put100001438956572): closing ... ioLoop
BenchmarkDiskQueuePut1048576       10000          1403 ns/op    747318.37 MB/s
--- BENCH: BenchmarkDiskQueuePut1048576
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put11438956572): writeOne() opened /tmp/nsq-test-1438956572714118434679528153/bench_disk_queue_put11438956572.diskqueue.000000.dat
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put11438956572): closing
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put11438956572): closing ... ioLoop
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put1001438956572): writeOne() opened /tmp/nsq-test-1438956572715354078924973668/bench_disk_queue_put1001438956572.diskqueue.000000.dat
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put1001438956572): closing
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put1001438956572): closing ... ioLoop
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put100001438956572): writeOne() opened /tmp/nsq-test-1438956572716642569946731635/bench_disk_queue_put100001438956572.diskqueue.000000.dat
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put100001438956572): closing
    nsqd_test.go:57: DISKQUEUE(bench_disk_queue_put100001438956572): closing ... ioLoop
ok      github.com/bitly/nsq/nsqd   7.987s

(i'm monitoring /tmp and it didn't fill up, in fact it looks like it's not really touched at all in terms of disk used/avail changing)

@mreiferson
Copy link
Member

that is because it is panicing because we aren't checking errors:

BenchmarkDiskQueuePut1048576    panic: invalid message write size (1048576)

@mreiferson
Copy link
Member

(pushed a fix)

@mreiferson
Copy link
Member

unsurprisingly getting much more realistic numbers 😁

BenchmarkDiskQueuePut1048576        1000       1775909 ns/op     590.44 MB/s

@mreiferson mreiferson force-pushed the diskqueue_bench_cleanup_619 branch from ef3662e to 9f30626 Compare August 7, 2015 16:04
mreiferson added a commit that referenced this pull request Aug 7, 2015
nsqd: BenchmarkDiskQueuePut prints errors
@mreiferson mreiferson merged commit d5fa32b into nsqio:master Aug 7, 2015
@mreiferson mreiferson deleted the diskqueue_bench_cleanup_619 branch August 7, 2015 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants