This repository has been archived by the owner on May 26, 2022. It is now read-only.

Fix Read() behaviour; heavy performance optimisations; i/o deadlines; more #58

Merged

merged 16 commits into master from raul-review on Apr 24, 2020

Conversation

raulk
Member

@raulk raulk commented Mar 4, 2020

Improvements

  • make Read() conform to io.Reader behaviour: Read() used to behave like io.ReadFull(); it now behaves as io.Reader mandates.

  • Heavy performance optimisation (see benchmarks and benchcmp below):

    • Preallocate 2-byte-long slices for the message length calculation.
    • On read, copy directly into the supplied buffer when the message fits in len(buffer) (zero-alloc path).
    • Use buffer pools (via go-buffer-pool) to contain allocs and GC.
  • Fix handshake i/o operations not setting deadlines.

  • Remove PDFs from repo.

  • Wrap errors.

  • Refine and improve comments.

Benchmarks (master vs. this branch)

⟩ git checkout 606df358faeaca7509f6556ae5917d8b8ba1f4d1
Note: checking out '606df358faeaca7509f6556ae5917d8b8ba1f4d1'.
HEAD is now at 606df35 bench: reset timer + report allocs.
⟩ go test -count=1 -run=NONE -benchtime=5s -bench  . | tee old.txt
goos: darwin
goarch: amd64
pkg: github.com/libp2p/go-libp2p-noise
BenchmarkTransfer1MB-8          2088       2620798 ns/op     400559109 bytes/sec     4762127 B/op        281 allocs/op
BenchmarkTransfer100MB-8          25     228606052 ns/op     458687650 bytes/sec    472017481 B/op     25746 allocs/op
BenchmarkTransfer500Mb-8           5    1137357804 ns/op     460971180 bytes/sec    2359911987 B/op   128630 allocs/op
BenchmarkHandshakeXX-8          6932        853004 ns/op       23063 B/op        304 allocs/op
PASS
ok      github.com/libp2p/go-libp2p-noise   31.346s

⟩ git checkout raul-review
Previous HEAD position was 606df35 bench: reset timer + report allocs.
Switched to branch 'raul-review'
⟩ go test -count=1 -run=NONE -benchtime=5s -bench  . | tee new.txt
goos: darwin
goarch: amd64
pkg: github.com/libp2p/go-libp2p-noise
BenchmarkTransfer1MB-8          2918       2055000 ns/op     510708724 bytes/sec       36068 B/op         83 allocs/op
BenchmarkTransfer100MB-8          30     199768509 ns/op     524905157 bytes/sec      137920 B/op       6421 allocs/op
BenchmarkTransfer500Mb-8           6     998917048 ns/op     524858261 bytes/sec      567322 B/op      32028 allocs/op
BenchmarkHandshakeXX-8          6806        852974 ns/op       23334 B/op        302 allocs/op
PASS
ok      github.com/libp2p/go-libp2p-noise   37.214s

⟩ benchcmp old.txt new.txt
benchmark                    old ns/op      new ns/op     delta
BenchmarkTransfer1MB-8       2620798        2055000       -21.59%
BenchmarkTransfer100MB-8     228606052      199768509     -12.61%
BenchmarkTransfer500Mb-8     1137357804     998917048     -12.17%
BenchmarkHandshakeXX-8       853004         852974        -0.00%

benchmark                    old allocs     new allocs     delta
BenchmarkTransfer1MB-8       281            83             -70.46%
BenchmarkTransfer100MB-8     25746          6421           -75.06%
BenchmarkTransfer500Mb-8     128630         32028          -75.10%
BenchmarkHandshakeXX-8       304            302            -0.66%

benchmark                    old bytes      new bytes     delta
BenchmarkTransfer1MB-8       4762127        36068         -99.24%
BenchmarkTransfer100MB-8     472017481      137920        -99.97%
BenchmarkTransfer500Mb-8     2359911987     567322        -99.98%
BenchmarkHandshakeXX-8       23063          23334         +1.18%

Fixes #57.
Fixes #75.
Fixes #76.

 - Read() used to behave like io.ReadFull(); it now behaves like
   io.Reader mandates.
 - preallocate 2-byte (long) slices for message length calculation.
 - on read, copy directly to supplied buffer, if the message is
   smaller or equal to len(buffer).
 - use buffer pools (via go-buffer-pool) to contain allocs and GC.
@raulk raulk changed the title [Holding pen] go-libp2p-noise review Fix Read() behaviour; heavy performance optimisations; i/o deadlines; more Apr 23, 2020
@raulk raulk marked this pull request as ready for review April 23, 2020 17:16
Contributor

@yusefnapora yusefnapora left a comment

This is such an improvement, thanks @raulk. I especially ❤️ the updated comments.

@raulk
Member Author

raulk commented Apr 23, 2020

@yusefnapora 🙏

Let's give @Stebalien the chance to review, and @aarshkshah1992 too if he feels like it!

Member

@Stebalien Stebalien left a comment

Nice!

Main points:

  • Avoid writing the length in a separate write. This has caused us a lot of trouble in the past.
  • If possible, decrypt in-place. Many libraries support this.

handshake.go Outdated
// set a deadline to complete the handshake, if one has been supplied.
// clear it after we're done.
if deadline, ok := ctx.Deadline(); ok {
_ = s.SetDeadline(deadline)
Member

Check the error. If deadlines aren't supported, we shouldn't try to revert the deadline.

Member

Also note: It may be better to just spin off a goroutine that closes the connection when the context closes (setting a deadline on the context itself). Otherwise, we're not going to obey the context.

Member Author

Also note: It may be better to just spin off a goroutine that closes the connection when the context closes (setting a deadline on the context itself). Otherwise, we're not going to obey the context.

Could you elaborate? Do you mean that if the context fires while we're not waiting on i/o, we wouldn't notice, and strictly speaking wouldn't yield until the next i/o operation? I think that's a trade-off I want to take, vs. introducing the extra complexity.

Member Author

Fixed in 711cc40; I added a TODO for potentially spinning off a goroutine if we can't set a native deadline.

Member

The deadline will help if the context has a deadline. However, the user may just want to cancel the context. In that case, we can't set a deadline and the only solution is to wait on the context:

var (
    conn         *secureSession
    handshakeErr error
)
doneCh := make(chan struct{})

go func() {
    defer close(doneCh)

    // do handshake, assigning conn and handshakeErr
}()

select {
case <-ctx.Done():
    insecureConn.Close()
    <-doneCh
    return nil, ctx.Err()
case <-doneCh:
    return conn, handshakeErr
}

Member

But we can do this in a new patch.

@aarshkshah1992
Contributor

aarshkshah1992 commented Apr 24, 2020

@raulk I do feel like it but need the time to grok and digest all the magic happening here. Please feel free to merge without waiting on my review if we are in a hurry to land this. In that case, I'll do a review after the merge and will create PRs myself to fix stuff that needs to be fixed.

@raulk raulk requested a review from Stebalien April 24, 2020 11:42
@raulk
Member Author

raulk commented Apr 24, 2020

@Stebalien all done here. Final benchmarks vs. benchmarks before your review:

⟩ go test -count=1 -run=NONE -benchtime=5s -bench  . | tee new2.txt
goos: darwin
goarch: amd64
pkg: github.com/libp2p/go-libp2p-noise
BenchmarkTransfer1MB-8     	    2832	   2105028 ns/op	 498591263 bytes/sec	   35826 B/op	      83 allocs/op
BenchmarkTransfer100MB-8   	      27	 205335938 ns/op	 510672492 bytes/sec	  140965 B/op	    6421 allocs/op
BenchmarkTransfer500Mb-8   	       5	1033737823 ns/op	 507178476 bytes/sec	  550246 B/op	   32023 allocs/op
BenchmarkHandshakeXX-8     	    7046	    855459 ns/op	   23289 B/op	     300 allocs/op
PASS
ok  	github.com/libp2p/go-libp2p-noise	31.242s

⟩ benchcmp new.txt new2.txt
benchmark                    old ns/op     new ns/op      delta
BenchmarkTransfer1MB-8       2055000       2105028        +2.43%
BenchmarkTransfer100MB-8     199768509     205335938      +2.79%
BenchmarkTransfer500Mb-8     998917048     1033737823     +3.49%
BenchmarkHandshakeXX-8       852974        855459         +0.29%

benchmark                    old allocs     new allocs     delta
BenchmarkTransfer1MB-8       83             83             +0.00%
BenchmarkTransfer100MB-8     6421           6421           +0.00%
BenchmarkTransfer500Mb-8     32028          32023          -0.02%
BenchmarkHandshakeXX-8       302            300            -0.66%

benchmark                    old bytes     new bytes     delta
BenchmarkTransfer1MB-8       36068         35826         -0.67%
BenchmarkTransfer100MB-8     137920        140965        +2.21%
BenchmarkTransfer500Mb-8     567322        550246        -3.01%
BenchmarkHandshakeXX-8       23334         23289         -0.19%

Not a lot has changed (maybe the compiler was already optimising for us?). Throughput is down a little (possibly an artefact). Allocated bytes are up and down a little -- not a lot of difference. Effects may be more noticeable in real-world scenarios? Or have I done something wrong?

@Stebalien
Member

Note: if the receive buffer is large enough, we can avoid allocating entirely on read by:

  1. Reading into the buffer passed to Read.
  2. Decrypting in-place.

But it may be best to punt that to a new PR.

@raulk
Member Author

raulk commented Apr 24, 2020

Merging this. There are definitely a few more optimisations that we can pursue, but they'll require further refactoring, which I don't have time to do now.

Final benchmarks -- looks like my previous run had some artefacts, this is much better -- compared to master:

⟩ go test -count=1 -run=NONE -benchtime=5s -bench  . | tee new3.txt
goos: darwin
goarch: amd64
pkg: github.com/libp2p/go-libp2p-noise
BenchmarkTransfer1MB-8     	    2952	   2015012 ns/op	 520935205 bytes/sec	   35994 B/op	      83 allocs/op
BenchmarkTransfer100MB-8   	      30	 196408553 ns/op	 533885318 bytes/sec	  144692 B/op	    6421 allocs/op
BenchmarkTransfer500Mb-8   	       6	 986187290 ns/op	 531633178 bytes/sec	  563954 B/op	   32027 allocs/op
BenchmarkHandshakeXX-8     	    6944	    864960 ns/op	   23238 B/op	     298 allocs/op
PASS
ok  	github.com/libp2p/go-libp2p-noise	28.178s
⟩ benchcmp old.txt new3.txt
benchmark                    old ns/op      new ns/op     delta
BenchmarkTransfer1MB-8       2620798        2015012       -23.11%
BenchmarkTransfer100MB-8     228606052      196408553     -14.08%
BenchmarkTransfer500Mb-8     1137357804     986187290     -13.29%
BenchmarkHandshakeXX-8       853004         864960        +1.40%

benchmark                    old allocs     new allocs     delta
BenchmarkTransfer1MB-8       281            83             -70.46%
BenchmarkTransfer100MB-8     25746          6421           -75.06%
BenchmarkTransfer500Mb-8     128630         32027          -75.10%
BenchmarkHandshakeXX-8       304            298            -1.97%

benchmark                    old bytes      new bytes     delta
BenchmarkTransfer1MB-8       4762127        35994         -99.24%
BenchmarkTransfer100MB-8     472017481      144692        -99.97%
BenchmarkTransfer500Mb-8     2359911987     563954        -99.98%
BenchmarkHandshakeXX-8       23063          23238         +0.76%

@raulk raulk merged commit 69090b2 into master Apr 24, 2020
@raulk raulk deleted the raul-review branch April 24, 2020 19:54
@raulk
Member Author

raulk commented Apr 24, 2020

@Stebalien follow-ups captured here: #77.

s.writeLock.Lock()
defer s.writeLock.Unlock()

writeChunk := func(in []byte) (int, error) {
Contributor

@aarshkshah1992 aarshkshah1992 Apr 27, 2020

@raulk Why do we need to take this write lock, given that there's no shared secureSession state across these Write calls and Go's net.Conn allows concurrent writes?

Member Author

Because we encrypt then write. If we don't take a lock, two threads A, B could encrypt in order A, B, then end up writing B, A on the wire, which would make the stream ciphers fail.

Member Author

Also, Write writes the entire incoming data, which could take several rounds if the data exceeds the maximum payload size. If two threads are writing at the same time, their chunk writes could intertwine.

size := int(binary.BigEndian.Uint16(buf))
buf = make([]byte, size)
size := int(binary.BigEndian.Uint16(s.rlen[:]))
buf := pool.Get(size)
Contributor

@raulk For handshake messages, we never put this back in the pool. Won't that cause a leak?

Member Author

This should return to the pool, but sync.Pool doesn't retain references to the elements it hands out (unlike pools in other languages), so this won't cause a leak as it will be GC'ed. But it's less than ideal, yeah.

Contributor

@aarshkshah1992 aarshkshah1992 Apr 27, 2020

@raulk

Hmmm... given that we can heavily reuse these pooled buffers for handshake messages (because messages in the same handshake stage are always of the same length), not returning them to the pool causes unnecessary GC/allocs. Will fix this.

Member Author

Yes, please fix. It doesn't cause a leak, but it's suboptimal.
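
A minimal sketch of the fix being discussed, using the stdlib's sync.Pool in place of go-buffer-pool; readHandshakeMessage and all names here are hypothetical:

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool sketches the intended fix, with sync.Pool standing in for
// go-buffer-pool: pair every Get with a Put so that same-sized
// handshake buffers are recycled instead of re-allocated each time.
var bufPool = sync.Pool{
	New: func() any { return make([]byte, 0, 1024) },
}

// readHandshakeMessage is hypothetical; it models reading a message of
// a known size into a pooled buffer and returning it once done.
func readHandshakeMessage(size int) []byte {
	buf := bufPool.Get().([]byte)[:size]
	defer bufPool.Put(buf[:0]) // return the buffer to the pool when done
	// ... read and decrypt the handshake message into buf ...
	return append([]byte(nil), buf...) // copy out what the caller keeps
}

func main() {
	msg := readHandshakeMessage(64)
	fmt.Println(len(msg)) // 64
}
```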

@libp2p libp2p deleted a comment from raulk Apr 27, 2020