This repository has been archived by the owner on May 26, 2022. It is now read-only.

Fix Read() behaviour; heavy performance optimisations; i/o deadlines; more #58

Merged

merged 16 commits into master from raul-review on Apr 24, 2020

Conversation

raulk
Member

@raulk raulk commented Mar 4, 2020

Improvements

  • make Read() conform to io.Reader behaviour: Read() used to behave like io.ReadFull(); it now behaves as io.Reader mandates.

  • Heavy performance optimisation (see benchmarks and benchcmp below):

    • Preallocate 2-byte-long slices for the message length calculation.
    • On read, copy directly into the supplied buffer when the message fits in len(buffer) (zero-alloc path).
    • Use buffer pools (via go-buffer-pool) to contain allocs and GC.
  • Fix handshake i/o operations not setting deadlines.

  • Remove PDFs from repo.

  • Wrap errors.

  • Refine and improve comments.

Benchmarks (master vs. this branch)

⟩ git checkout 606df358faeaca7509f6556ae5917d8b8ba1f4d1
Note: checking out '606df358faeaca7509f6556ae5917d8b8ba1f4d1'.
HEAD is now at 606df35 bench: reset timer + report allocs.
⟩ go test -count=1 -run=NONE -benchtime=5s -bench  . | tee old.txt
goos: darwin
goarch: amd64
pkg: github.com/libp2p/go-libp2p-noise
BenchmarkTransfer1MB-8          2088       2620798 ns/op     400559109 bytes/sec     4762127 B/op        281 allocs/op
BenchmarkTransfer100MB-8          25     228606052 ns/op     458687650 bytes/sec    472017481 B/op     25746 allocs/op
BenchmarkTransfer500Mb-8           5    1137357804 ns/op     460971180 bytes/sec    2359911987 B/op   128630 allocs/op
BenchmarkHandshakeXX-8          6932        853004 ns/op       23063 B/op        304 allocs/op
PASS
ok      github.com/libp2p/go-libp2p-noise   31.346s

⟩ git checkout raul-review
Previous HEAD position was 606df35 bench: reset timer + report allocs.
Switched to branch 'raul-review'
⟩ go test -count=1 -run=NONE -benchtime=5s -bench  . | tee new.txt
goos: darwin
goarch: amd64
pkg: github.com/libp2p/go-libp2p-noise
BenchmarkTransfer1MB-8          2918       2055000 ns/op     510708724 bytes/sec       36068 B/op         83 allocs/op
BenchmarkTransfer100MB-8          30     199768509 ns/op     524905157 bytes/sec      137920 B/op       6421 allocs/op
BenchmarkTransfer500Mb-8           6     998917048 ns/op     524858261 bytes/sec      567322 B/op      32028 allocs/op
BenchmarkHandshakeXX-8          6806        852974 ns/op       23334 B/op        302 allocs/op
PASS
ok      github.com/libp2p/go-libp2p-noise   37.214s

⟩ benchcmp old.txt new.txt
benchmark                    old ns/op      new ns/op     delta
BenchmarkTransfer1MB-8       2620798        2055000       -21.59%
BenchmarkTransfer100MB-8     228606052      199768509     -12.61%
BenchmarkTransfer500Mb-8     1137357804     998917048     -12.17%
BenchmarkHandshakeXX-8       853004         852974        -0.00%

benchmark                    old allocs     new allocs     delta
BenchmarkTransfer1MB-8       281            83             -70.46%
BenchmarkTransfer100MB-8     25746          6421           -75.06%
BenchmarkTransfer500Mb-8     128630         32028          -75.10%
BenchmarkHandshakeXX-8       304            302            -0.66%

benchmark                    old bytes      new bytes     delta
BenchmarkTransfer1MB-8       4762127        36068         -99.24%
BenchmarkTransfer100MB-8     472017481      137920        -99.97%
BenchmarkTransfer500Mb-8     2359911987     567322        -99.98%
BenchmarkHandshakeXX-8       23063          23334         +1.18%

Fixes #57.
Fixes #75.
Fixes #76.

 - Read() used to behave like io.ReadFull(); it now behaves like
   io.Reader mandates.
 - preallocate 2-byte (long) slices for message length calculation.
 - on read, copy directly to supplied buffer, if the message is
   smaller or equal to len(buffer).
 - use buffer pools (via go-buffer-pool) to contain allocs and GC.
@raulk raulk changed the title [Holding pen] go-libp2p-noise review Fix Read() behaviour; heavy performance optimisations; i/o deadlines; more Apr 23, 2020
@raulk raulk marked this pull request as ready for review April 23, 2020 17:16
Contributor

@yusefnapora yusefnapora left a comment

This is such an improvement, thanks @raulk. I especially ❤️ the updated comments.

@raulk
Member Author

raulk commented Apr 23, 2020

@yusefnapora 🙏

Let's give @Stebalien the chance to review, and @aarshkshah1992 too if he feels like it!

Member

@Stebalien Stebalien left a comment

Nice!

Main points:

  • Avoid writing the length in a separate write. This has caused us a lot of trouble in the past.
  • If possible, decrypt in-place. Many libraries support this.

handshake.go Outdated
// set a deadline to complete the handshake, if one has been supplied.
// clear it after we're done.
if deadline, ok := ctx.Deadline(); ok {
_ = s.SetDeadline(deadline)
Member

Check the error. If deadlines aren't supported, we shouldn't try to revert the deadline.

Member

Also note: It may be better to just spin off a goroutine that closes the connection when the context closes (setting a deadline on the context itself). Otherwise, we're not going to obey the context.

Member Author

Also note: It may be better to just spin off a goroutine that closes the connection when the context closes (setting a deadline on the context itself). Otherwise, we're not going to obey the context.

Could you elaborate? Do you mean that if the context fires while we're not waiting on i/o, we wouldn't notice, and strictly speaking wouldn't yield until the next i/o operation? I think that's a trade-off I want to take, vs. introducing the extra complexity.

Member Author

Fixed in 711cc40; I added a TODO for potentially spinning off a goroutine if we can't set a native deadline.

Member

The deadline will help if the context has a deadline. However, the user may just want to cancel the context. In that case, we can't set a deadline and the only solution is to wait on the context:

var (
    conn         *secureSession
    handshakeErr error
)
doneCh := make(chan struct{})

go func() {
    defer close(doneCh)

    // do handshake, assigning conn and handshakeErr
}()

select {
case <-ctx.Done():
    insecureConn.Close()
    <-doneCh
    return nil, ctx.Err()
case <-doneCh:
    return conn, handshakeErr
}

Member

But we can do this in a new patch.

@aarshkshah1992
Contributor

aarshkshah1992 commented Apr 24, 2020

@raulk I do feel like it but need the time to grok and digest all the magic happening here. Please feel free to merge without waiting on my review if we are in a hurry to land this. In that case, I'll do a review after the merge and will create PRs myself to fix stuff that needs to be fixed.

@raulk raulk requested a review from Stebalien April 24, 2020 11:42
@raulk
Member Author

raulk commented Apr 24, 2020

@Stebalien all done here. Final benchmarks vs. benchmarks before your review:

⟩ go test -count=1 -run=NONE -benchtime=5s -bench  . | tee new2.txt
goos: darwin
goarch: amd64
pkg: github.com/libp2p/go-libp2p-noise
BenchmarkTransfer1MB-8     	    2832	   2105028 ns/op	 498591263 bytes/sec	   35826 B/op	      83 allocs/op
BenchmarkTransfer100MB-8   	      27	 205335938 ns/op	 510672492 bytes/sec	  140965 B/op	    6421 allocs/op
BenchmarkTransfer500Mb-8   	       5	1033737823 ns/op	 507178476 bytes/sec	  550246 B/op	   32023 allocs/op
BenchmarkHandshakeXX-8     	    7046	    855459 ns/op	   23289 B/op	     300 allocs/op
PASS
ok  	github.com/libp2p/go-libp2p-noise	31.242s

⟩ benchcmp new.txt new2.txt
benchmark                    old ns/op     new ns/op      delta
BenchmarkTransfer1MB-8       2055000       2105028        +2.43%
BenchmarkTransfer100MB-8     199768509     205335938      +2.79%
BenchmarkTransfer500Mb-8     998917048     1033737823     +3.49%
BenchmarkHandshakeXX-8       852974        855459         +0.29%

benchmark                    old allocs     new allocs     delta
BenchmarkTransfer1MB-8       83             83             +0.00%
BenchmarkTransfer100MB-8     6421           6421           +0.00%
BenchmarkTransfer500Mb-8     32028          32023          -0.02%
BenchmarkHandshakeXX-8       302            300            -0.66%

benchmark                    old bytes     new bytes     delta
BenchmarkTransfer1MB-8       36068         35826         -0.67%
BenchmarkTransfer100MB-8     137920        140965        +2.21%
BenchmarkTransfer500Mb-8     567322        550246        -3.01%
BenchmarkHandshakeXX-8       23334         23289         -0.19%

Not a lot has changed (maybe the compiler was already optimising for us?). Throughput is down a little (possibly an artefact). Allocated bytes are up and down a little -- not a lot of difference. Effects may be more noticeable in real-world scenarios? Or have I done something wrong?

@Stebalien
Member

Note: if the receive buffer is large enough, we can avoid allocating entirely on read by:

  1. Reading into the buffer passed to Read.
  2. Decrypting in-place.

But it may be best to punt that to a new PR.

@raulk
Member Author

raulk commented Apr 24, 2020

Merging this. There are definitely a few more optimisations that we can pursue, but they'll require further refactoring, which I don't have time to do now.

Final benchmarks -- looks like my previous run had some artefacts, this is much better -- compared to master:

⟩ go test -count=1 -run=NONE -benchtime=5s -bench  . | tee new3.txt
goos: darwin
goarch: amd64
pkg: github.com/libp2p/go-libp2p-noise
BenchmarkTransfer1MB-8     	    2952	   2015012 ns/op	 520935205 bytes/sec	   35994 B/op	      83 allocs/op
BenchmarkTransfer100MB-8   	      30	 196408553 ns/op	 533885318 bytes/sec	  144692 B/op	    6421 allocs/op
BenchmarkTransfer500Mb-8   	       6	 986187290 ns/op	 531633178 bytes/sec	  563954 B/op	   32027 allocs/op
BenchmarkHandshakeXX-8     	    6944	    864960 ns/op	   23238 B/op	     298 allocs/op
PASS
ok  	github.com/libp2p/go-libp2p-noise	28.178s
⟩ benchcmp old.txt new3.txt
benchmark                    old ns/op      new ns/op     delta
BenchmarkTransfer1MB-8       2620798        2015012       -23.11%
BenchmarkTransfer100MB-8     228606052      196408553     -14.08%
BenchmarkTransfer500Mb-8     1137357804     986187290     -13.29%
BenchmarkHandshakeXX-8       853004         864960        +1.40%

benchmark                    old allocs     new allocs     delta
BenchmarkTransfer1MB-8       281            83             -70.46%
BenchmarkTransfer100MB-8     25746          6421           -75.06%
BenchmarkTransfer500Mb-8     128630         32027          -75.10%
BenchmarkHandshakeXX-8       304            298            -1.97%

benchmark                    old bytes      new bytes     delta
BenchmarkTransfer1MB-8       4762127        35994         -99.24%
BenchmarkTransfer100MB-8     472017481      144692        -99.97%
BenchmarkTransfer500Mb-8     2359911987     563954        -99.98%
BenchmarkHandshakeXX-8       23063          23238         +0.76%

@raulk raulk merged commit 69090b2 into master Apr 24, 2020
@raulk raulk deleted the raul-review branch April 24, 2020 19:54
@raulk
Member Author

raulk commented Apr 24, 2020

@Stebalien follow-ups captured here: #77.

s.writeLock.Lock()
defer s.writeLock.Unlock()

writeChunk := func(in []byte) (int, error) {
Contributor

@aarshkshah1992 aarshkshah1992 Apr 27, 2020

@raulk Why do we need to take this write lock, given that there's no shared secureSession state across these Write calls and Go's net.Conn allows concurrent writes?

Member Author

Because we encrypt then write. If we don't take a lock, two threads A, B could encrypt in order A, B, then end up writing B, A on the wire, which would make the stream ciphers fail.

Member Author

Also, Write writes the entire incoming data, which could take several rounds if the data exceeds the maximum payload size. If two threads are writing at the same time, their chunk writes could intertwine.

size := int(binary.BigEndian.Uint16(buf))
buf = make([]byte, size)
size := int(binary.BigEndian.Uint16(s.rlen[:]))
buf := pool.Get(size)
Contributor

@raulk For handshake messages, we never put this back in the pool. Won't that cause a leak?

Member Author

This should return to the pool, but sync.Pool doesn't retain references to the elements it hands out (unlike pools in other languages), so this won't cause a leak as it will be GC'ed. But it's less than ideal, yeah.

Contributor

@aarshkshah1992 aarshkshah1992 Apr 27, 2020

@raulk

Hmmm... given that we can heavily reuse these pooled buffers for handshake messages (because messages in the same handshake stage are always of the same length), not returning them to the pool causes unnecessary GC/allocs. Will fix this.

Member Author

Yes, please fix. It doesn't cause a leak, but it's suboptimal.
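
A minimal sketch of the fix being discussed, using the stdlib's sync.Pool in place of go-buffer-pool; readHandshakeMessage and all names here are hypothetical:

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool sketches the intended fix, with sync.Pool standing in for
// go-buffer-pool: pair every Get with a Put so that same-sized
// handshake buffers are recycled instead of re-allocated each time.
var bufPool = sync.Pool{
	New: func() any { return make([]byte, 0, 1024) },
}

// readHandshakeMessage is hypothetical; it models reading a message of
// a known size into a pooled buffer and returning it once done.
func readHandshakeMessage(size int) []byte {
	buf := bufPool.Get().([]byte)[:size]
	defer bufPool.Put(buf[:0]) // return the buffer to the pool when done
	// ... read and decrypt the handshake message into buf ...
	return append([]byte(nil), buf...) // copy out what the caller keeps
}

func main() {
	msg := readHandshakeMessage(64)
	fmt.Println(len(msg)) // 64
}
```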

@libp2p libp2p deleted a comment from raulk Apr 27, 2020