Packet encoding optimization #343
Conversation
I love the idea. How do we feel about the risk of active items in the pool being dropped under high-load situations? Is this risk present? Per https://pkg.go.dev/sync#Pool, presumably as long as a reference is held, it won't be deallocated?
No active items will be in the pool. In this implementation, a buffer is first taken out of the pool (at that point it has no association with the pool and is no different from a newly allocated item); if none is available, a new buffer is allocated. It is only put back into the pool once it is no longer needed (and only if it is no longer in use). Any items sitting in the pool can be GCed at any time, which has no negative impact.
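For illustration only, a minimal self-contained sketch of that lifecycle; getBuf and putBuf are hypothetical stand-ins for this PR's pool helpers:

package main

import (
    "bytes"
    "fmt"
    "sync"
)

var pool = sync.Pool{New: func() interface{} { return new(bytes.Buffer) }}

// getBuf hands the caller a buffer that is no longer associated with the pool;
// while the caller holds this reference, the buffer cannot be collected.
func getBuf() *bytes.Buffer { return pool.Get().(*bytes.Buffer) }

// putBuf resets the buffer and returns it; from this point on it may be reused
// by another caller or dropped by the GC, with no correctness impact either way.
func putBuf(b *bytes.Buffer) {
    b.Reset()
    pool.Put(b)
}

func main() {
    buf := getBuf()
    buf.WriteString("encoded packet bytes") // use it freely while we hold it
    fmt.Println(buf.Len())
    putBuf(buf) // returned only once we are done with it
}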
I tried to perform some simple stress testing on a standalone machine, but perhaps my stress test had too few clients and messages. For example, I used 30 clients, with each client sending 2,000 messages, and based on the memory and CPU usage on my end I didn't see a significant difference. @thedevop @mochi-co @dgduncan @x20080406 If any of you have the capability to test performance differences with a large number of clients and messages, please give it a try. Alternatively, we could write some packet encoding benchmarks directly and compare the results. Additionally, @thedevop, if it's confirmed that a buffer pool is needed, I think we can make it simpler: there's no need for BufferWithCap. For example, something like this:
package nsqd

import (
    "bytes"
    "sync"
)

var bp sync.Pool

func init() {
    bp.New = func() interface{} {
        return &bytes.Buffer{}
    }
}

func bufferPoolGet() *bytes.Buffer {
    return bp.Get().(*bytes.Buffer)
}

func bufferPoolPut(b *bytes.Buffer) {
    b.Reset()
    bp.Put(b)
}
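A hypothetical call site for the helpers above, just to show the intended pattern; note that the encoded bytes must be copied out before the buffer is returned, since it may be reused immediately:

func encodeWithPool(payload []byte) []byte {
    buf := bufferPoolGet()
    defer bufferPoolPut(buf) // reset and return to the pool once we are done
    buf.Write(payload)       // build the packet into the pooled buffer
    out := make([]byte, buf.Len())
    copy(out, buf.Bytes()) // copy before Put: the underlying array may be reused
    return out
}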
@werbenhu, I have checked with a benchmark; the results are:

Actual performance should be even better, as the bytes.Buffer may grow a few times while encoding the packet headers. I used to use the simplified method you provided, which works perfectly fine if all the items are similar in size, but it doesn't work as well when items vary in size; see https://golang.org/issue/23199. Although most packets without a payload should be similar in size, with the inclusion of metadata (user properties) they can become quite large.

For this version I intentionally avoided pooling the final buffer that includes the payload; we should use a dedicated buffer pool for that, and perhaps make it user-configurable based on the use case. So the mempool is there to future-proof for that use case.

In our perf test environment, where we're publishing ~130K messages/s (bottlenecked by CPU), preliminary results show ~3% improvement.
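For reference, the size-cap idea behind BufferWithCap could look roughly like the sketch below, where oversized buffers are dropped instead of pooled so that one large packet cannot pin a large allocation indefinitely. This is an illustrative sketch, not this PR's actual implementation, and the 64 KiB cap is an arbitrary placeholder:

package mempool

import (
    "bytes"
    "sync"
)

// maxPooledCap is an arbitrary illustrative limit on the buffers we keep around.
const maxPooledCap = 64 * 1024

var pool = sync.Pool{New: func() interface{} { return new(bytes.Buffer) }}

func GetBuffer() *bytes.Buffer { return pool.Get().(*bytes.Buffer) }

func PutBuffer(b *bytes.Buffer) {
    if b.Cap() > maxPooledCap {
        return // let the GC reclaim unusually large buffers (see golang.org/issue/23199)
    }
    b.Reset()
    pool.Put(b)
}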
I benchmarked pk.Properties.Encode using the following:

func BenchmarkEncodeNoPool(b *testing.B) {
    props := propertiesStruct
    for i := 0; i < b.N; i++ {
        buf := bytes.NewBuffer([]byte{})
        props.Encode(Reserved, Mods{AllowResponseInfo: true}, buf, 0)
        _ = buf
    }
}

func BenchmarkEncodePoolBuf(b *testing.B) {
    props := propertiesStruct
    for i := 0; i < b.N; i++ {
        buf := mempool.GetBuffer()
        props.Encode(Reserved, Mods{AllowResponseInfo: true}, buf, 0)
        mempool.PutBuffer(buf)
    }
}

The results are:
If I also replace the buf within Encode with a pooled buffer, the results are:
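Illustrative only: "replacing the buf within Encode" means the encoder's internal scratch buffer also comes from the pool rather than being allocated per call, along the lines of the hypothetical sketch below (encodeWithScratch and encodeLength are made-up names, not the project's actual code):

// Hypothetical sketch of pooling the encoder's internal scratch buffer.
func (p *Properties) encodeWithScratch(out *bytes.Buffer) {
    scratch := mempool.GetBuffer()   // scratch buffer for building the properties section
    defer mempool.PutBuffer(scratch) // returned once the section has been copied out
    // ... write the individual properties into scratch ...
    encodeLength(out, int64(scratch.Len())) // hypothetical variable-length prefix helper
    out.Write(scratch.Bytes())              // copy the section into the final buffer before Put
}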
@thedevop perhaps we can learn from "encoding/json: fix usage of sync.Pool", where we can see that the standard library's json.Marshal faces a similar issue to the one we are currently encountering. However, at least for now, it appears that json.Marshal still uses sync.Pool directly, without a cap. Personally, I suggest we temporarily postpone the use of BufferWithCap. @mochi-co, what are your thoughts on this?
@werbenhu, I'm OK with using the pool without a cap, as most packet encoding without the payload should not vary too much in size. I have updated the default pool to not have a cap.
Sorry, my life has been crazy this year; I'm just catching up on all the good work you've done. The pool benchmarks are promising, and at scale this will make a significant difference. @thedevop, your point about differing payload sizes is pertinent and well considered. I agree with @werbenhu that we should first attempt to use the pool without a cap to reduce the complexity we're adding here; however, we should monitor the change and add the cap if it does become necessary. As for me, I think this is a great contribution, and it's definitely getting merged.
As described in #342.