Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Packet encoding optimization #343

Merged
merged 13 commits into from
Dec 21, 2023
Merged

Packet encoding optimization #343

merged 13 commits into from
Dec 21, 2023

Conversation

thedevop
Copy link
Collaborator

As described in #342.

@coveralls
Copy link

coveralls commented Dec 13, 2023

Pull Request Test Coverage Report for Build 7294150121

  • 77 of 80 (96.25%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.04%) to 98.785%

Changes Missing Coverage Covered Lines Changed/Added Lines %
mempool/bufpool.go 29 32 90.63%
Totals Coverage Status
Change from base Build 7294141625: -0.04%
Covered Lines: 5527
Relevant Lines: 5595

💛 - Coveralls

@mochi-co
Copy link
Collaborator

mochi-co commented Dec 13, 2023

I love the idea -how do we feel about the risk of active items in the pool being dropped under high load situations? Is this risk present? Per https://pkg.go.dev/sync#Pool

Any item stored in the Pool may be removed automatically at any time without notification. If the Pool holds the only reference when this happens, the item might be deallocated.

Presumably as long as a reference is held, it wont be deallocated?

@thedevop
Copy link
Collaborator Author

thedevop commented Dec 13, 2023

No active items will be in the pool. In this implementation, a buffer is 1st taken out of the pool (in this state, it has no association to the pool and it's no different than a newly allocated item), if none is available, a new buffer is generated. It will only be put back into the pool when it's no longer needed (and only if it's no longer in use).

Any items in the pool can be GCed any time, which will have no negative impact.

@werbenhu
Copy link
Member

werbenhu commented Dec 14, 2023

I tried to perform some simple stress testing on a standalone machine, but perhaps my stress test had too few clients and messages. For example, I used 30 clients, with each client sending 2000 messages. Based on the memory and CPU usage on my end, I didn't see a significant difference.

@thedevop @mochi-co @dgduncan @x20080406 If any of you have the capability to test performance differences with a large number of clients and messages, please give it a try. Alternatively, we can directly write some packet encoding benchmarks to see the results.

Additionally, @thedevop if it's confirmed that a buffer pool is needed, I think we can make it simpler. There's no need for BufferWithCap, for example, just like this:

package nsqd

import (
	"bytes"
	"sync"
)

var bp sync.Pool

func init() {
	bp.New = func() interface{} {
		return &bytes.Buffer{}
	}
}

func bufferPoolGet() *bytes.Buffer {
	return bp.Get().(*bytes.Buffer)
}

func bufferPoolPut(b *bytes.Buffer) {
	b.Reset()
	bp.Put(b)
}

@thedevop
Copy link
Collaborator Author

thedevop commented Dec 14, 2023

@werbenhu, I have checked in benchmark, the results are:

cpu: Intel(R) Core(TM) i7-7920HQ CPU @ 3.10GHz
BenchmarkBufferPool
BenchmarkBufferPool-8                	52733648	        20.26 ns/op	       0 B/op	       0 allocs/op
BenchmarkBufferPoolWithCapLarger
BenchmarkBufferPoolWithCapLarger-8   	56555937	        21.71 ns/op	       0 B/op	       0 allocs/op
BenchmarkBufferPoolWithCapLesser
BenchmarkBufferPoolWithCapLesser-8   	 6784573	       182.1 ns/op	     112 B/op	       2 allocs/op
BenchmarkBufferWithoutPool
BenchmarkBufferWithoutPool-8         	26074543	        49.38 ns/op	      64 B/op	       1 allocs/op

Actual performance should be even better, as the bytes.Buffer may grow few times to encode the packet headers.

I used to use the simplified method as you have provided, which works perfectly fine if all the items are similar in size. It doesn't work as well if items are of different size. See https://golang.org/issue/23199. Although most of the packets without payload should be similar in size, but with the inclusion of metadata (user properties), it may become quite large.

I intentionally for this version avoid the final buffer that includes the payload, we should use a dedicated buffer pool for that, and perhaps let it be user configurable based on their use case. So the mempool is to future proof for that use case.

In our perf test environment, where we're publishing ~130K/s (bottlenecked by CPU), preliminary result shows ~3% improvement.

@thedevop
Copy link
Collaborator Author

I benchmarked pk.Properties.Encode using the following:

func BenchmarkEncodeNoPool(b *testing.B) {
	props := propertiesStruct

	for i := 0; i < b.N; i++ {
		buf := bytes.NewBuffer([]byte{})
		props.Encode(Reserved, Mods{AllowResponseInfo: true}, buf, 0)
		_ = buf
	}
}

func BenchmarkEncodePoolBuf(b *testing.B) {
	props := propertiesStruct

	for i := 0; i < b.N; i++ {
		buf := mempool.GetBuffer()
		props.Encode(Reserved, Mods{AllowResponseInfo: true}, buf, 0)
		mempool.PutBuffer(buf)
	}
}

The results are:

cpu: Intel(R) Core(TM) i7-7920HQ CPU @ 3.10GHz
BenchmarkEncodeNoPool-8    	  664742	      1716 ns/op	     736 B/op	       9 allocs/op
BenchmarkEncodePoolBuf-8   	  748135	      1574 ns/op	     504 B/op	       6 allocs/op

If I also replace the buf within Encode to use buffer pool, the results are:

cpu: Intel(R) Core(TM) i7-7920HQ CPU @ 3.10GHz
BenchmarkEncodeNoPool-8    	  730120	      1552 ns/op	     288 B/op	       6 allocs/op
BenchmarkEncodePoolBuf-8   	  873458	      1352 ns/op	      56 B/op	       3 allocs/op

@werbenhu
Copy link
Member

werbenhu commented Dec 16, 2023

@thedevop perhaps we can learn from encoding/json: fix usage of sync.Pool, where we can see that the standard library's json.Marshal also faces a similar issue to what we are currently encountering. However, at least for now, it appears that the json.Marshal still directly employs the sync.Pool without a cap. Personally, I suggest we can temporarily postpone the use of BufferWithCap. @mochi-co what are your thoughts on this?

@thedevop
Copy link
Collaborator Author

thedevop commented Dec 16, 2023

@werbenhu, I'm ok we use the pool without cap, as most of the packet encoding without payload should not vary too much on size. I updated the default pool to not have a cap.

@mochi-co
Copy link
Collaborator

Sorry my life has been crazy this year. Just catching up on all this good work you've done.

The pool benchmarks are promising, and at scale this will make a significant difference. @thedevop your point about differing payload sizes is poignant and well considered.

I agree with @werbenhu that we should first attempt to use the pool without a cap to reduce the complexity we're adding here. However, we should monitor the change and add the cap if it does become necessary.

As for me, I think this is a great contribution, and it's definitely getting merged.

@mochi-co mochi-co merged commit c6c7c29 into mochi-mqtt:main Dec 21, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants