-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wavefront/serializer: improve performance by ~30% #5842
wavefront/serializer: improve performance by ~30% #5842
Conversation
This PR improves the performance and memory consumption of the Wavefront serializer by roughly 30%. ``` benchmark old ns/op new ns/op delta BenchmarkSerialize-16 1168 778 -33.39% benchmark old allocs new allocs delta BenchmarkSerialize-16 16 8 -50.00% benchmark old bytes new bytes delta BenchmarkSerialize-16 1116 800 -28.32% ```
Thanks for approving! |
@@ -39,6 +41,12 @@ var tagValueReplacer = strings.NewReplacer("\"", "\\\"", "*", "-") | |||
|
|||
var pathReplacer = strings.NewReplacer("_", ".") | |||
|
|||
var bufPool = sync.Pool{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could add a reused buffer to the serializer, it isn't required for a Serializer to be thread safe.
This is a notable difference from the Parser interface which is currently required to be thread-safe, though I do think this should be changed in the future. I will update the documentation for these interfaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think there is value to using a sync.Pool
that is global to the package. The sync.Pool
will pin the current goroutine to a pool, which should improve locality of reference. Additionally, it does not make the WavefrontSerializer
unsafe to use concurrently and is dynamic in size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There isn't any reason for it to be dynamic though because it will never need more than a single buffer. Another issue with the pool is that you can't return the buffer to it unless you copy the data out of the buffer, I believe the current use in this pull request is unsafe even in a single threaded environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another issue with the pool is that you can't return the buffer to it unless you copy the data out of the buffer
We do that. The contents of pooled buffers are copied to the out
buffer (note: out
will never alias buf
, example). This code is thread-safe.
Personally, I would prefer to not add a mutable field to the WavefrontSerializer
, but I understand the argument for simplicity and will make the change.
What are your thoughts on adding a mutex to protect the WavefrontSerializer
's scratch buffer? From you comment and a quick scan of the code it appears there is no need for it to be thread-safe, but if that were to change it would be beneficial to have it as a guard against this code being overlooked during a refactor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed this, but this pool was added by accident the one we actually use is pbFree
. That said, I'm still removing the pools in favor of a shared buffer - as requested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't add anything to make it thread safe. I agree it there is a risk of code sneaking in that use these wrong, especially with the way the SerializerOutput interface works (sending in a single instance), but I'd like to move the other way with parsers/serializers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I made WavefrontSerializer
thread-safe.
metricSeparator := "." | ||
const metricSeparator = "." | ||
var out []byte | ||
buf := pbFree.Get().(*buffer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably makes sense to just create a single bytes.Buffer and then pass this into the various functions, the out
slice would be removed and then there is no reason to append. I don't see a reason we need both of these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, will change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expect something from me tomorrow or Friday as I'm pretty short on bandwidth right now.
@puckpuck Can you review? I assume any changes would also need to go into the wavefront-sdk. |
Note this removes use of `sync.Pool` and since this is no longer thread-safe the parallel benchmark is also removed.
} | ||
|
||
metricValue, buildError := buildValue(value, metric.Metric) | ||
metricValue, buildError := buildValue(value, name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We always ignore this error - is there any point returning it from buildValue()
? This was the case before this PR was made.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to be used to determine if the value should be skipped or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically you log on error from the caller, but we log inside the buildValue function instead to avoid noisy log outputs when the value is of type string (which is expected to not work).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the insight, I just changed this to return a bool
instead of an error
since we used the error
as a bool (nil / not-nil) and this saves an alloc.
*b = append(*b, c) | ||
} | ||
|
||
func (b *buffer) WriteUnit64(val uint64) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
function name has a typo in it, which can be misleading (Unit vs Uint)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦♂ thanks - fixed
return *b | ||
} | ||
|
||
// Use a fast and simple buffer for constructing statsd messages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing this comment is a left over from something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kinda, and good catch - will remove.
Since the buffer we use is now a field of the WavefrontSerializer we need to protect it against concurrent access with a mutex.
The error it previously returned was treated as a bool - so we should just return a bool instead.
This PR improves the performance and memory consumption of the Wavefront serializer by roughly 30%.
Required for all PRs: