Use `NonZeroU64` to optimize `encoded_len_varint` #1192

mzabaluev · 2024-11-23T20:50:39Z

Give the compiler all the leverage to optimize encoded_len_varint:

- Construct a `NonZeroU64` to count leading zeros, as that can be faster on many platforms;
- Use `ilog2` instead of a handwritten expression to compute the base 2 logarithm, as the core library developers and the compiler are probably in the best position to fine-tune it for all supported platforms.

With the varint benchmarks, I see slight improvements (1-3%) or no reproducible performance changes on 11th gen Intel Core and Mac M3.

The leading zeros count may perform better on many architectures when the zero case is excluded. Also use `ilog2` as shorthand for the leading zeros trick, because it states more clearly what we mean to compute and should ideally be optimized by the compiler.
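
For readers without the diff at hand, here is a minimal sketch of the idea, not the merged code. The function names `encoded_len_varint_clz` and `encoded_len_varint_ilog2` are hypothetical and chosen only for this comparison; the sketch also uses the checked `NonZeroU64::new` constructor, which may differ from what the commit does.

```rust
use std::num::NonZeroU64;

/// Leading-zeros formulation: for a non-zero 64-bit value,
/// `leading_zeros() ^ 63` is the index of the highest set bit.
/// (Hypothetical name, for comparison only.)
pub fn encoded_len_varint_clz(value: u64) -> usize {
    // `value | 1` makes zero behave like one; both need one byte.
    ((((value | 1).leading_zeros() ^ 63) * 9 + 73) / 64) as usize
}

/// `NonZeroU64` + `ilog2` formulation of the same computation.
/// (Hypothetical name; assumes the checked constructor.)
pub fn encoded_len_varint_ilog2(value: u64) -> usize {
    // `value | 1` is never zero, so `new` always returns `Some`.
    let nonzero = NonZeroU64::new(value | 1).expect("value | 1 is non-zero");
    // Index of the highest set bit, in 0..=63.
    let log2value = nonzero.ilog2();
    // Each varint byte carries 7 payload bits; this is a division-free
    // way to compute floor(log2value / 7) + 1.
    ((log2value * 9 + (64 + 9)) / 64) as usize
}
```

Both functions should return the same length for every input, for example 1 byte for `0` and `127`, 2 bytes for `128`, and 10 bytes for `u64::MAX`. The type-level guarantee that the operand is non-zero is what lets the compiler skip the zero-input handling that a plain `u64::leading_zeros` needs on some targets.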

caspermeijn

Thank you for your contribution

caspermeijn approved these changes Nov 25, 2024

caspermeijn added this pull request to the merge queue Nov 25, 2024

Merged via the queue into tokio-rs:master with commit 5ae30c5 Nov 25, 2024
16 checks passed
