Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stop allocating on every byte read (for CIDs and varints) #220

Merged
merged 2 commits into from
Sep 7, 2021
Merged

stop allocating on every byte read (for CIDs and varints) #220

merged 2 commits into from
Sep 7, 2021

Conversation

mvdan
Copy link
Contributor

@mvdan mvdan commented Sep 7, 2021

(see commit messages - please do not squash)

Using ReadBlocks as an example, we have 30% fewer allocs and 7% less cpu usage.

ipfs/go-cid#132 does another 11% and 3%, respectively.

We need a ByteReader for some APIs, such as reading CIDs.
However, our entrypoints often don't, such as Reader or ReaderAt.
Thus, we have a type that does the wrapping to support ReadByte.

We were converting a non-pointer to an interface,
which forcibly allocates, since interfaces must contain pointers.

Fix that by making the ReadByte methods use pointer receivers.

    name           old time/op    new time/op    delta
    ReadBlocks-16    1.18ms ± 2%    1.15ms ± 5%   -2.73%  (p=0.003 n=11+11)

    name           old speed      new speed      delta
    ReadBlocks-16   441MB/s ± 2%   453MB/s ± 5%   +2.85%  (p=0.003 n=11+11)

    name           old alloc/op   new alloc/op   delta
    ReadBlocks-16    1.33MB ± 0%    1.29MB ± 0%   -2.41%  (p=0.000 n=12+12)

    name           old allocs/op  new allocs/op  delta
    ReadBlocks-16     13.5k ± 0%     11.5k ± 0%  -14.79%  (p=0.000 n=12+12)
Like the last commit, avoid an extra allocation per read byte.
In this case, we created a one-byte buffer for each readByte call.
Unfortunately, since io.Reader is an interface,
the compiler can't know if it holds onto the memory,
so the buffer escapes and cannot be placed in the stack.

To sidestep this issue, reuse a preallocated buffer.
We know this is fine, because we only do sequential reads.

    name           old time/op    new time/op    delta
    ReadBlocks-16    1.15ms ± 5%    1.09ms ± 4%   -5.13%  (p=0.000 n=11+11)

    name           old speed      new speed      delta
    ReadBlocks-16   453MB/s ± 5%   478MB/s ± 4%   +5.41%  (p=0.000 n=11+11)

    name           old alloc/op   new alloc/op   delta
    ReadBlocks-16    1.29MB ± 0%    1.30MB ± 0%   +0.48%  (p=0.000 n=12+12)

    name           old allocs/op  new allocs/op  delta
    ReadBlocks-16     11.5k ± 0%      9.5k ± 0%  -17.35%  (p=0.000 n=12+12)
Copy link
Member

@masih masih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you 🍻

@masih masih merged commit ccffb5c into ipld:master Sep 7, 2021
@mvdan mvdan deleted the perf-low-hanging branch September 14, 2021 08:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants