Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove hot calls to memmove by replacing Vec<u8> with VecDeque<u8> #17

Merged
merged 7 commits into from
Apr 5, 2022
Merged

Remove hot calls to memmove by replacing Vec<u8> with VecDeque<u8> #17

merged 7 commits into from
Apr 5, 2022

Conversation

paolobarbolini
Copy link
Contributor

@paolobarbolini paolobarbolini commented Apr 3, 2022

This removes the need to constantly drain, which is one of the hottest functions on the flamegraph.

From a quick benchmark on a large file decoding goes from about 10.5s to 3.5s. As a comparison zstd 1.4.8 takes 1.2s. We're getting closer 👀

Before

flamegraph

After

flamegraph

@KillingSpark
Copy link
Owner

Thats great! I'll take a look later today and merge this

I wonder how I missed that. I seem to remember that the code taking the most time was the bitreader stuff

@paolobarbolini
Copy link
Contributor Author

paolobarbolini commented Apr 3, 2022

I wonder how I missed that. I seem to remember that the code taking the most time was the bitreader stuff

I guess that might have been caused by me trying ruzstd on a 600 MB file compressed with zstd --ultra -22 😅

@KillingSpark
Copy link
Owner

The TODO around the extend_from_within needs to be resolved. This is a massive performance boost for files that have long regions that can be copied from the existing buffer. It should be possible to do, it will likely require some unsafe though. I had an unsafe implementation for this before Vec::extend_from_within existed, and I am willing to have some unsafe in there for this big boost in performance.

If you don't want to do that I can merge this and do it myself when I find the time. The code is still correct, it is just being pretty inefficient in these cases.

Otherwise just a few nitpicks,

  1. drain_to should probably call f before the writes to the hash. Because the bytes only get drained when the calls succeed. This might result in the data being put into the hash multiple times. There might be legitimate reasons to call this again even if f fails for some reason. I apparently failed to see this when I wrote this code, but I think it would be better this way. Same applies to all places where data is drained. It should first be given to the fallible functions and only if they succeed the data should be given to the hasher.

  2. drain_to should probably not call f if n2 is 0. I know it is just a slice with no elements but I think it would "nice" to not do that. And the check is easy enough to do

So (imo) it should be

   f(slice1)?;
   hash(slice1);
   if !slice2.is_empty() {
       f(slice2)?;
       hash(slice2);
   }
  1. drain_to_writer should probably be implemented using drain_to too, like drain_to_window_size_writer right? Maybe I am just overlooking something here?

@paolobarbolini
Copy link
Contributor Author

paolobarbolini commented Apr 3, 2022

Done. Unfortunately VecDeque doesn't provide the unsafe APIs required to reproduce the equivalent of extend_from_within so we have to go though an intermediate buffer, which brings a small performance boost anyway for my big file so I guess it's fine?

The size for buf is a random number off the top of my head, so we can change it to a better one.

@KillingSpark
Copy link
Owner

Ah right, I remembered why VecDequeue cant implement this nicely. I might play around with a custom buffer structure for this. We don't need all the features a VecDequeue provides. But that is for another day, I'd call this good enough. I too am not sure what a good size for that buffer is, but 4kb seems fine.

One last thing just came to my mind, regarding the functions in drain_to. Right now it is like this:

   f(slice1)?;
   hash(slice1);
   if !slice2.is_empty() {
       f(slice2)?;
       hash(slice2);
   }
   drain(amount);

When we commit slice1 to the hasher, we should also drain that. So when f(slice2) returns an error, we want to still call drain(0..n1). So something like this maybe?

    f(slice1)?;
    hash(slice1);

    if n2 != 0 {
        let res = f(slice2);
        let drain_amount;
        if res.is_ok() {
            hash(slice2);
            drain_amount = n1+n2;
        } else {
            drain_amount = n1;
        }
        drain(drain_amount)
        res
    } else {
        drain(n1);
        Ok(())
    }

Sorry for doing another roundtrip but I think it's good to fix this in this PR too. The drain_to is a very good idea, and having it do the right thing when it's merged in would be nice :)

@paolobarbolini
Copy link
Contributor Author

paolobarbolini commented Apr 4, 2022

I'll take a look at your proposal later in the day.

This PR just went by in my notifications and it could be very cool to leverage this implementation in the future if it gets optimized more. rust-lang/rust#95632

@paolobarbolini
Copy link
Contributor Author

paolobarbolini commented Apr 4, 2022

Done. Using DrainGuard also makes this panic resistant

@KillingSpark
Copy link
Owner

Very nice. Thank you!

@KillingSpark KillingSpark merged commit 7779f8b into KillingSpark:master Apr 5, 2022
bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 28, 2022
…=the8472

Add VecDeque::extend from vec::IntoIter and slice::Iter specializations

Inspired from the [`Vec` `SpecExtend` implementation](https://github.com/rust-lang/rust/blob/027a232755fa9728e9699337267f6675dfd0a8ba/library/alloc/src/vec/spec_extend.rs), but without the specialization for `TrustedLen` which I'll look into in the future.

Should help rust-lang#95632 and KillingSpark/zstd-rs#17

## Benchmarks

Before

```
test vec_deque::bench_extend_bytes    ... bench:         862 ns/iter (+/- 10)
test vec_deque::bench_extend_vec      ... bench:         883 ns/iter (+/- 19)
```

After

```
test vec_deque::bench_extend_bytes    ... bench:           8 ns/iter (+/- 0)
test vec_deque::bench_extend_vec      ... bench:          24 ns/iter (+/- 1)

```
workingjubilee pushed a commit to tcdi/postgrestd that referenced this pull request Sep 15, 2022
Add VecDeque::extend from vec::IntoIter and slice::Iter specializations

Inspired from the [`Vec` `SpecExtend` implementation](https://github.com/rust-lang/rust/blob/027a232755fa9728e9699337267f6675dfd0a8ba/library/alloc/src/vec/spec_extend.rs), but without the specialization for `TrustedLen` which I'll look into in the future.

Should help #95632 and KillingSpark/zstd-rs#17

## Benchmarks

Before

```
test vec_deque::bench_extend_bytes    ... bench:         862 ns/iter (+/- 10)
test vec_deque::bench_extend_vec      ... bench:         883 ns/iter (+/- 19)
```

After

```
test vec_deque::bench_extend_bytes    ... bench:           8 ns/iter (+/- 0)
test vec_deque::bench_extend_vec      ... bench:          24 ns/iter (+/- 1)

```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants