Performance regression in 1.64+ when `BufReader` inner reader doesn't fill the buffer #102727

bgilbert · 2022-10-06T05:17:40Z

Code

I tried this code:

use std::cmp::min;
use std::io;

const TOTAL_BYTES: usize = 16 * 1024 * 1024;
// runtime increases as the difference between these values increases
const BYTES_PER_READ: usize = 8 * 1024;
const BUFREADER_CAPACITY: usize = 256 * 1024;

fn main() {
    let mut reader = io::BufReader::with_capacity(BUFREADER_CAPACITY, Reader(TOTAL_BYTES));
    let count = io::copy(&mut reader, &mut io::sink()).unwrap();
    println!("copied {count} bytes");
}

struct Reader(usize);

impl io::Read for Reader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let len = min(min(buf.len(), self.0), BYTES_PER_READ);
        if len > 0 {
            buf[..len].copy_from_slice(&vec![0; len]);
            self.0 -= len;
        }
        Ok(len)
    }
}

I expected to see this happen:

$ rustc -o slowio slowio.rs
$ time ./slowio
copied 16777216 bytes

real	0m0.015s
user	0m0.011s
sys	0m0.003s

Instead, this happened:

$ rustc -o slowio slowio.rs
$ time ./slowio 
copied 16777216 bytes

real	0m8.909s
user	0m8.891s
sys	0m0.002s

The problem doesn't occur with optimization enabled.

Version it worked on

It most recently worked on:

rustc 1.63.0 (4b91a6ea7 2022-08-08)
binary: rustc
commit-hash: 4b91a6ea7258a947e59c6522cd5898e7c0a6a88f
commit-date: 2022-08-08
host: x86_64-unknown-linux-gnu
release: 1.63.0
LLVM version: 14.0.5

Version with regression

rustc 1.64.0 (Fedora 1.64.0-1.fc36)
binary: rustc
commit-hash: unknown
commit-date: unknown
host: x86_64-unknown-linux-gnu
release: 1.64.0
LLVM version: 14.0.5

Also fails on:

rustc 1.66.0-nightly (01af5040f 2022-10-04)
binary: rustc
commit-hash: 01af5040fdada6ef8f1b749cda798d80a8590b2c
commit-date: 2022-10-04
host: x86_64-unknown-linux-gnu
release: 1.66.0-nightly
LLVM version: 15.0.2

@rustbot modify labels: +regression-from-stable-to-stable -regression-untriaged

The text was updated successfully, but these errors were encountered:

saethlin · 2022-10-06T05:53:30Z

Thanks for reporting this! This is an issue with a debug build so it's unlikely to be prioritized high but I'm interested in helping out because this is very goofy.

The slow code here is the Iterator impl on slices. I think in the last stable release we switched the implementation of BufReader to eventually use ReadBuf, and the slow code is ReadBuf initializing the remaining uninitialized capacity. It does this by iterating over a slice, and that iteration is the slow bit.

What I don't get here is why we need to initialize at all. That doesn't seem right.

apiraino · 2022-10-06T10:18:15Z

WG-prioritization assigning highest priority since this looks a pretty visible runtime perf. regression (Zulip discussion). If the assessment should be revised, please feel free to comment.

@rustbot label -I-prioritize +P-critical i-slow

apiraino · 2022-10-06T13:45:19Z

I've tried bisecting with cargo bisect and feeding this command: rustc main.rs && timeout 2 ./main.

Regression seems to be in 50166d5 and confirm #98748

bisected with cargo-bisect-rustc v0.6.0

Host triple: x86_64-unknown-linux-gnu
Reproduce with:

cargo bisect-rustc ./script.sh --start 2022-07-01 --preserve

saethlin · 2022-10-06T14:20:22Z

😂 I was so sure this wasn't my fault. More fool me for not bisecting it myself.

pnkfelix · 2022-10-06T14:48:07Z

(This probably shows an opportunity to add a benchmark to the newly-established "runtime benchmarks" (rather than the compilation-time benchmarks that perf.rlo has classically focused on...)

saethlin · 2022-10-06T14:54:35Z

After sleeping on this, I realized that there's no real reason to expect the worlds-worst-memset code which is the source of the slowdown to be totally optimized out with --release. And of course it isn't. This is also a substantial regression with optimizations. Here's a version of the original program that's touched up a bit:

use std::cmp::min;
use std::io;

const TOTAL_BYTES: usize = 16 * 1024 * 1024;
// runtime increases as the difference between these values increases
const BYTES_PER_READ: usize = 8 * 1024;
const BUFREADER_CAPACITY: usize = 256 * 1024;

fn main() {
    for _ in 0..1_000 {
        let mut reader = io::BufReader::with_capacity(BUFREADER_CAPACITY, Reader(TOTAL_BYTES));
        let count = io::copy(&mut reader, &mut io::sink()).unwrap();
        assert_eq!(count, 16777216)
    }
}

struct Reader(usize);

impl io::Read for Reader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let len = min(min(buf.len(), self.0), BYTES_PER_READ);
        if len > 0 {
            buf[..len].copy_from_slice(&vec![0; len]);
            self.0 -= len;
        }
        Ok(len)
    }
}

On 1.63 I see runtime ~0.43 s, and on 1.64 ~4.8 s. So we have a slowdown of around 10x, though I suspect that number could be ratcheted up higher.

saethlin · 2022-10-06T23:26:14Z

@rustbot claim

bgilbert · 2022-10-09T15:57:44Z

Awesome, thanks for the really quick fix!

Use memset to initialize readbuf The write loop was found to be slow in rust-lang#102727 The proper fix is in rust-lang#102760 but this might still help debug builds and code running under miri by using the write_bytes intrinsic instead of writing one byte at a time.

Avoid repeated re-initialization of the BufReader buffer Fixes rust-lang/rust#102727 We accidentally removed this in rust-lang/rust#98748. It looks so redundant. But it isn't. The default `Read::read_buf` will defensively initialize the whole buffer, if any of it is indicated to be uninitialized. In uses where reads from the wrapped `Read` impl completely fill the `BufReader`, `initialized` and `filled` are the same, and this extra member isn't required. But in the reported issue, the `BufReader` wraps a `Read` impl which will _never_ fill the whole buffer. So the default `Read::read_buf` implementation repeatedly re-initializes the extra space in the buffer. This adds back the extra `initialized` member, which ensures that the default `Read::read_buf` only zero-initialized the buffer once, and I've tried to add a comment which explains this whole situation.

bgilbert added C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. labels Oct 6, 2022

bgilbert mentioned this issue Oct 6, 2022

osmet: avoid install performance regression in debug mode on Rust 1.64+ coreos/coreos-installer#1012

Merged

rustbot added I-slow Issue: Problems and improvements with respect to performance of generated code. P-critical Critical priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Oct 6, 2022

apiraino added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Oct 6, 2022

rustbot assigned saethlin Oct 6, 2022

This was referenced Oct 7, 2022

Avoid repeated re-initialization of the BufReader buffer #102760

Merged

Add a BufReader benchmark for reads smaller than the buffer rust-lang/rustc-perf#1460

Merged

bors closed this as completed in fe42003 Oct 7, 2022

the8472 mentioned this issue Oct 8, 2022

Use memset to initialize readbuf #102811

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance regression in 1.64+ when `BufReader` inner reader doesn't fill the buffer #102727

Performance regression in 1.64+ when `BufReader` inner reader doesn't fill the buffer #102727

bgilbert commented Oct 6, 2022 •

edited by rustbot

Loading

saethlin commented Oct 6, 2022

apiraino commented Oct 6, 2022

apiraino commented Oct 6, 2022

saethlin commented Oct 6, 2022

pnkfelix commented Oct 6, 2022

saethlin commented Oct 6, 2022 •

edited

Loading

saethlin commented Oct 6, 2022

bgilbert commented Oct 9, 2022

Performance regression in 1.64+ when BufReader inner reader doesn't fill the buffer #102727

Performance regression in 1.64+ when BufReader inner reader doesn't fill the buffer #102727

Comments

bgilbert commented Oct 6, 2022 • edited by rustbot Loading

Code

Version it worked on

Version with regression

saethlin commented Oct 6, 2022

apiraino commented Oct 6, 2022

apiraino commented Oct 6, 2022

saethlin commented Oct 6, 2022

pnkfelix commented Oct 6, 2022

saethlin commented Oct 6, 2022 • edited Loading

saethlin commented Oct 6, 2022

bgilbert commented Oct 9, 2022

Performance regression in 1.64+ when `BufReader` inner reader doesn't fill the buffer #102727

Performance regression in 1.64+ when `BufReader` inner reader doesn't fill the buffer #102727

bgilbert commented Oct 6, 2022 •

edited by rustbot

Loading

saethlin commented Oct 6, 2022 •

edited

Loading