Performance regression on Windows due to removal of jemalloc #36328

iopq · 2016-09-07T20:46:19Z

https://bitbucket.org/iopq/fizzbuzz-in-rust

Steps to reproduce:

Compile this with
rustup install nightly-2016-01-20
cargo clean
rustup run nightly-2016-01-20 cargo bench

observe that you get nice performance:

running 2 tests
test bench_cowbuzz ... bench: 891 ns/iter (+/- 72)
test bench_fizzbuzz ... bench: 1,125 ns/iter (+/- 40)

Compile this with

rustup install nightly-2016-01-21
cargo clean
rustup run nightly-2016-01-21 cargo bench

observe that you get a performance regression:

test bench_cowbuzz ... bench: 1,071 ns/iter (+/- 42)
test bench_fizzbuzz ... bench: 1,542 ns/iter (+/- 101)

rustc -vV gives

commit-hash: 7dce32e
commit-date: 2016-01-20
host: i686-pc-windows-gnu
release: 1.7.0-nightly

The performance is worse in every version including the current nightly, 2016-06-12 gives

test bench_cowbuzz ... bench: 996 ns/iter (+/- 31)
test bench_fizzbuzz ... bench: 1,420 ns/iter (+/- 50)

The text was updated successfully, but these errors were encountered:

durka · 2016-09-07T21:33:08Z

Seems to be within the error bars on OSX.

iopq · 2016-09-07T23:32:34Z

I found that it's my dependency that has the performance regression:

https://github.com/iopq/monoid

running the bench on 2016-01-20 gives

test bench_cowstring ... bench: 71 ns/iter (+/- 4)
test bench_cowstring_ms2ger ... bench: 66 ns/iter (+/- 5)

running the same thing on 2016-01-21 gives

test bench_cowstring ... bench: 130 ns/iter (+/- 5)
test bench_cowstring_ms2ger ... bench: 122 ns/iter (+/- 4)

the performance hasn't recovered since

all these tests are doing is string concatenation

#[bench]
fn bench_cowstring(b: &mut test::Bencher) {
    b.iter(|| {
        let fizz = Into::<Cow<_>>::into("Fizz").into_owned();
        let buzz = Into::<Cow<_>>::into("Buzz");
        Into::<Cow<_>>::into(fizz + &*buzz)
    });
}

#[bench]
fn bench_cowstring_ms2ger(b: &mut test::Bencher) {
    b.iter( || {  
            let mut fizz = Into::<Cow<_>>::into("Fizz").into_owned();
            let buzz = Into::<Cow<_>>::into("Buzz");
            fizz.push_str(&*buzz);
            Into::<Cow<_>>::into(fizz);
        }
    );
}

TimNN · 2016-09-08T00:17:29Z

The reduced test case does also not reproduce the problem on OS X.

Looking at the changes, this regression may have been caused by disabling jemalloc on windows-gnu in #30985 (edit: although it should have been re-enabled as far as I can tell, so if this was the cause, things should be fine on current nightlies).

ollie27 · 2016-09-09T03:37:22Z

This does look like it's because jemalloc was disabled. On the 2016-09-08 nightly for x86_64-pc-windows-gnu I get:

test bench_cowstring        ... bench:         157 ns/iter (+/- 8)
test bench_cowstring_ms2ger ... bench:         155 ns/iter (+/- 11)

Adding #![feature(alloc_jemalloc)] and extern crate alloc_jemalloc; gives:

test bench_cowstring        ... bench:          85 ns/iter (+/- 3)
test bench_cowstring_ms2ger ... bench:          84 ns/iter (+/- 2)

Compared with the 2016-01-20 nightly:

test bench_cowstring        ... bench:          90 ns/iter (+/- 3)
test bench_cowstring_ms2ger ... bench:          91 ns/iter (+/- 6)

And the 2016-01-21 nightly:

test bench_cowstring        ... bench:         158 ns/iter (+/- 4)
test bench_cowstring_ms2ger ... bench:         156 ns/iter (+/- 2)

brson · 2016-09-13T16:30:07Z

Libs team needs to explain the path forward for re-gaining this perf loss. We're unlikely to add jemalloc back by default here.

brson · 2016-10-04T16:42:11Z

Right now the intended path forward to regain this perf is by stabilizing the global allocator API, and publishing the jemalloc crate to crates.io.

alexcrichton · 2017-06-20T14:48:09Z

We're unlikely to really do anything to fix this, so I'm going to close this in favor of #27389. It's highly likely that all programs will start to link to jemalloc by default once we stabilize that feature.

brson added I-slow Issue: Problems and improvements with respect to performance of generated code. O-windows Operating system: Windows regression-from-stable-to-stable Performance or correctness regression from one stable version to another. labels Sep 7, 2016

brson added A-allocators Area: Custom and system allocators P-low Low priority I-nominated T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Sep 13, 2016

brson removed the I-nominated label Oct 4, 2016

brson changed the title ~~Performance regression on Windows~~ Performance regression on Windows due to removal of jemalloc Oct 6, 2016

alexcrichton closed this as completed Jun 20, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance regression on Windows due to removal of jemalloc #36328

Performance regression on Windows due to removal of jemalloc #36328

iopq commented Sep 7, 2016 •

edited

Loading

durka commented Sep 7, 2016

iopq commented Sep 7, 2016 •

edited

Loading

TimNN commented Sep 8, 2016 •

edited

Loading

ollie27 commented Sep 9, 2016

brson commented Sep 13, 2016

brson commented Oct 4, 2016

alexcrichton commented Jun 20, 2017

Performance regression on Windows due to removal of jemalloc #36328

Performance regression on Windows due to removal of jemalloc #36328

Comments

iopq commented Sep 7, 2016 • edited Loading

durka commented Sep 7, 2016

iopq commented Sep 7, 2016 • edited Loading

TimNN commented Sep 8, 2016 • edited Loading

ollie27 commented Sep 9, 2016

brson commented Sep 13, 2016

brson commented Oct 4, 2016

alexcrichton commented Jun 20, 2017

iopq commented Sep 7, 2016 •

edited

Loading

iopq commented Sep 7, 2016 •

edited

Loading

TimNN commented Sep 8, 2016 •

edited

Loading