
Benchmarks are run when just executing tests, leading to very slooooow tests #25293

shepmaster opened this issue May 11, 2015 · 15 comments
Labels: A-libtest, C-enhancement, C-feature-request, I-slow, requires-nightly

@shepmaster (Member)

In my benchmarks, I generate some non-trivial sized blobs of data. Recently, my regular test runs have been very slow, and I believe it's because the benchmarks are running even when not passing --bench.

bench.rs

#![feature(test)]

extern crate test;
use std::iter;

#[bench]
fn slow(b: &mut test::Bencher) {
    // This is dog-slow without optimizations...
    let s: String = iter::repeat("a").take(5*1024*1024).collect();
    println!("I made a string");
    b.iter(|| 1 + 1);
    b.bytes = s.len() as u64;
}

And running it:

$ rustc --test bench.rs
$ ./bench --nocapture

running 1 test
I made a string
test slow ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured

Even running with --test still runs these very slow tests.

@shepmaster (Member Author)

ping @huonw, as we talked a bit about this on IRC.

@huonw (Member) commented May 11, 2015

The benchmark should only be actually run once, not multiple times as --bench does. I made the change under the assumption that --bench functions will be relatively quick to run, since they are executed many times. Admittedly, I hadn't considered that tests are run without optimisations by default (and no-opt being very slow seems to be the fundamental issue here).

@shepmaster (Member Author)

> bench functions will be relatively quick to run, since they are executed many times

And in my case, I do some potentially expensive setup in each benchmark. I'm careful to do that outside the iter call so it doesn't affect my benched time, but I don't really want to pay that for nothing.

I don't understand the original decision behind the change - none of my benchmarks have any assertions, so running them with the tests seems not useful. What am I missing?

@huonw (Member) commented May 11, 2015

NB. even the function marked #[bench] (i.e. not just iter's closure) will be executed several times when running as a benchmark.

> I don't understand the original decision behind the change - none of my benchmarks have any assertions, so running them with the tests seems not useful. What am I missing?

They can still trigger assertions if they're incorrect, e.g. indexing out of bounds. The motivation was #15842.
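
For illustration, a minimal hypothetical example (not from this issue) of the kind of bug that a single test-mode run of a benchmark can still catch:

#[bench]
fn broken_setup(b: &mut test::Bencher) {
    let data = vec![1u8; 10];
    // Out-of-bounds index during setup: this panics even when the benchmark
    // is executed only once as a test, so cargo test exposes the bug.
    let last = data[10];
    b.iter(|| last + 1);
}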

@bluss (Member) commented May 11, 2015

Maybe move benchmarks into cargo's benches directory? I assume those won't run during cargo test.
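
A sketch of that layout, assuming a crate named mycrate (the file name and contents are illustrative):

// benches/blob.rs -- compiled by Cargo as a separate bench target,
// so it only has access to mycrate's public items.
#![feature(test)]

extern crate mycrate;
extern crate test;

use std::iter;

#[bench]
fn slow(b: &mut test::Bencher) {
    let s: String = iter::repeat("a").take(5 * 1024 * 1024).collect();
    b.iter(|| 1 + 1);
    b.bytes = s.len() as u64;
}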

@shepmaster (Member Author)

> move benchmarks into cargo's benches directory

Wouldn't that impose the public / private visibility rules? I'd be sad if I couldn't get fine-grained benchmarks...

@shepmaster (Member Author)

For what it's worth, I have a workaround: I group all my benchmarks into a bench module, separate from my test module, and run cargo test 'test::', which is inelegant but gets the job done for now.
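
A minimal sketch of that layout (module and test names here are illustrative):

#![feature(test)]
extern crate test;

#[cfg(test)]
mod tests {
    #[test]
    fn it_works() {
        assert_eq!(2 + 2, 4);
    }
}

#[cfg(test)]
mod bench {
    use std::iter;
    use test::Bencher;

    #[bench]
    fn slow(b: &mut Bencher) {
        let s: String = iter::repeat("a").take(5 * 1024 * 1024).collect();
        b.iter(|| 1 + 1);
        b.bytes = s.len() as u64;
    }
}

// cargo test 'tests::'  -- runs only the unit tests, skipping the expensive
// benchmark setup, while cargo bench still picks up everything in `bench`.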

I also don't know that this is a high-priority issue since benchmarking isn't a stable feature. However, I have personal interest in helping fix it. :-)

One idea would be to turn the concept of bytes around: maybe Bencher could provide a "suggested" byte size:

#[bench]
fn slow(b: &mut test::Bencher) {
    let s: String = iter::repeat("a").take(b.suggested_len).collect();
    b.iter(|| 1 + 1);
    b.bytes = s.len() as u64;
}

This could be set to a small value in test runs, and it would have the side benefit of letting the benchmarking harness automatically scale across different input sizes (1, 10, 100, 1000, etc.).

Another would be to have a flag denoting if benching is actually happening:

#[bench]
fn slow(b: &mut test::Bencher) {
    let size = if b.is_benching { 5 * 1024 * 1024 } else { 1 };
    let s: String = iter::repeat("a").take(size).collect();
    b.iter(|| 1 + 1);
    b.bytes = s.len() as u64;
}

Both of these feel hacky though...

@steveklabnik steveklabnik added I-slow Issue: Problems and improvements with respect to performance of generated code. A-benchmarks labels May 13, 2015
@huonw (Member) commented May 16, 2015

FWIW, I just encountered an instance where a benchmark was triggering integer overflow in my library that was only picked up by cargo test, since cargo bench disables debug-assertions by default. Maybe this indicates my tests aren't good enough, but still, it seems valuable to attack correctness from as many sides as are available.
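
As a hypothetical illustration (not huonw's actual code): with debug assertions on, as they are for cargo test by default, the sum below panics on overflow, whereas a cargo bench build with debug-assertions off silently wraps.

#[bench]
fn checksum(b: &mut test::Bencher) {
    let data = vec![200u8; 4];
    b.iter(|| {
        // 200 + 200 already overflows a u8 on the first pass, so a
        // debug-assertions build panics here and exposes the bug.
        data.iter().fold(0u8, |acc, &x| acc + x)
    });
}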

That said, I think I'd be happy to switch this to opt-in (although I would be opting-in essentially always). Alternatively, we could provide an opt-out flag.

@shepmaster (Member Author)

> even the function marked #[bench] (i.e. not just iter's closure) will be executed several times when running as a benchmark.

This was not something that I was consciously aware of, although it makes sense with what I have read about how the benchmarking tool works. I guess it's a good thing that the benching functionality is unstable, as there do seem to be some awkward corner cases 😸

> a benchmark was triggering integer overflow in my library

It's hard to argue with results, I suppose.

> it seems valuable to attack correctness from as many sides as are available

I know you don't mean it this way, but there's a potential slippery slope here. I like using focused tools for specific jobs. There are tools like quickcheck or afl.rs that help stretch the input space of our code to find issues we didn't think about unit testing specifically. If it makes sense to use benchmarking tools for testing, does it also make sense to use testing tools for benchmarking?

> I'd be happy to switch this to opt-in

I don't know if it's worth changing anything yet, especially based on a single person complaining. I probably wouldn't even be talking about this if I could generate a 5{KB,MB,GB} blob in a "quick" amount of time.

@chuck-park

Benchmarks still run even when run with --test.

@Mark-Simulacrum (Member)

@shepmaster Do you think it'd be worth adding a --no-bench flag? That would effectively let you disable benchmarks when you want to.

@shepmaster (Member Author)

> adding a --no-bench flag?

It would certainly provide a more reusable knob to turn than my current "nest all benches in a module" approach, but I'd worry that there's still the surprise inherent with "Oh, #[bench] also means #[test], just it doesn't iterate a bunch".

If the attribute were something like #[test(bench)], that might make it more obvious, helping show the need for such a flag.

@Mark-Simulacrum (Member)

We could certainly deprecate #[bench] and change it to #[test(bench)], though I worry that the latter still isn't quite clear enough, even if it would improve the situation. However, I'm also somewhat concerned about just how much annoyance deprecating #[bench] would cause, since I'd guess a large portion of the community uses it.

@akshaynanavati

To temporarily resolve this issue, I have done the following:

In my Cargo.toml, I added a benchmarks feature:

...
[features]
benchmarks = []
...

Then, I add the following to my benchmarks module:

#[cfg(all(feature = "benchmarks", test))]
mod benchmarks {
    ...
}

When I just run cargo test, the benchmarks aren't even compiled since the feature is not set. I can run cargo test --features benchmarks or cargo bench --features benchmarks if I want the benchmarks compiled/run.
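
A slightly fuller sketch of that setup (the benchmark body here is illustrative):

// Cargo.toml:
//     [features]
//     benchmarks = []

// At the crate root, only enable the nightly `test` feature gate when the
// benchmarks are actually being built:
#![cfg_attr(all(feature = "benchmarks", test), feature(test))]

#[cfg(all(feature = "benchmarks", test))]
mod benchmarks {
    extern crate test;

    use std::iter;

    #[bench]
    fn slow(b: &mut self::test::Bencher) {
        let s: String = iter::repeat("a").take(5 * 1024 * 1024).collect();
        b.iter(|| 1 + 1);
        b.bytes = s.len() as u64;
    }
}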

@Mark-Simulacrum Mark-Simulacrum added A-libtest Area: `#[test]` / the `test` library and removed A-benchmarks labels Jun 24, 2017
@Mark-Simulacrum Mark-Simulacrum added C-feature-request Category: A feature request, i.e: not implemented / a PR. C-enhancement Category: An issue proposing an enhancement or a PR with one. and removed C-enhancement Category: An issue proposing an enhancement or a PR with one. labels Jul 22, 2017
@buckle2000

Is there something like #[bench(no-test)]?

@Enselic Enselic added the requires-nightly This issue requires a nightly compiler in some way. label Oct 6, 2023