
Benchmarks are run when just executing tests, leading to very slooooow tests #25293

shepmaster opened this issue May 11, 2015 · 15 comments
Labels: A-libtest, C-enhancement, C-feature-request, I-slow, requires-nightly

@shepmaster (Member)

In my benchmarks, I generate some non-trivial sized blobs of data. Recently, my regular test runs have been very slow, and I believe it's because the benchmarks are running even when not passing --bench.

bench.rs

#![feature(test)]

extern crate test;
use std::iter;

#[bench]
fn slow(b: &mut test::Bencher) {
    // This is dog-slow without optimizations...
    let s: String = iter::repeat("a").take(5*1024*1024).collect();
    println!("I made a string");
    b.iter(|| 1 + 1);
    b.bytes = s.len() as u64;
}

And running it:

$ rustc --test bench.rs
$ ./bench --nocapture

running 1 test
I made a string
test slow ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured

Even running with --test still runs these very slow tests.

@shepmaster (Member Author)

ping @huonw, as we talked a bit about this on IRC.

@huonw (Member) commented May 11, 2015

The benchmark should only be actually run once, not multiple times as --bench does. I made the change under the assumption that --bench functions will be relatively quick to run, since they are executed many times. Admittedly, I hadn't considered that tests are run without optimisations by default (and no-opt being very slow seems to be the fundamental issue here).

@shepmaster (Member Author)

> bench functions will be relatively quick to run, since they are executed many times

And in my case, I do some potentially expensive setup in each benchmark. I'm careful to do that outside the iter call so it doesn't affect my benched time, but I don't really want to pay that for nothing.

I don't understand the original decision behind the change - none of my benchmarks have any assertions, so running them with the tests seems not useful. What am I missing?

@huonw (Member) commented May 11, 2015

NB. even the function marked #[bench] (i.e. not just iter's closure) will be executed several times when running as a benchmark.

> I don't understand the original decision behind the change - none of my benchmarks have any assertions, so running them with the tests seems not useful. What am I missing?

They can still trigger assertions if they're incorrect, e.g. indexing out of bounds. The motivation was #15842.
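
For illustration, a minimal hypothetical example (not from this issue) of the kind of bug that a single test-mode run of a benchmark can still catch:

#[bench]
fn broken_setup(b: &mut test::Bencher) {
    let data = vec![1u8; 10];
    // Out-of-bounds index during setup: this panics even when the benchmark
    // is executed only once as a test, so cargo test exposes the bug.
    let last = data[10];
    b.iter(|| last + 1);
}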

@bluss (Member) commented May 11, 2015

Maybe move benchmarks into cargo's benches directory? I assume those won't run during cargo test.
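
A sketch of that layout, assuming a crate named mycrate (the file name and contents are illustrative):

// benches/blob.rs -- compiled by Cargo as a separate bench target,
// so it only has access to mycrate's public items.
#![feature(test)]

extern crate mycrate;
extern crate test;

use std::iter;

#[bench]
fn slow(b: &mut test::Bencher) {
    let s: String = iter::repeat("a").take(5 * 1024 * 1024).collect();
    b.iter(|| 1 + 1);
    b.bytes = s.len() as u64;
}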

@shepmaster (Member Author)

> move benchmarks into cargo's benches directory

Wouldn't that impose the public / private visibility rules? I'd be sad if I couldn't get fine-grained benchmarks...

@shepmaster (Member Author)

For what it's worth, I have a workaround: I group all my benchmarks into a bench module, separate from my test module, and run cargo test 'test::', which is inelegant but gets the job done for now.
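
A minimal sketch of that layout (module and test names here are illustrative):

#![feature(test)]
extern crate test;

#[cfg(test)]
mod tests {
    #[test]
    fn it_works() {
        assert_eq!(2 + 2, 4);
    }
}

#[cfg(test)]
mod bench {
    use std::iter;
    use test::Bencher;

    #[bench]
    fn slow(b: &mut Bencher) {
        let s: String = iter::repeat("a").take(5 * 1024 * 1024).collect();
        b.iter(|| 1 + 1);
        b.bytes = s.len() as u64;
    }
}

// cargo test 'tests::'  -- runs only the unit tests, skipping the expensive
// benchmark setup, while cargo bench still picks up everything in `bench`.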

I also don't know that this is a high-priority issue since benchmarking isn't a stable feature. However, I have personal interest in helping fix it. :-)

One idea would be to turn the concept of bytes around: maybe Bencher could provide a "suggested" byte size:

#[bench]
fn slow(b: &mut test::Bencher) {
    let s: String = iter::repeat("a").take(b.suggested_len).collect();
    b.iter(|| 1 + 1);
    b.bytes = s.len() as u64;
}

This could be set to a small value in test runs, and it would have the side benefit of letting the benchmarking harness automatically scale across different input sizes (1, 10, 100, 1000, etc.).

Another would be to have a flag denoting if benching is actually happening:

#[bench]
fn slow(b: &mut test::Bencher) {
    let size = if b.is_benching { 5 * 1024 * 1024 } else { 1 };
    let s: String = iter::repeat("a").take(size).collect();
    b.iter(|| 1 + 1);
    b.bytes = s.len() as u64;
}

Both of these feel hacky though...

@steveklabnik steveklabnik added I-slow Issue: Problems and improvements with respect to performance of generated code. A-benchmarks labels May 13, 2015
@huonw (Member) commented May 16, 2015

FWIW, I just encountered an instance where a benchmark was triggering integer overflow in my library that was only picked up by cargo test, since cargo bench disables debug-assertions by default. Maybe this indicates my tests aren't good enough, but still, it seems valuable to attack correctness from as many sides as are available.
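
As a hypothetical illustration (not huonw's actual code): with debug assertions on, as they are for cargo test by default, the sum below panics on overflow, whereas a cargo bench build with debug-assertions off silently wraps.

#[bench]
fn checksum(b: &mut test::Bencher) {
    let data = vec![200u8; 4];
    b.iter(|| {
        // 200 + 200 already overflows a u8 on the first pass, so a
        // debug-assertions build panics here and exposes the bug.
        data.iter().fold(0u8, |acc, &x| acc + x)
    });
}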

That said, I think I'd be happy to switch this to opt-in (although I would be opting-in essentially always). Alternatively, we could provide an opt-out flag.

@shepmaster (Member Author)

> even the function marked #[bench] (i.e. not just iter's closure) will be executed several times when running as a benchmark.

This was not something that I was consciously aware of, although it makes sense with what I have read about how the benchmarking tool works. I guess it's a good thing that the benching functionality is unstable, as there do seem to be some awkward corner cases 😸

> a benchmark was triggering integer overflow in my library

It's hard to argue with results, I suppose.

> it seems valuable to attack correctness from as many sides as are available

I know you don't mean it this way, but there's a potential slippery slope here. I like using focused tools for specific jobs. There are tools like quickcheck or afl.rs that help stretch the input space of our code to find issues we didn't think about unit testing specifically. If it makes sense to use benchmarking tools for testing, does it also make sense to use testing tools for benchmarking?

> I'd be happy to switch this to opt-in

I don't know if it's worth changing anything yet, especially based on a single person complaining. I probably wouldn't even be talking about this if I could generate a 5{KB,MB,GB} blob in a "quick" amount of time.

@chuck-park

Benchmarks still run even when run with --test.

@Mark-Simulacrum (Member)

@shepmaster Do you think it'd be worth adding a --no-bench flag? That would effectively let you disable benchmarks when you want to.

@shepmaster (Member Author)

> adding a --no-bench flag?

It would certainly provide a more reusable knob to turn than my current "nest all benches in a module" approach, but I'd worry that there's still the surprise inherent with "Oh, #[bench] also means #[test], just it doesn't iterate a bunch".

If the attribute were something like #[test(bench)], that might make it more obvious, helping show the need for such a flag.

@Mark-Simulacrum (Member)

We could certainly deprecate #[bench] and change it to #[test(bench)], though I worry that the latter still isn't quite clear enough, even if it would improve the situation. However, I'm also somewhat concerned about just how much annoyance deprecating #[bench] would cause, since I'd guess a large portion of the community uses it.

@akshaynanavati

To temporarily resolve this issue, I have done the following:

In my Cargo.toml, I added a benchmarks feature:

...
[features]
benchmarks = []
...

Then, I add the following to my benchmarks module:

#[cfg(all(feature = "benchmarks", test))]
mod benchmarks {
    ...
}

When I just run cargo test, the benchmarks aren't even compiled since the feature is not set. I can run cargo test --features benchmarks or cargo bench --features benchmarks if I want the benchmarks compiled/run.
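
A slightly fuller sketch of that setup (the benchmark body here is illustrative):

// Cargo.toml:
//     [features]
//     benchmarks = []

// At the crate root, only enable the nightly `test` feature gate when the
// benchmarks are actually being built:
#![cfg_attr(all(feature = "benchmarks", test), feature(test))]

#[cfg(all(feature = "benchmarks", test))]
mod benchmarks {
    extern crate test;

    use std::iter;

    #[bench]
    fn slow(b: &mut self::test::Bencher) {
        let s: String = iter::repeat("a").take(5 * 1024 * 1024).collect();
        b.iter(|| 1 + 1);
        b.bytes = s.len() as u64;
    }
}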

@Mark-Simulacrum Mark-Simulacrum added A-libtest Area: `#[test]` / the `test` library and removed A-benchmarks labels Jun 24, 2017
@Mark-Simulacrum Mark-Simulacrum added C-feature-request Category: A feature request, i.e: not implemented / a PR. C-enhancement Category: An issue proposing an enhancement or a PR with one. and removed C-enhancement Category: An issue proposing an enhancement or a PR with one. labels Jul 22, 2017
@buckle2000

Is there something like #[bench(no-test)]?

@Enselic Enselic added the requires-nightly This issue requires a nightly compiler in some way. label Oct 6, 2023