A benchmark's runtime can depend on the presence/absence of other benchmarks #60
Comments
I am observing similar behavior. When I add a second benchmark, the first one takes longer. The two pieces of code I am benchmarking are completely unrelated. I have also tried this with many combinations and consistently see the same result. The measurement should cover only the function under test and, if that's the case, should be independent of the surrounding code. I do not think the generated code of the two benchmarked functions is itself different in the two cases; they are completely unrelated. It's hard to trust the benchmarking results because of this. Here is a simplified test case:
The first benchmark is measuring
When I add the second one it becomes:
A 30% degradation just from adding a line. The difference is even more marked in several other cases. This problem is forcing me to run criterion with only one benchmark at a time. Also note that the benchmark result is wrong even when only one benchmark out of many is selected via the command line; just the presence of another benchmark is enough, irrespective of the runtime selection. I am running criterion-1.1.1.0 and ghc-7.10.3.
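The simplified test code itself isn't preserved in this thread. A minimal sketch of the shape being described (function names and sizes are hypothetical, not the reporter's actual code) might be:

```haskell
-- Hypothetical reconstruction: two completely unrelated benchmarks
-- in one criterion suite, where adding or removing the second one
-- changes the measured time of the first.
import Criterion.Main

fibNaive :: Int -> Int
fibNaive n = if n < 2 then n else fibNaive (n - 1) + fibNaive (n - 2)

sumTo :: Int -> Int
sumTo n = sum [1 .. n]

main :: IO ()
main = defaultMain
  [ bench "fib" (whnf fibNaive 20)
  -- Commenting out the next line reportedly shifts the "fib" result
  -- by ~30%, even though the two functions share no code.
  , bench "sum" (whnf sumTo 10000)
  ]
```

Note this needs the criterion package and cannot be run with the standard library alone.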
It seems my problem was due to sharing input data across benchmarks, which caused undue memory pressure for the later benchmarks. The problem was resolved by using
One possible enhancement could be to strongly recommend using
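The comment above doesn't name the combinator that resolved it, but criterion's `env` is the documented way to construct a benchmark's input per run rather than sharing a long-lived top-level value; a sketch under that assumption:

```haskell
import Criterion.Main
import Control.DeepSeq (force)

-- Sketch (assuming criterion's `env` combinator): the input is built
-- and fully evaluated inside the environment action, instead of being
-- a shared top-level CAF that stays live and adds memory pressure for
-- every benchmark that runs after the first.
main :: IO ()
main = defaultMain
  [ env (return (force [1 .. 1000000 :: Int])) $ \xs ->
      bench "sum over fresh input" (whnf sum xs)
  ]
```

`env` takes an `IO` action producing the input (which must be `NFData`) and a function from that input to a `Benchmark`, so the data is not retained across unrelated benchmarks.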
Yes, being cognizant of the working set is hard with Haskell's lazy semantics and GHC's optimizations. I don't know what could be detected here for a warning, though -- what would the check for a linter be? The original issue is a tricky one of compilation units. You can always put benchmarks in separate modules, but that's a pain. I'm not sure what a good solution to alleviate this pain would be. TH doesn't seem sufficient. It kind of seems like you'd want something like fragnix to create minimal compilation units that isolate your benchmarks.
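The separate-modules workaround mentioned above might look like the following (module names are hypothetical): each benchmark lives in its own compilation unit, so GHC cannot factor shared Core across them at the use site.

```haskell
-- Main.hs: the driver only assembles benchmarks that are defined in
-- separate modules (BenchA.hs and BenchB.hs, names hypothetical), so
-- each benchmark's code is optimized in isolation and adding one
-- module cannot change the Core generated for the other.
module Main (main) where

import Criterion.Main (defaultMain)
import BenchA (benchA)  -- exports benchA :: Benchmark
import BenchB (benchB)  -- exports benchB :: Benchmark

main :: IO ()
main = defaultMain [benchA, benchB]
```

This is the "pain" referred to above: every new benchmark needs its own module plus an import and list entry here.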
FWIW, the
See #166 for another example.
The following code compares two functions for summing over a vector:
But suppose I change the last few lines to the following:
This, surprisingly, affects the runtime of the `sumV` benchmark: it makes it about 20% faster. Similarly, if we remove the `sumV` benchmark and leave the `VU.sum` benchmark, the `VU.sum` benchmark becomes about 20% slower. Tests were run with the patched criterion-1.0.0.2 I sent, on ghc-7.8.3 with the `-O2 -fllvm` flags.

What's going on is that different Core is generated for the `sumV` and `VU.sum` benchmarks depending on whether the other benchmark is present. Essentially, the common bits are being factored out and placed in a function, and this function is called from both benchmarks. This happens to make both benchmarks faster.

I'm not sure whether this should be considered a proper bug, but it confused me for an hour or so. It's something that criterion users (especially those performing very small benchmarks) should probably be aware of.