Idea? Timing only particular sections of code #51
The problem with that feature is that when the runtime of op is not significantly (~2000 times or so) higher than the measurement resolution, the result will be highly inaccurate. A simple solution is to do two benchmarks: one benchmark that measures setup + op + teardown together, and one that measures only setup + teardown, then subtract the two. Having a feature that does these two benchmarks automatically and calculates the statistics from that would be nice to have, though. But I really don't want a feature that enables/disables a timer each run.
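For concreteness, here is a minimal sketch of that two-benchmark approach using only nanobench's public API, applied to the sort/scramble workload described later in the thread. The workload itself is illustrative, and it assumes Bench::results() and Result::median(Result::Measure::elapsed) behave as in current nanobench releases:

#define ANKERL_NANOBENCH_IMPLEMENT
#include <nanobench.h>

#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    std::vector<int> data(10000);
    std::mt19937_64 rng(42);
    auto scramble = [&] {
        std::generate(data.begin(), data.end(), [&] { return static_cast<int>(rng()); });
    };

    // Benchmark 1: setup + operation together.
    ankerl::nanobench::Bench full;
    full.run("scramble + std::sort", [&] {
        scramble();
        std::sort(data.begin(), data.end());
        ankerl::nanobench::doNotOptimizeAway(data.front());
    });

    // Benchmark 2: setup only.
    ankerl::nanobench::Bench setupOnly;
    setupOnly.run("scramble only", [&] {
        scramble();
        ankerl::nanobench::doNotOptimizeAway(data.front());
    });

    // Subtract the medians to estimate the cost of std::sort alone.
    using Measure = ankerl::nanobench::Result::Measure;
    double sortSeconds = full.results().front().median(Measure::elapsed) -
                         setupOnly.results().front().median(Measure::elapsed);
    std::printf("estimated sort cost: %.0f ns per call\n", sortSeconds * 1e9);
}

The difference of the two medians approximates the cost of std::sort alone; the estimate is only as good as the assumption that the scramble step costs the same in both runs.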
What's your gauge for "highly inaccurate"? I'm fairly confident your library's pretty good at what it's doing. It might have to do with whether a high-performance clock is available; I'm pretty sure your library is making use of one on my machine. In my benchmark runs I'm looking at some pretty fast functions, roughly a cycle or two at most, and your library has seemingly been able to test those accurately with roughly 1-3% self-reported error. There's obviously a lot of variance in what the machine is doing at the time, but I'd rather have a real-world benchmark like that on my machine. Should I be re-evaluating those tests?
Nanobench is so accurate because it measures a lot of loops of the operation, not just a single one. It determines how many loops it needs to run so that the clock is reliable, then runs that many iterations per measurement. E.g. on my computer the …
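To illustrate the point (this is not nanobench's internal code, just a rough sketch with made-up numbers): timing N iterations with a single pair of clock reads and dividing by N spreads the clock's resolution error over all N calls.

#include <chrono>
#include <cstdio>

volatile long long sink = 0; // volatile so the loop body isn't optimized away

int main() {
    constexpr long long N = 10'000'000; // many iterations per clock reading
    auto before = std::chrono::steady_clock::now();
    for (long long i = 0; i < N; ++i) {
        sink = sink + i; // stand-in for a very cheap op()
    }
    auto after = std::chrono::steady_clock::now();

    // A single clock reading may only be accurate to tens of nanoseconds,
    // but that error is divided across N iterations here.
    double perOpNs = std::chrono::duration<double, std::nano>(after - before).count() / N;
    std::printf("~%.3f ns per op\n", perOpNs);
}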
Ah, ok, maybe I'm lost in the weeds here; I sort of figured you're running multiple loops, the settings you have in the API give that away. I might play around then and see if I can implement this.
Not exactly familiar enough with how you're going about things in the API; this is just a skeleton:

template <typename Start, typename Op, typename End>
ANKERL_NANOBENCH_NO_SANITIZE("integer")
Bench& Bench::run(Start&& start, Op&& op, End&& end) {
    // It is important that this method is kept short so the compiler can do better optimizations/inlining of op()
    detail::IterationLogic iterationLogic(*this);
    auto& pc = detail::performanceCounters();
    detail::IterationLogic iterationLogic2(*this);
    auto& pc2 = detail::performanceCounters();

    // First pass: measure start() + op() + end() together.
    while (auto n = iterationLogic.numIters()) {
        pc.beginMeasure();
        Clock::time_point before = Clock::now();
        while (n-- > 0) {
            start();
            op();
            end();
        }
        Clock::time_point after = Clock::now();
        pc.endMeasure();
        pc.updateResults(iterationLogic.numIters());
        iterationLogic.add(after - before, pc);
    }

    // Second pass: measure only start() + end(); ideally these are fast.
    while (auto n = iterationLogic2.numIters()) {
        pc2.beginMeasure();
        Clock::time_point before = Clock::now();
        while (n-- > 0) {
            start();
            end();
        }
        Clock::time_point after = Clock::now();
        pc2.endMeasure();
        pc2.updateResults(iterationLogic2.numIters());
        iterationLogic2.add(after - before, pc2);
    }

    // Subtract the results of the second loop from the first?
    return *this;
}
Got a bit confused; I guess in your setup updateResults() or iterationLogic.add() is responsible for adding results? This is just an automated run of the callbacks, split from each other back to back. Not exactly pretty in the console output, but it works.

template <typename Start, typename Op, typename End>
ANKERL_NANOBENCH_NO_SANITIZE("integer")
Bench& Bench::run(Start&& start, Op&& op, End&& end) {
    // It is important that this method is kept short so the compiler can do better optimizations/inlining of op()
    detail::IterationLogic iterationLogic(*this);
    auto& pc = detail::performanceCounters();
    detail::IterationLogic iterationLogic2(*this);
    auto& pc2 = detail::performanceCounters();

    // First pass: measure start() + op() + end() together.
    while (auto n = iterationLogic.numIters()) {
        pc.beginMeasure();
        Clock::time_point before = Clock::now();
        while (n-- > 0) {
            start();
            op();
            end();
        }
        Clock::time_point after = Clock::now();
        pc.endMeasure();
        pc.updateResults(iterationLogic.numIters());
        iterationLogic.add(after - before, pc);
    }

    // Rename the benchmark for the second pass; could probably do this with fewer allocations.
    std::string originalName = name();
    std::string setupName = originalName + " (setup cost)";
    name(setupName);

    // Second pass: measure only start() + end(); ideally these are fast.
    while (auto n = iterationLogic2.numIters()) {
        pc2.beginMeasure();
        Clock::time_point before = Clock::now();
        while (n-- > 0) {
            start();
            end();
        }
        Clock::time_point after = Clock::now();
        pc2.endMeasure();
        pc2.updateResults(iterationLogic2.numIters());
        iterationLogic2.add(after - before, pc2);
    }

    iterationLogic.moveResultTo(mResults);
    iterationLogic2.moveResultTo(mResults);
    name(originalName);
    return *this;
}
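With this variant the console output simply gains a second row named "<benchmark> (setup cost)"; subtracting its per-iteration time from the first row gives a rough estimate of op() on its own, at the price of running start() and end() twice as often overall.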
See #86
Recently I've been testing a few different sorting algorithms. The rough setup has a preallocated block of memory filled with random data. However, that means after each sort I have to scramble or generate more random data before the next benchmark run, and this code is shared between all the benchmarks. Which roughly translates to a benchmark that is really measuring the cost of those two things together, rather than just the sorting algorithm on its own. Presumably the sort algorithm overwhelms the cost of generating new data, but it's difficult to gauge exactly how much generating data costs without running a benchmark with that part on its own.
It seems as if, with a few edits to add additional callbacks, it'd be possible to time, or even ignore, parts of the code which are not really part of the test. If a second callback doesn't make the benchmark more unstable or slower to run, it'd probably be a handy tool for cases like this.
Might look like:
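A hedged sketch of what such a call could look like, based on the three-callback run(start, op, end) skeleton shown above; that overload is not part of nanobench's released API, and the data setup here is an illustrative assumption:

#include <nanobench.h>

#include <algorithm>
#include <random>
#include <vector>

void benchSort() {
    std::vector<int> data(100000);
    std::mt19937_64 rng(123);
    std::generate(data.begin(), data.end(), [&] { return static_cast<int>(rng()); });

    ankerl::nanobench::Bench bench;
    bench.name("std::sort");
    // Hypothetical overload: only the middle callback would count toward the
    // reported time, or the start/end cost would be benchmarked separately
    // and subtracted.
    bench.run(
        /* start */ [&] { std::shuffle(data.begin(), data.end(), rng); },
        /* op    */ [&] { std::sort(data.begin(), data.end()); },
        /* end   */ [&] { ankerl::nanobench::doNotOptimizeAway(data.front()); });
}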