
Idea? Timing only particular sections of code #51

Closed · Andersama opened this issue Mar 22, 2021 · 7 comments
Labels: duplicate, research-needed

@Andersama

Recently I've been testing a few different sorting algorithms. The rough setup has a preallocated block of memory filled with random data, which means that after each sorted run I have to scramble the data or generate fresh random data, and this setup code is shared between all the benchmarks. The benchmark is therefore really measuring the cost of the two things together, rather than the sorting algorithm on its own. Presumably the sort overwhelms the cost of generating new data, but it's difficult to gauge exactly how much the data generation costs without benchmarking that part on its own.

It seems as if, with a few edits to add additional callbacks, it would be possible to add timings for, or even ignore, parts of the code which are not really part of the test. If a second callback doesn't make the measurements more unstable or slower to run, it would probably be a handy tool for cases like this.

Might look like:

    template <typename Start, typename Op, typename End>
    ANKERL_NANOBENCH(NOINLINE)
    Bench& run(std::string const& benchmarkName, Start&& start, Op&& op, End&& end);
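
Usage could then look something like this (purely hypothetical; this overload doesn't exist, and data, rng, and the lambdas are placeholders):

    // Hypothetical three-callback overload; the idea is that only op() counts.
    bench.run(
        "sort",
        [&] { std::shuffle(data.begin(), data.end(), rng); }, // start: setup, ideally untimed
        [&] { std::sort(data.begin(), data.end()); },         // op: the code under test
        [&] { /* teardown, untimed */ });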
@martinus (Owner)

The problem with that feature is that when the runtime of op is not significantly higher than the measurement resolution (by a factor of ~2000 or so), the result will be highly inaccurate.

A simple solution is to do two benchmarks. One benchmark that measures Start + Op + End, and another benchmark that measures Start + End. The difference is the runtime for Op.
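
A sketch of that workaround with nanobench's public API (the container, the shuffle standing in for Start, and the benchmark names are just placeholders):

    #include <nanobench.h>
    #include <algorithm>
    #include <random>
    #include <vector>

    int main() {
        std::vector<int> data(10000);
        std::mt19937_64 rng(42);
        for (auto& v : data) v = static_cast<int>(rng());

        ankerl::nanobench::Bench bench;

        // Start + Op + End: scramble the data, then sort it.
        bench.run("shuffle + sort", [&] {
            std::shuffle(data.begin(), data.end(), rng);
            std::sort(data.begin(), data.end());
            ankerl::nanobench::doNotOptimizeAway(data.front());
        });

        // Start + End only: just the scramble.
        bench.run("shuffle only", [&] {
            std::shuffle(data.begin(), data.end(), rng);
            ankerl::nanobench::doNotOptimizeAway(data.front());
        });

        // runtime(sort) ~= runtime("shuffle + sort") - runtime("shuffle only")
    }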

Having a feature that runs these two benchmarks automatically and calculates the statistics from that would be nice to have, though. But I really don't want a feature that enables/disables a timer on each run.

@Andersama (Author)

What's your gauge for highly inaccurate? I'm fairly confident your library's pretty good at what it's doing. It might have to do with whether a high-performance clock is available; I'm pretty sure your library is making use of one on my machine. But in my benchmark runs I'm looking at some pretty fast functions, roughly a cycle or two at most, and your library has seemingly been able to test those accurately, with roughly 1-3% self-reported error. There's obviously a lot of variance in what the machine is doing at the time, but I'd rather have a real-world benchmark like that on my machine. Should I be re-evaluating those tests?

@martinus (Owner)

Nanobench is so accurate because it measures a lot of loops of the operation, not just a single one. It determines how many loops it needs to run for the clock to be reliable, then runs Op, say, 10000 times, and then divides the measurement result by 10000.

E.g. on my computer std::chrono::steady_clock has a resolution of about 30 ns, which is already pretty good. That means whatever you want to measure has to run for at least 30 microseconds to get relatively reliable measurements. nanobench tries to figure out a loop count that achieves those 30 microseconds, and then does the division. So in short, it's not actually measuring each individual call of Op.
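
If you want to check the effective resolution on your own machine, a quick standalone estimate (independent of nanobench) could look like this:

    #include <chrono>
    #include <cstdio>

    int main() {
        using Clock = std::chrono::steady_clock;
        // Take many back-to-back readings; the smallest nonzero delta
        // approximates the clock's effective resolution.
        auto best = Clock::duration::max();
        for (int i = 0; i < 100000; ++i) {
            auto a = Clock::now();
            auto b = Clock::now();
            while (b == a) { b = Clock::now(); } // spin until the clock ticks
            if (b - a < best) { best = b - a; }
        }
        std::printf("~%lld ns\n", static_cast<long long>(
            std::chrono::duration_cast<std::chrono::nanoseconds>(best).count()));
    }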

@Andersama (Author)

Ah, ok, maybe I'm lost in the weeds here. I sort of figured you were running multiple loops; the settings you have in the API give that away. I might play around then and see if I can implement this.

@Andersama (Author)

I'm not familiar enough with how you're going about things in the API, so this is just a skeleton:

template <typename Start, typename Op, typename End>
ANKERL_NANOBENCH_NO_SANITIZE("integer")
Bench& Bench::run(Start&& start, Op&& op, End&& end) {
    // It is important that this method is kept short so the compiler can do better optimizations/ inlining of op()
    detail::IterationLogic iterationLogic(*this);
    auto& pc = detail::performanceCounters();

    detail::IterationLogic iterationLogic2(*this);
    auto& pc2 = detail::performanceCounters();

    // First pass: measure start() + op() + end() together.
    while (auto n = iterationLogic.numIters()) {
        pc.beginMeasure();
        Clock::time_point before = Clock::now();
        while (n-- > 0) {
            start();
            op();
            end();
        }
        Clock::time_point after = Clock::now();
        pc.endMeasure();
        pc.updateResults(iterationLogic.numIters());
        iterationLogic.add(after - before, pc);
    }
    // Second pass: measure only start() + end(). Ideally both are fast.
    while (auto n = iterationLogic2.numIters()) {
        pc2.beginMeasure();
        Clock::time_point before = Clock::now();
        while (n-- > 0) {
            start();
            end();
        }
        Clock::time_point after = Clock::now();
        pc2.endMeasure();
        pc2.updateResults(iterationLogic2.numIters());
        iterationLogic2.add(after - before, pc2);
    }
    // Subtract the results of the second loop from the first?

    return *this;
}

@Andersama (Author) commented Mar 22, 2021

I got a bit confused; I take it that in your setup updateResults() or iterationLogic.add() is responsible for adding results? This version just runs the callbacks, split from each other, back to back as two benchmarks. Not exactly pretty in the console output, but it works.

template <typename Start, typename Op, typename End>
ANKERL_NANOBENCH_NO_SANITIZE("integer")
Bench& Bench::run(Start&& start, Op&& op, End&& end) {
    // It is important that this method is kept short so the compiler can do better optimizations/ inlining of op()
    detail::IterationLogic iterationLogic(*this);
    auto& pc = detail::performanceCounters();

    detail::IterationLogic iterationLogic2(*this);
    auto& pc2 = detail::performanceCounters();

    // First pass: measure start() + op() + end() together.
    while (auto n = iterationLogic.numIters()) {
        pc.beginMeasure();
        Clock::time_point before = Clock::now();
        while (n-- > 0) {
            start();
            op();
            end();
        }
        Clock::time_point after = Clock::now();
        pc.endMeasure();
        pc.updateResults(iterationLogic.numIters());
        iterationLogic.add(after - before, pc);
    }
    // Rename the benchmark for the second result; could probably be done with fewer allocations.
    std::string originalName = name();
    std::string setupName = originalName + " (setup cost)";
    name(setupName);
    // Second pass: measure only start() + end(). Ideally both are fast.
    while (auto n = iterationLogic2.numIters()) {
        pc2.beginMeasure();
        Clock::time_point before = Clock::now();
        while (n-- > 0) {
            start();
            end();
        }
        Clock::time_point after = Clock::now();
        pc2.endMeasure();
        pc2.updateResults(iterationLogic2.numIters());
        iterationLogic2.add(after - before, pc2);
    }
    iterationLogic.moveResultTo(mResults);
    iterationLogic2.moveResultTo(mResults);
    name(originalName);

    return *this;
}

@martinus (Owner)

See #86

martinus self-assigned this Feb 16, 2023
martinus added the duplicate label Feb 16, 2023