
Add standard deviation to benchmarking #1604

Closed
wants to merge 8 commits into from

Conversation

@jimon commented Apr 18, 2019

Description

This PR adds standard deviation calculation to the benchmarking feature.
The deviation is useful when benchmarking OS-dependent things, for example how long it takes to create a thread. A normal benchmark only returns a mean value, e.g. 100 us, but the deviation also gives a sigma, e.g. 20 us. Via the 68–95–99.7 rule we can conclude that ~95% of thread creations will complete within [100 - 2 * 20, 100 + 2 * 20] = [60, 140] us.

Please let me know how I can improve it :)

@horenmar (Member)

Quick note before I start going through the commit -- please ensure that when using the C stdlib, the includes and namespacing align. In other words, you should qualify the call as std::sqrt.

Not doing so breaks VxWorks and QNX.

codecov bot commented Apr 18, 2019

Codecov Report

Merging #1604 into master will decrease coverage by 0.42%.
The diff coverage is 0%.

@@            Coverage Diff            @@
##           master   #1604      +/-   ##
=========================================
- Coverage   80.72%   80.3%   -0.42%     
=========================================
  Files         121     121              
  Lines        3424    3442      +18     
=========================================
  Hits         2764    2764              
- Misses        660     678      +18

codecov bot commented Apr 18, 2019

Codecov Report

Merging #1604 into master will decrease coverage by 0.24%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master    #1604      +/-   ##
==========================================
- Coverage   80.72%   80.49%   -0.24%     
==========================================
  Files         121      121              
  Lines        3424     3434      +10     
==========================================
  Hits         2764     2764              
- Misses        660      670      +10

@jimon (Author) commented Apr 18, 2019

@horenmar good catch, fixed it.

@horenmar (Member)

Okay, so I went through the changes and I think the PR needs to be done a bit differently.

I think the end state should be that there is still just one BenchmarkLooper type, and one macro to use it, BENCHMARK, with the number of samples to get for the stddev calculation being an optional argument to it.

@horenmar (Member)

I added a commit that shows the desired interface and usage 😃

Also the resulting output will need to be able to differentiate between the number of measurements and the number of iterations in a single measurement, so that will need some extra changes in the reporter and the BenchmarkInfo/BenchmarkStats struct.

@jimon (Author) commented Apr 20, 2019

@horenmar I see your point, since I also started by just changing the base looper :) There are two related points I would like to discuss:

  1. If the precision of the timer is too low and we go into 10x+ loops, then I think the deviation might not make much sense, as we get average smoothing of the sample times. My understanding of the math says that the deviation we get this way will be lower than the actual value, which might not be good.
  2. The current benchmark looper has a very tight loop; if I start adding more branches or measuring the sample time on every iteration, I'm afraid I'll degrade its ability to make precise measurements.

What do you think about these two? How should we solve them?

@horenmar (Member)

The deviation can still be measured properly if it is done right. Let's say there are 10 iterations per measurement, and the time taken for each measurement is T1, T2, ..., Tn. Ta is then the average time of a single iteration, that is, the sum over Ti divided by N * 10. The variance is then the sum over all iterations of (Ti/10 - Ta)^2 / (N * 10).

@jimon (Author) commented Apr 23, 2019

Ah, you already fixed everything :) (long weekend here in Sweden :) )

I still don't think we can call it sigma when the timer resolution is insufficient, because the value we get is smaller than the value we would get with an infinitely precise timer. Let me explain it like this:

  1. Some test code, an infinitely precise timer, 10 samples. We measure 5 ns, 4 ns, 6 ns, ... and calculate sigma from these values. This is as precise as we can get.

  2. The same test code, a less precise timer, 10 samples, 10 iterations per sample. We actually run the code under test 100 times, and the real times that code takes would be 5 ns, 4 ns, 6 ns, ... (100 values), but what we measure is 50 ns, 51 ns, 49 ns, ... (10 values). Each measurement is the time it took to run the code under test 10 times; we don't have the data to calculate the variance of individual runs. So while the sigma numbers we get are "close enough", it would be incorrect to call the result the stddev sigma, as it will be <= the true sigma.

@horenmar (Member)

On further inspection I agree, but I also think I am going to close this PR in favour of #1616, which will bring in fully featured benchmarking and is moving forward faster than I originally expected.

@horenmar closed this May 26, 2019