Time in nanoseconds is calculated wrong #40

Open

qnikst opened this issue Sep 7, 2017 · 0 comments

qnikst (Contributor) commented Sep 7, 2017

TL;DR: time_in_nanos is reported incorrectly; we need to implement a more advanced algorithm to compute it.

Full version.

I wanted to check the influence of the hyperion framework itself on the measured data. To do that I used the same technique as @Shimuuar did for criterion. For criterion, the result showed that computing statistics during the run had a very high impact on the results, and the solution was to write the raw data to a file and analyze it later; I wanted to see whether hyperion is affected in the same way.

The idea is to run a large number of very simple measurements:

#!/usr/bin/env stack
-- stack --no-nix-pure runghc --package hyperion

{-# LANGUAGE OverloadedLists #-}

module Main where

import Hyperion.Benchmark
import Hyperion.Main
import Hyperion.Run

import Data.Monoid
import Text.Printf

benchmarks :: [Benchmark]
benchmarks =
  [ bench (printf "%03i" n) $ nf exp (0.73::Double) | n <- [1..999::Int]]

main :: IO ()
main = -- defaultMainWith defaultConfig{configMonoidSamplingStrategy=First $ Just $ fixed 1} "hyperion-example-micro-benchmarks" benchmarks
  defaultMain "hyperion-example-micro-benchmarks" benchmarks

I expected that all benchmarks would show more or less equal results. I used R to analyze the output:

qnikst@qwork ~/workspace/tweag/hyperion $ ./dist-newstyle/build/hyperion-0.1.0.0/build/hyperion-sanity-check/hyperion-sanity-check --flat output.flat.json
library(jsonlite);
dat <- fromJSON("output.flatten.json");
> summary(dat$time_in_nanos)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  160.1   336.8   501.0   524.8   662.5 12990.0
> mean(dat$time_in_nanos);
[1] 524.8087
> sd(dat$time_in_nanos);
[1] 485.5656
> d <- dat[with(dat, order(bench_name)),]
> plot(d$time_in_nanos)

[plot: time_in_nanos per benchmark, ordered by benchmark name — the values grow with the benchmark index]

We see that the standard deviation is almost equal to the mean. We also see that time_in_nanos grows with the benchmark index. At this point one might suspect that the measurement machinery affects the experiment in an undesirable way. As mentioned above, criterion had this problem, so I modified hyperion a bit to use Storable vectors instead of Unboxed ones, moving the samples out of the ordinary Haskell heap (sketched below), and this did help a bit. The measurements no longer show that trend:

[plot: time_in_nanos per benchmark with Storable vectors — no obvious growing trend, but still a large spread]

But what we can tell from the graph is that we still have a very large deviation in the measurements, and that should ring a bell.
In addition, even with Storable vectors and the program run with --pure, there is still an increasing time_in_nanos trend over the experiment number, so this change alone is not a complete solution.
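
For reference, here is a simplified sketch of the kind of representation change I mean (toy types of my own, not hyperion's actual internals):

import qualified Data.Vector.Storable as VS
-- import qualified Data.Vector.Unboxed as VU   -- the original representation

-- Durations of the individual measurements, kept in a Storable vector.
-- Its payload sits in pinned memory behind a ForeignPtr, so the copying
-- garbage collector does not move it between measurements, unlike an
-- unboxed vector whose ByteArray# may be moved during collections.
newtype Durations = Durations (VS.Vector Double)

recordDuration :: Double -> Durations -> Durations
recordDuration d (Durations v) = Durations (VS.snoc v d)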

Happily, hyperion does provide the raw measurement data, so I tried to analyze it myself. Skipping all the iterations and surprises along the way, I ended up with the following code.

qnikst@qwork ~/workspace/tweag/hyperion $ ./dist-newstyle/build/hyperion-0.1.0.0/build/hyperion-sanity-check/hyperion-sanity-check --raw  --flat output.flat.json

Because, like criterion, hyperion runs each measurement in a number of batches, we can build a linear model, duration = overhead + batchSize * t; this way the per-measurement cost is ruled out of the results. See https://blog.janestreet.com/core_bench-micro-benchmarking-for-ocaml/ for more details.
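
To spell the model out (my notation, not hyperion's): for each measurement i we record

    duration_i ≈ overhead + batchSize_i * t

where t is the per-iteration time we actually want, and the intercept (overhead) absorbs the fixed per-measurement cost such as reading the clock and bookkeeping. The ordinary least-squares estimate of the slope is t ≈ cov(batchSize, duration) / var(batchSize), which is what lm(duration ~ batchSize) below computes.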

dat <- fromJSON("output.json");  # read data
dat <- dat$results # ignore metadata
dat <- dat[,c("bench_name","measurements")] # select only interesting fields
dat <- dat[with(dat, order(bench_name)),] # order data

result <- apply(dat, 1, function(x) lm(formula=duration ~ batchSize, data=x$measurements)$coefficients["batchSize"]) # calculate linear model for each experiment
r <- result[order(names(result))] # reorder data so it's reported in the same order as it runs
plot(r)

[plot: per-benchmark slope from the linear fit — essentially a flat line, apart from a few outliers]
Ignoring the outliers (those will be the subject of further analysis), we see a flat line! After removing the outliers we get:

> mean(cleared)
[1] 115.1444
> sd(cleared)
[1] 24.45192

Those are excellent results! This means that hyperion actually does its job right, and a correct analysis of the data leads to correct results.

So what was the problem? After doing this I went to the sources and saw the following:

    , _reportTimeInNanos =
        totalDuration / trueNumIterations
    , _reportCycles = Nothing
    , _reportAlloc = Nothing
    , _reportGarbageCollections = Nothing
    , _reportMeasurements = Just samp
    }
  where
    totalDuration =
      ala Sum (foldMapOf (measurements.each.duration.to realToFrac)) samp
    trueNumIterations =
      ala Sum (foldMapOf (measurements.each.batchSize.to realToFrac)) samp

Instead of building a linear regression, we just sum all measurement times and divide by the total number of iterations! As a result, the per-measurement overhead is not removed. I understand the motivation: this analysis works for all sampling strategies, both fixed and incremental batches; however, as the analysis above shows, it is unreliable.
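
For illustration, here is a minimal, self-contained sketch of what a regression-based estimate could look like; the helper name olsSlope and the numbers in the example are made up, and this is not hyperion's API:

import Text.Printf (printf)

-- Ordinary least-squares slope of duration against batch size, i.e. the
-- per-iteration time t in: duration ≈ overhead + batchSize * t.
olsSlope :: [(Double, Double)] -> Double
olsSlope samples = covXY / varX
  where
    n     = fromIntegral (length samples)
    xs    = map fst samples
    meanX = sum xs / n
    meanY = sum (map snd samples) / n
    covXY = sum [ (x - meanX) * (y - meanY) | (x, y) <- samples ]
    varX  = sum [ (x - meanX) ^ (2 :: Int) | x <- xs ]

main :: IO ()
main = do
  -- Synthetic data: 10 ns fixed overhead per measurement plus 115 ns per
  -- iteration. The slope recovers ~115, while summing durations and dividing
  -- by the total iteration count gives ~116.6 because the overhead leaks in.
  let samples = [ (b, 10 + 115 * b) | b <- [1, 2, 4, 8, 16] ]
  printf "slope:            %.1f\n" (olsSlope samples)
  printf "total/iterations: %.1f\n" (sum (map snd samples) / sum (map fst samples))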

nbacquey added a commit that referenced this issue Jan 9, 2019
Fixes #40, Added information to report display