Time in nanoseconds is calculated wrong #40

Open

qnikst opened this issue Sep 7, 2017 · 0 comments

qnikst (Contributor) commented Sep 7, 2017

TL;DR: time_in_nanos is reported incorrectly; we need to implement a more advanced algorithm to compute it.

Full version.

I wanted to check the influence of the hyperion framework itself on the measured data. To do that I used the same technique as @Shimuuar did for criterion. For criterion, the result showed that computing statistics during the run had a very high impact on the results, and the solution was to write the raw data to a file and analyze it later; I wanted to see whether hyperion is affected in the same way.

The idea is to run a large number of very simple measurements:

#!/usr/bin/env stack
-- stack --no-nix-pure runghc --package hyperion

{-# LANGUAGE OverloadedLists #-}

module Main where

import Hyperion.Benchmark
import Hyperion.Main
import Hyperion.Run

import Data.Monoid
import Text.Printf

benchmarks :: [Benchmark]
benchmarks =
  [ bench (printf "%03i" n) $ nf exp (0.73::Double) | n <- [1..999::Int]]

main :: IO ()
main = -- defaultMainWith defaultConfig{configMonoidSamplingStrategy=First $ Just $ fixed 1} "hyperion-example-micro-benchmarks" benchmarks
  defaultMain "hyperion-example-micro-benchmarks" benchmarks

I expected that all benchmarks would show more or less equal results. I used R to analyze the output:

qnikst@qwork ~/workspace/tweag/hyperion $ ./dist-newstyle/build/hyperion-0.1.0.0/build/hyperion-sanity-check/hyperion-sanity-check --flat output.flat.json
library(jsonlite);
dat <- fromJSON("output.flatten.json");
> summary(dat$time_in_nanos)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  160.1   336.8   501.0   524.8   662.5 12990.0
> mean(dat$time_in_nanos);
[1] 524.8087
> sd(dat$time_in_nanos);
[1] 485.5656
> d <- dat[with(dat, order(bench_name)),]
> plot(d$time_in_nanos)

[plot: time_in_nanos per benchmark, ordered by benchmark name — the values grow with the benchmark index]

We see that the standard deviation is almost equal to the mean. We also see that time_in_nanos grows with the benchmark index. At this point one might suspect that the measurement machinery affects the experiment in an undesirable way. As mentioned above, criterion had this problem, so I modified hyperion a bit to use Storable vectors instead of Unboxed ones, moving the samples out of the ordinary Haskell heap (sketched below), and this did help a bit. The measurements no longer show that trend:

[plot: time_in_nanos per benchmark with Storable vectors — no obvious growing trend, but still a large spread]

But what we can tell from the graph is that we still have a very large deviation in the measurements, and that should ring a bell.
In addition, even with Storable vectors and the program run with --pure, there is still an increasing time_in_nanos trend over the experiment number, so this change alone is not a complete solution.
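
For reference, here is a simplified sketch of the kind of representation change I mean (toy types of my own, not hyperion's actual internals):

import qualified Data.Vector.Storable as VS
-- import qualified Data.Vector.Unboxed as VU   -- the original representation

-- Durations of the individual measurements, kept in a Storable vector.
-- Its payload sits in pinned memory behind a ForeignPtr, so the copying
-- garbage collector does not move it between measurements, unlike an
-- unboxed vector whose ByteArray# may be moved during collections.
newtype Durations = Durations (VS.Vector Double)

recordDuration :: Double -> Durations -> Durations
recordDuration d (Durations v) = Durations (VS.snoc v d)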

Happily, hyperion does provide the raw measurement data, so I tried to analyze it myself. Skipping all the iterations and surprises along the way, I ended up with the following code.

qnikst@qwork ~/workspace/tweag/hyperion $ ./dist-newstyle/build/hyperion-0.1.0.0/build/hyperion-sanity-check/hyperion-sanity-check --raw  --flat output.flat.json

Because, like criterion, hyperion runs each measurement in a number of batches, we can build a linear model, duration = overhead + batchSize * t; this way the per-measurement cost is ruled out of the results. See https://blog.janestreet.com/core_bench-micro-benchmarking-for-ocaml/ for more details.
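
To spell the model out (my notation, not hyperion's): for each measurement i we record

    duration_i ≈ overhead + batchSize_i * t

where t is the per-iteration time we actually want, and the intercept (overhead) absorbs the fixed per-measurement cost such as reading the clock and bookkeeping. The ordinary least-squares estimate of the slope is t ≈ cov(batchSize, duration) / var(batchSize), which is what lm(duration ~ batchSize) below computes.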

dat <- fromJSON("output.json");  # read data
dat <- dat$results # ignore metadata
dat <- dat[,c("bench_name","measurements")] # select only interesting fields
dat <- dat[with(dat, order(bench_name)),] # order data

result <- apply(dat, 1, function(x) lm(formula=duration ~ batchSize, data=x$measurements)$coefficients["batchSize"]) # calculate linear model for each experiment
r <- result[order(names(result))] # reorder data so it's reported in the same order as it runs
plot(r)

[plot: per-benchmark slope from the linear fit — essentially a flat line, apart from a few outliers]
Ignoring the outliers (those will be the subject of further analysis), we see a flat line! After removing the outliers we get:

> mean(cleared)
[1] 115.1444
> sd(cleared)
[1] 24.45192

Those are excellent results! This means that hyperion actually does its job right, and a correct analysis of the data leads to correct results.

So what was the problem? After doing this I went to the sources and saw the following:

    , _reportTimeInNanos =
        totalDuration / trueNumIterations
    , _reportCycles = Nothing
    , _reportAlloc = Nothing
    , _reportGarbageCollections = Nothing
    , _reportMeasurements = Just samp
    }
  where
    totalDuration =
      ala Sum (foldMapOf (measurements.each.duration.to realToFrac)) samp
    trueNumIterations =
      ala Sum (foldMapOf (measurements.each.batchSize.to realToFrac)) samp

Instead of building a linear regression, we just sum all measurement times and divide by the total number of iterations! As a result, the per-measurement overhead is not removed. I understand the motivation: this analysis works for all sampling strategies, both fixed and incremental batches; however, as the analysis above shows, it is unreliable.
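
For illustration, here is a minimal, self-contained sketch of what a regression-based estimate could look like; the helper name olsSlope and the numbers in the example are made up, and this is not hyperion's API:

import Text.Printf (printf)

-- Ordinary least-squares slope of duration against batch size, i.e. the
-- per-iteration time t in: duration ≈ overhead + batchSize * t.
olsSlope :: [(Double, Double)] -> Double
olsSlope samples = covXY / varX
  where
    n     = fromIntegral (length samples)
    xs    = map fst samples
    meanX = sum xs / n
    meanY = sum (map snd samples) / n
    covXY = sum [ (x - meanX) * (y - meanY) | (x, y) <- samples ]
    varX  = sum [ (x - meanX) ^ (2 :: Int) | x <- xs ]

main :: IO ()
main = do
  -- Synthetic data: 10 ns fixed overhead per measurement plus 115 ns per
  -- iteration. The slope recovers ~115, while summing durations and dividing
  -- by the total iteration count gives ~116.6 because the overhead leaks in.
  let samples = [ (b, 10 + 115 * b) | b <- [1, 2, 4, 8, 16] ]
  printf "slope:            %.1f\n" (olsSlope samples)
  printf "total/iterations: %.1f\n" (sum (map snd samples) / sum (map fst samples))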

nbacquey added a commit that referenced this issue Jan 9, 2019
Fixes #40, Added information to report display