TLDR: time_in_nanoseconds is reported incorrectly; we need to implement a more advanced algorithm to measure it.
Full version.
I wanted to check the influence of the hyperion framework itself on the measured data. To do that, I used the same technique as @Shimuuar did for criterion. For criterion, the result was that collecting statistics during the run had a very high impact on the results, and the solution was to write the raw data to a temporary file and analyze it later; so I wanted to see whether hyperion is affected in the same way.
The idea is to have a large number of very simple measurements.
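To make this concrete, here is a minimal sketch of what such a suite could look like: many copies of the same trivial benchmark, so any trend across the run must come from the harness rather than from the code under test. It is written against a criterion-style interface (defaultMain, bench, nf) purely for illustration; the names and sizes are assumptions, not the actual hyperion-sanity-check source.

```haskell
import Criterion.Main (bench, defaultMain, nf)

-- Hypothetical sanity-check suite: 200 identical, trivial benchmarks.
-- If the harness is neutral, their reported times should all agree.
main :: IO ()
main = defaultMain
  [ bench ("sum-" ++ show i) (nf (sum :: [Int] -> Int) [1 .. 100])
  | i <- [1 .. 200 :: Int]
  ]
```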
I expect that when I measure that data, all benchmarks will show more or less equal results. I used R to analyze the results:

qnikst@qwork ~/workspace/tweag/hyperion $ ./dist-newstyle/build/hyperion-0.1.0.0/build/hyperion-sanity-check/hyperion-sanity-check --flat output.flat.json

We see that the standard deviation is almost equal to the mean. We also see that time_in_nanos grows with the test index. Up to this point we might think that the measurement machinery affects the experiment in an undesirable way. As I said above, criterion had that problem, so I tried to modify hyperion a bit and use Storable vectors instead of Unboxed ones to move them off the Haskell heap, and this actually helped a bit. The measurements no longer show that trend:
But what we can tell from the graph is that we still have a very large deviation in the measurements, and that rings a bell.
In addition, even if we use Storable vectors and run the program with --pure, we still see an increasing time_in_nanos trend over the experiment number, so that solution is not complete.
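For reference, the change I tried has roughly this shape; the snippet below is only an illustrative sketch of the vector-type swap, not the actual patch to hyperion's internals.

```haskell
import qualified Data.Vector.Storable as VS
import qualified Data.Vector.Unboxed as VU

-- Before: unboxed vectors live in ordinary GC-managed memory,
-- which the copying collector may move around during the run.
type DurationsUnboxed = VU.Vector Double

-- After: storable vectors are backed by a ForeignPtr to pinned memory,
-- so the collector leaves the measurement buffer in place.
type DurationsStorable = VS.Vector Double
```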
Happily, hyperion does provide the raw measurement data, and I tried to analyze it myself. Dropping all the iterations and surprises I went through, I ended up with the following code.

qnikst@qwork ~/workspace/tweag/hyperion $ ./dist-newstyle/build/hyperion-0.1.0.0/build/hyperion-sanity-check/hyperion-sanity-check --raw --flat output.flat.json
Because, like criterion, hyperion runs each measurement as a number of batches, we can build a linear model, Time = value + batchSize*t, where the intercept value absorbs the fixed per-measurement cost of the harness and the slope t is the per-iteration time we actually care about; this way we can rule the experiment's own costs out of the results. See https://blog.janestreet.com/core_bench-micro-benchmarking-for-ocaml/ for more details.
library(jsonlite)                              # for fromJSON
dat <- fromJSON("output.json")                 # read data
dat <- dat$results                             # ignore metadata
dat <- dat[, c("bench_name", "measurements")]  # select only the interesting fields
dat <- dat[with(dat, order(bench_name)), ]     # order data
# calculate a linear model for each experiment and keep the batchSize coefficient
result <- apply(dat, 1, function(x)
  lm(formula = duration ~ batchSize, data = x$measurements)$coefficients["batchSize"])
# reorder the results so they are reported in the same order as the benchmarks run
r <- result[order(names(result))]
plot(r)
Ignoring the outliers, which will be the matter of further analysis, we see a flat line! After removing the outliers we see:

Those are perfect results! This means that hyperion actually does its job right, and a correct analysis over the data leads to correct results.

So what was the problem, then? After doing this I went to the sources and saw the following (hyperion/src/Hyperion/Analysis.hs, lines 49 to 66 at ccdccfa):
Instead of building a linear regression, we just sum up all the measurement times and divide them by the number of iterations! As a result the per-measurement overhead is not removed. I understand the purpose of that: such an analysis works for all strategies, both fixed and incremental batches; however, as the analysis above shows, it is unreliable.
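To make the difference concrete, here is a small self-contained sketch comparing the two estimators on synthetic (batchSize, duration) pairs where every measurement pays a fixed harness overhead; it is my own illustration, not the code from Analysis.hs.

```haskell
-- Naive estimator, for comparison: total time divided by total iterations.
naivePerIteration :: [(Double, Double)] -> Double
naivePerIteration ms = sum (map snd ms) / sum (map fst ms)

-- Ordinary least-squares slope of duration against batchSize.
slopePerIteration :: [(Double, Double)] -> Double
slopePerIteration ms = covXY / varX
  where
    n     = fromIntegral (length ms)
    mx    = sum (map fst ms) / n
    my    = sum (map snd ms) / n
    covXY = sum [(x - mx) * (y - my) | (x, y) <- ms]
    varX  = sum [(x - mx) ^ (2 :: Int) | (x, _) <- ms]

main :: IO ()
main = do
  let overhead = 500  -- fixed cost paid once per measurement, in ns
      perIter  = 10   -- true cost of a single iteration, in ns
      ms = [ (fromIntegral b, overhead + perIter * fromIntegral b)
           | b <- [1, 2, 4, 8, 16, 32 :: Int] ]
  print (naivePerIteration ms)  -- ~57.6 ns: inflated by the per-measurement overhead
  print (slopePerIteration ms)  -- 10.0 ns: the regression recovers the per-iteration time
```

On this synthetic data the naive mean stays well above the true per-iteration cost no matter how many batches are added, while the regression slope is exact, which is the same effect the flat line in the regression-based plot above shows.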