This repository has been archived by the owner on Jan 19, 2020. It is now read-only.

Provide a mechanism to compare results from different runs #4

Open
grobie opened this issue Sep 15, 2016 · 7 comments

Comments

@grobie
Contributor

grobie commented Sep 15, 2016

If prombench had the option to output results in a benchcmp-compatible format, we could easily generate reports on performance changes between versions.
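
(For reference, benchcmp consumes two files of plain go test -bench output and prints a delta table along these lines; the benchmark name and numbers here are made up, purely for illustration.)

$ benchcmp old.txt new.txt
benchmark              old ns/op     new ns/op     delta
BenchmarkIngest-4      1523          1100          -27.77%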

@ncabatoff
Owner

I'm not sure the go benchmark format is flexible enough for what we want to measure. I'm also still figuring out just what that is. Can we repurpose/rename this issue to "Provide a mechanism to compare results from different runs"?

@grobie grobie changed the title from "Generate benchcmp compatible output" to "Provide a mechanism to compare results from different runs" on Sep 16, 2016
@grobie
Contributor Author

grobie commented Sep 16, 2016

@ncabatoff sounds good. I have some ideas, but I guess I should first learn what you had in mind. Maybe let's discuss this over chat? Ping me in #prometheus on freenode (https://prometheus.io/community/)

@grobie
Contributor Author

grobie commented Sep 16, 2016

I've been thinking about this a bit. I currently see the following requirements.

  • well-defined format, so that we can build a compare tool
  • multiple test groups should be possible
  • each test might export multiple results, like query latency while counting ingested samples

And then I thought: we already have the text file format, which would even allow us to import multiple results (with carefully set timestamps) and use Prometheus/Grafana to compare test results:

# Benchmark foobar
# Possibly some description of the benchmark.
# HELP prombench_foobar_ingested_samples_total some description
# TYPE prombench_foobar_ingested_samples_total counter
prombench_foobar_ingested_samples_total 8052134
# HELP prombench_foobar_query_latency_seconds some description
# TYPE prombench_foobar_query_latency_seconds histogram
prombench_foobar_query_latency_seconds{query="sum(rate(whatever[1m]))", le="..."} 0.004
prombench_foobar_query_latency_seconds{query="sum(rate(whatever[5m]))", le="..."} 0.013
...
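
(The text exposition format also allows an optional timestamp in milliseconds after the value, which is what makes the "carefully set timestamp" part possible. The value and timestamp below are made up:)

prombench_foobar_ingested_samples_total 8052134 1474027200000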

@ncabatoff
Owner

I like it. Why defer the Prometheus import of this data, though, i.e. why go through a text file other than as an optional extra output? I'm thinking these metrics should be published continuously by prombench, and an external Prometheus that's not involved in the tests would capture and store all these result metrics. I think I'll also move 'foobar' (the benchmark name) out of the metric name and into a label, and add a label ("runname"?) to differentiate different executions.
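
A minimal client_golang sketch of that labeling scheme, with the benchmark name and run name as labels rather than part of the metric name. The metric, label values, and helper here are illustrative, not the actual prombench code:

package main

import (
    "fmt"

    "github.com/prometheus/client_golang/prometheus"
)

// ingestedSamples carries the benchmark name and run name as labels
// instead of encoding them in the metric name.
var ingestedSamples = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "prombench_ingested_samples_total",
        Help: "Samples ingested during a benchmark run.",
    },
    []string{"benchmark", "runname"},
)

func main() {
    prometheus.MustRegister(ingestedSamples)
    // Each execution gets its own runname, so different runs can be
    // compared by label in Prometheus/Grafana.
    ingestedSamples.WithLabelValues("foobar", "run-2016-10-23").Add(8052134)
    fmt.Println("registered; an external Prometheus would scrape this endpoint")
}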

ncabatoff added a commit that referenced this issue Oct 23, 2016
…s Prometheus metrics. Write prombench's own metrics to testdir/metrics.txt on exit. Add a short sleep before exiting to make it more likely that an external Prometheus scraping us has time to get the final result. Partially addresses #4.
@grobie
Contributor Author

grobie commented Oct 24, 2016

I guess I was looking at it from the query performance side again. I imagine scraping prombench directly is super helpful when testing the performance impact while working on some feature, but I believe a textfile output has some advantages as well:

  • control: When running the same test suite against different versions of Prometheus, for example, I believe it would be helpful to have control over when a benchmark result is recorded. With a Prometheus server we don't have control over when a scrape happens within the scrape interval.
  • restore: Readable text-formatted files can just be checked into a repository, but we don't have a standard or human-readable way to export and import data from a Prometheus server yet.
  • ease of use: Running yet another Prometheus server adds setup complexity when running a benchmark and comparing the result with a previous run, e.g. in CI environments.

I'm open to moving the benchmark name into a label on a common metric. I usually tend to keep the number of labels low and was looking at it with textfile readability in mind, but it might end up being just the same.

@ncabatoff
Owner

No argument, but as I said in the commit message, the change I made both publishes the metrics to Prometheus and, on exit, scrapes itself and records the results in testdir/metrics.txt. Is that not adequate?
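
Roughly, the on-exit dump amounts to something like this sketch using client_golang and common/expfmt; this is illustrative, not the actual commit's code, and the helper name is made up:

package main

import (
    "log"
    "os"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/common/expfmt"
)

// writeMetricsFile gathers the process's own registered metrics and writes
// them to path in the Prometheus text exposition format.
func writeMetricsFile(path string) error {
    mfs, err := prometheus.DefaultGatherer.Gather()
    if err != nil {
        return err
    }
    f, err := os.Create(path)
    if err != nil {
        return err
    }
    defer f.Close()
    for _, mf := range mfs {
        if _, err := expfmt.MetricFamilyToText(f, mf); err != nil {
            return err
        }
    }
    return nil
}

func main() {
    // ... run the benchmark, then persist the final results on exit
    // (assumes the testdir directory already exists).
    if err := writeMetricsFile("testdir/metrics.txt"); err != nil {
        log.Fatal(err)
    }
}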

@grobie
Contributor Author

grobie commented Oct 24, 2016

This is cool and probably enough for the beginning. I can see that at some point we'll have benchmarks which run queries at certain intervals during a run. For now, working with just the standard text-file output should be fine.
