Should we allow telemetry devices to output high-cardinality data to the results? #1431

pquentin · 2022-02-03T07:25:39Z

During a race, Rally stores information in multiple Elasticsearch indices:

rally-races-YYYY-MM: metadata about the race, including results, in a single doc
rally-results-YYYY-MM: individual results docs, typically one hundred per race. Results are written in the textual summary report at the end of each race, where we show up to 15 lines per task (error rate, min/mean/median/max throughput and p50/p90/p99/p99.9/p100 latency/service time). We also have tooling to compare results.
rally-metrics-YYYY-MM: individual metrics docs: typically multiple millions (!) for long-running races. Metrics are... less pleasant to work with. You need access to the metrics store and then need to figure out your own queries or visualization. There's also no tool to compare metrics between races.
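To illustrate what "figuring out your own queries" against the metrics store might involve, here is a minimal sketch that builds the monthly index name and an Elasticsearch query body for one metric of one race. The field names `race-id`, `name` and `@timestamp` are assumptions about the metrics document layout, and the metric name in the usage example is hypothetical; check both against your own metrics store before relying on this.

```python
from datetime import date


def metrics_index(day: date) -> str:
    """Return the monthly index name, following the rally-metrics-YYYY-MM pattern."""
    return f"rally-metrics-{day:%Y-%m}"


def metrics_query(race_id: str, metric_name: str, size: int = 100) -> dict:
    """Build a query body selecting one metric of one race.

    Assumption: metrics docs carry the fields "race-id", "name" and
    "@timestamp"; adjust the field names if your store differs.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"race-id": race_id}},
                    {"term": {"name": metric_name}},
                ]
            }
        },
        # Oldest samples first, so the output reads chronologically.
        "sort": [{"@timestamp": {"order": "asc"}}],
    }


if __name__ == "__main__":
    # "my-race-id" and "disk_usage_total" are placeholder values.
    print(metrics_index(date(2022, 2, 3)))
    print(metrics_query("my-race-id", "disk_usage_total"))
```

The resulting dictionary can be passed as the request body of a search against the index returned by `metrics_index`, using whichever Elasticsearch client you already have.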

Given this, when working on #1428, @nik9000 decided to store the collected data of his new telemetry devices in the results. That way, it shows up in the summary report, it's easy to compare and he does not have to worry about the metrics store. In his own words: "i'd love an option I think to get all the info to print. in my normal workflow I don't touch the metric store and really just want to print things."

So, what should we do about it?

1. Allow non-default telemetry devices to somehow put their metrics in the report?
2. Add a command to show data for a specific telemetry device in text form? After all, if Rally can write to it, it can read from it.
3. Special-case the disk usage telemetry device and just dump its metrics? @jpountz mentioned this specific device was interesting to him to diagnose our nightly benchmarks.

I'm personally more in favor of option 2.


nik9000 · 2022-02-03T12:11:56Z

It's important to me at least to be able to compare the results here. Yesterday I put together esrally compare support for the field-disk-usage prototype I'm working on and the output was quite educational:

|     tsdb @timestamp doc values |  | 409.3 MB | 386.3 MB |  -23.0 MB | |  -5.61% |
|         tsdb @timestamp points |  | 423.0 MB | 397.9 MB |  -25.0 MB | |  -5.91% |
|          tsdb @timestamp total |  | 832.2 MB | 784.2 MB |  -48.0 MB | |  -5.77% |
|        tsdb _seq_no doc values |  | 409.3 MB | 386.3 MB |  -23.0 MB | |  -5.61% |
|            tsdb _seq_no points |  | 590.1 MB | 557.5 MB |  -32.6 MB | |  -5.52% |
|             tsdb _seq_no total |  | 999.4 MB | 943.8 MB |  -55.6 MB | |  -5.56% |
| tsdb event.duration doc values |  | 555.4 MB | 548.4 MB |   -7.1 MB | |  -1.27% |
|     tsdb event.duration points |  | 637.4 MB | 629.7 MB |   -7.7 MB | |  -1.21% |
|      tsdb event.duration total |  |   1.2 GB |   1.2 GB |  -14.8 MB | |  -1.24% |
|        tsdb _id inverted index |  | 863.5 MB |   1.0 GB | +176.4 MB | | +20.42% |
|         tsdb _id stored fields |  | 632.4 MB | 610.9 MB |  -21.5 MB | |  -3.40% |
|                 tsdb _id total |  |   1.5 GB |   1.6 GB | +154.9 MB | | +10.35% |
|     tsdb _source stored fields |  |  25.9 GB |  24.1 GB |   -1.8 GB | |  -6.94% |
|             tsdb _source total |  |  25.9 GB |  24.1 GB |   -1.8 GB | |  -6.94% |
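The last two value columns in a table like this are an absolute and a relative difference between baseline and contender. A minimal sketch of that arithmetic follows; the function names and the row layout are hypothetical and are not taken from Rally's compare implementation. Recomputing from the rounded values shown above can differ in the last digit, presumably because the original percentages were derived from unrounded byte counts.

```python
from typing import Tuple


def compare(baseline: float, contender: float) -> Tuple[float, float]:
    """Return (absolute diff, percent diff) of contender relative to baseline."""
    diff = contender - baseline
    # Avoid dividing by zero when the baseline has no data for this metric.
    pct = diff / baseline * 100 if baseline else float("nan")
    return diff, pct


def format_row(name: str, baseline: float, contender: float, unit: str = "MB") -> str:
    """Render one comparison row; the column layout is illustrative only."""
    diff, pct = compare(baseline, contender)
    return (
        f"| {name} | {baseline:.1f} {unit} | {contender:.1f} {unit} "
        f"| {diff:+.1f} {unit} | {pct:+.2f}% |"
    )


if __name__ == "__main__":
    # Placeholder numbers, not measurements from the thread.
    print(format_row("example field total", 200.0, 150.0))
```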

pquentin · 2022-02-23T13:32:59Z

Allow non-default telemetry devices to somehow put their metrics in the report?

After thinking about it and discussing it with @danielmitterdorfer, this is fine for useful non-default telemetry devices in general and #1428 in particular. As Nik showed, this is really useful when you care about it. For more exotic cases, #1224 would be the way to go.

pquentin added the "discuss" label (Needs further clarification from the team) Feb 3, 2022

pquentin added this to the 2.x milestone Feb 3, 2022

pquentin closed this as completed Feb 23, 2022
