Average and best of 10 values #20
Conversation
Force-pushed from 06345da to 2143933
With this change we run the performance tests 10 times and record the average and the best of 10 in the performance report. This will hopefully lead to more stable results. Contributed on behalf of STMicroelectronics.
Force-pushed from 1eb2a3d to 6ac1e2d
Force-pushed from 6ac1e2d to 0845315
scripts/performance-report.ts (outdated)
// TODO Post process values:
// if label ends with _X
// take current as label
Sort and remove last element?
Hi Matthew, thanks for your comment! However, I'm not entirely sure what you mean?
I've pushed a first feature-complete version of the workflow that performs ten runs of the measurement and then prints average and best of 10 per metric.
Please feel free to comment! Thank you!
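For illustration only, a minimal TypeScript sketch of that per-metric aggregation could look like the following (the function name and the sample values are hypothetical and not the actual scripts/performance-report.ts code):

```ts
// Aggregate N runs of one metric into the average and the best-of-N value.
function aggregateRuns(durationsMs: number[]): { average: number; best: number } {
    if (durationsMs.length === 0) {
        throw new Error('No measurements recorded');
    }
    const sum = durationsMs.reduce((acc, value) => acc + value, 0);
    return {
        average: sum / durationsMs.length,
        best: Math.min(...durationsMs)
    };
}

// Example: ten simulated runs of a single metric (values in milliseconds).
const runs = [102.4, 98.7, 101.1, 97.9, 105.3, 99.2, 100.8, 98.1, 110.6, 99.5];
console.log(aggregateRuns(runs)); // { average: ≈101.36, best: 97.9 }
```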
I was suggesting a way to lazily handle rejecting the worst-case outlier. I looked at your new code; it's pretty much perfect for that.
Some considerations, by the way:
- Worst-case execution time can be due to being pre-empted.
- Best case can be due to cache. Since you rebuild a Docker image each time, I think the cache effect would be negligible, so there is no need to remove the best case.
Thanks for the clarification! Good idea!
I was thinking that the median might even be a better choice, as it would remove the effect of outliers altogether. I've changed the implementation accordingly in b26a5e7. Please let me know if you disagree. Thanks!
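As a rough sketch of what such a median computation might look like in TypeScript (a hypothetical helper, not necessarily the code in b26a5e7):

```ts
// Median of the recorded runs: robust against single outliers on either end.
function median(values: number[]): number {
    if (values.length === 0) {
        throw new Error('No measurements recorded');
    }
    const sorted = [...values].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 === 0
        ? (sorted[mid - 1] + sorted[mid]) / 2
        : sorted[mid];
}
```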
Low outliers can be an issue, but I would worry much less about them than about the high outliers.
Basically, function durations tend to follow a chi-squared-like distribution.
Here is an example of typical IRQ latency.
Rejecting the low end could be interesting, as it would give more "consistent" data, but the fast data reflects the behaviour of the software much more than the slow data, which is often slow due to external interference.
TL;DR: Outside programs cannot interfere with your code and make it faster. If we want to reject interference, we reject the slow outliers. Fast outliers are often due to cache or failures, which is a separate but valid concern.
Thank you for the input! Interesting!
Does that mean you would prefer an average that excludes the best and worst cases? Or should we keep the median, which should be rather similar to an average that rejects the best and worst?
I would prefer the average excluding the worst offenders, at first. If we need to change the post-processing algorithm later, that can still be done, since you retain the raw data.
It's only a very LIGHT preference though; as long as the worst offenders are rejected, I'm fine.
Ok got you! I've pushed 00d27f6 for computing the average of all values excluding the worst case. Thanks!
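For reference, a minimal sketch of averaging while dropping the single slowest run might look like this (a hypothetical helper name, not necessarily the code in 00d27f6):

```ts
// Average of all runs excluding the single worst (slowest) one.
function averageExcludingWorst(values: number[]): number {
    if (values.length < 2) {
        throw new Error('Need at least two measurements to drop the worst run');
    }
    const kept = [...values].sort((a, b) => a - b).slice(0, -1); // drop the slowest run
    return kept.reduce((acc, value) => acc + value, 0) / kept.length;
}
```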
Compute average and best of 10 from multiple runs of a single measurement. Mark those entries with `[]` in the report. Contributed on behalf of STMicroelectronics.
Force-pushed from 17de55b to 50ced9e
@tortmayr This is now ready for review, thanks! Before we merge, we should clean up the history though. I'll take care of that before hitting the button.
Force-pushed from b26a5e7 to 00d27f6
Thanks! Change looks good to me 👍🏼.