Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Average and best of 10 values #20

Merged
merged 4 commits into from
Oct 10, 2023

Conversation

planger
Copy link
Contributor

@planger planger commented Oct 1, 2023

WIth this change we run the performance tests 10 times, and record the average and the best of 10 in the performance report. This will hopefully lead to more stable results.

Contributed on behalf of STMicroelectronics.

@planger planger force-pushed the average-and-best-run-report branch 2 times, most recently from 06345da to 2143933 Compare October 1, 2023 19:08
WIth this change we run the performance tests 10 times, and record the
average and the best of 10 in the performance report. This will
hopefully lead to more stable results.

Contributed on behalf of STMicroelectronics.
@planger planger force-pushed the average-and-best-run-report branch 10 times, most recently from 1eb2a3d to 6ac1e2d Compare October 2, 2023 19:08
@planger planger force-pushed the average-and-best-run-report branch from 6ac1e2d to 0845315 Compare October 2, 2023 19:31

// TODO Post process values:
// if label ends with _X
// take current as label
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort and remove last element?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Matthew, thanks for your comment! However, I'm not entirely sure what you mean?
I've pushed a first feature-complete version of the workflow that performs ten runs of the measurement and then prints average and best of 10 per metric.
Please feel free to comment! Thank you!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was suggesting a way to lazily handle rejecting the worst case outlier. I looked at your new code, It's pretty much perfect for that.

Considerations by the way:

  • Worst case execution time can be due to being pre-empted.
  • Best case can be due to cache. Since you rebuild a docker image each time, I think cache would be negligeable. So no need to remove the best case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification! Good idea!
I was thinking that maybe the median may even be a better choice as it would remove the effect of outliers altogether. I've changed the implementation accordingly in b26a5e7. Please let me know if you disagree. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low outliers can be an issue. I would worry much less about them than the high outliers.
Basically functions duration tend to look like a chi2 function.
image
Here is an example of typical IRQ latency.

irq-example

Rejecting the low end could be interesting as it would give more "consistent" data, but it reflects the behaviour of the software much more than the slow data that often is slow due to external interference.

TL;DR: Outside programs cannot interfere with your code and make it faster. If we want to reject interference, we reject the slow outliers. Fast outliers are often due to cache or failures. That is a separate but valid concern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the input! Interesting!
Does that mean you would prefer an average that excludes best and worst case? Or should we keep the median, which should be rather similar than average rejecting best and worst?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer the average excluding worst offenders at first. If we need to update the post-processed algorithm later, it can be done though as you retain the data.

I have a very LIGHT preference though and as long as the worst offenders are rejected, I'm fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok got you! I've pushed 00d27f6 for computing the average of all values excluding the worst case. Thanks!

@planger planger changed the title WIP: Average and best of 10 values Average and best of 10 values Oct 3, 2023
@planger planger requested a review from tortmayr October 3, 2023 19:35
Compute average and best of 10 from multiple runs of a single
measurement. Mark those entries with `[]` in the report.

Contributed on behalf of STMicroelectronics.
@planger planger force-pushed the average-and-best-run-report branch from 17de55b to 50ced9e Compare October 3, 2023 19:43
@planger planger marked this pull request as ready for review October 4, 2023 14:03
@planger
Copy link
Contributor Author

planger commented Oct 4, 2023

@tortmayr This is now ready for review, thanks! Before we merge, we should clean the history though. I'll take care of that before hitting the button.

@planger planger force-pushed the average-and-best-run-report branch from b26a5e7 to 00d27f6 Compare October 5, 2023 18:11
Copy link
Contributor

@tortmayr tortmayr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Change looks good to me 👍🏼.

@planger planger merged commit 5764ecb into eclipse-theia:main Oct 10, 2023
13 checks passed
@planger planger deleted the average-and-best-run-report branch October 10, 2023 08:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants