Average and best of 10 values #20
Conversation
Force-pushed from 06345da to 2143933
With this change we run the performance tests 10 times and record the average and the best of 10 in the performance report. This will hopefully lead to more stable results. Contributed on behalf of STMicroelectronics.
Force-pushed from 1eb2a3d to 6ac1e2d
Force-pushed from 6ac1e2d to 0845315
scripts/performance-report.ts (outdated)
// TODO Post process values:
// if label ends with _X
// take current as label
Sort and remove last element?
Hi Matthew, thanks for your comment! However, I'm not entirely sure what you mean?
I've pushed a first feature-complete version of the workflow that performs ten runs of the measurement and then prints average and best of 10 per metric.
Please feel free to comment! Thank you!
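For illustration only, a minimal TypeScript sketch of that per-metric aggregation could look like the following (the function name and the sample values are hypothetical and not the actual scripts/performance-report.ts code):

```ts
// Aggregate N runs of one metric into the average and the best-of-N value.
function aggregateRuns(durationsMs: number[]): { average: number; best: number } {
    if (durationsMs.length === 0) {
        throw new Error('No measurements recorded');
    }
    const sum = durationsMs.reduce((acc, value) => acc + value, 0);
    return {
        average: sum / durationsMs.length,
        best: Math.min(...durationsMs)
    };
}

// Example: ten simulated runs of a single metric (values in milliseconds).
const runs = [102.4, 98.7, 101.1, 97.9, 105.3, 99.2, 100.8, 98.1, 110.6, 99.5];
console.log(aggregateRuns(runs)); // { average: ≈101.36, best: 97.9 }
```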
I was suggesting a way to lazily handle rejecting the worst-case outlier. I looked at your new code; it's pretty much perfect for that.
Some considerations, by the way:
- Worst-case execution time can be due to being pre-empted.
- Best case can be due to cache. Since you rebuild a Docker image each time, I think the cache effect would be negligible, so there is no need to remove the best case.
Thanks for the clarification! Good idea!
I was thinking that the median might even be a better choice, as it would remove the effect of outliers altogether. I've changed the implementation accordingly in b26a5e7. Please let me know if you disagree. Thanks!
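As a rough sketch of what such a median computation might look like in TypeScript (a hypothetical helper, not necessarily the code in b26a5e7):

```ts
// Median of the recorded runs: robust against single outliers on either end.
function median(values: number[]): number {
    if (values.length === 0) {
        throw new Error('No measurements recorded');
    }
    const sorted = [...values].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 === 0
        ? (sorted[mid - 1] + sorted[mid]) / 2
        : sorted[mid];
}
```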
Low outliers can be an issue, but I would worry much less about them than about the high outliers.
Basically, function durations tend to follow a chi-squared-like distribution.
Here is an example of typical IRQ latency.
Rejecting the low end could be interesting, as it would give more "consistent" data, but the fast data reflects the behaviour of the software much more than the slow data, which is often slow due to external interference.
TL;DR: Outside programs cannot interfere with your code and make it faster. If we want to reject interference, we reject the slow outliers. Fast outliers are often due to cache or failures, which is a separate but valid concern.
Thank you for the input! Interesting!
Does that mean you would prefer an average that excludes the best and worst cases? Or should we keep the median, which should be rather similar to an average that rejects the best and worst?
I would prefer the average excluding the worst offenders, at first. If we need to change the post-processing algorithm later, that can still be done, since you retain the raw data.
It's only a very LIGHT preference though; as long as the worst offenders are rejected, I'm fine.
Ok got you! I've pushed 00d27f6 for computing the average of all values excluding the worst case. Thanks!
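For reference, a minimal sketch of averaging while dropping the single slowest run might look like this (a hypothetical helper name, not necessarily the code in 00d27f6):

```ts
// Average of all runs excluding the single worst (slowest) one.
function averageExcludingWorst(values: number[]): number {
    if (values.length < 2) {
        throw new Error('Need at least two measurements to drop the worst run');
    }
    const kept = [...values].sort((a, b) => a - b).slice(0, -1); // drop the slowest run
    return kept.reduce((acc, value) => acc + value, 0) / kept.length;
}
```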
Compute average and best of 10 from multiple runs of a single measurement. Mark those entries with `[]` in the report. Contributed on behalf of STMicroelectronics.
Force-pushed from 17de55b to 50ced9e
@tortmayr This is now ready for review, thanks! Before we merge, we should clean up the history though. I'll take care of that before hitting the button.
Force-pushed from b26a5e7 to 00d27f6
Thanks! Change looks good to me 👍🏼.