core(metrics): update lantern coefficients #5120
Merged
This PR updates the lantern coefficients to the values that offer the lowest absolute error on the 9-run accuracy/variance dataset. They haven't ever been updated, so the jump is expected. This PR also turns on the `flexibleOrdering` feature for the optimistic estimate, which improved accuracy by ~5% on FCP/FMP 🎉

In the ideal case, if lantern were perfect, we would expect the optimistic/pessimistic coefficients to both be positive, sum to ~1, and the intercept to be 0. The good news is we've made considerable progress toward this goal for FCP/FMP: intercepts are significantly lower, and the coefficients are positive and sum to ~1.
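For context, the final metric estimate is a linear combination of the optimistic and pessimistic simulations, which is why those properties matter. A minimal sketch (the names and coefficient values below are illustrative, not the actual Lighthouse internals):

```javascript
// Hypothetical sketch: a lantern metric estimate as a linear combination
// of its two simulated bounds. Coefficient values are made up.
function lanternEstimate(optimisticMs, pessimisticMs, coefficients) {
  const {intercept, optimistic, pessimistic} = coefficients;
  return intercept + optimistic * optimisticMs + pessimistic * pessimisticMs;
}

// In the ideal case the coefficients are positive, sum to ~1, and the
// intercept is 0, so the estimate is a weighted average of the bounds.
const idealCoefficients = {intercept: 0, optimistic: 0.5, pessimistic: 0.5};
const estimate = lanternEstimate(1000, 2000, idealCoefficients);
// estimate lands between the optimistic and pessimistic values: 1500
```

With positive coefficients summing to ~1 and a zero intercept, the estimate is guaranteed to fall between the two simulated bounds, which is the intuitively correct behavior.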
Progress is positive on TTI: the coefficients now sum to ~1 instead of ~1.5, and the intercept is roughly the same.
Speed Index is just weird; the optimistic estimate being real speed index throws things off quite a bit. The lowest absolute error comes from coefficients at ~2x with a negative intercept. This actually doesn't seem that bad, since on the low end we'll be using the FCP estimate for speed index anyhow, and if it improves accuracy on the high end of speed index we'll take it, but it's still not an ideal situation.
MAPE State of the World
Aside: once I add the GCP test suite I think we should shift away from measuring based on MAPE and do the search approach, % of results that are Good/OK/Terrible which is much easier to interpret and judge success
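To make the aside concrete, here is a rough sketch of the two scoring approaches side by side; the 10%/25% error cutoffs for the Good/OK/Terrible buckets are arbitrary placeholders, not a proposal:

```javascript
// Mean absolute percentage error: a single number that's hard to
// interpret in isolation.
function mape(actual, predicted) {
  const errors = actual.map((value, i) => Math.abs(value - predicted[i]) / value);
  return errors.reduce((sum, e) => sum + e, 0) / errors.length;
}

// Bucketed scoring: % of results that land within some tolerance.
// Thresholds here are made-up placeholders for illustration.
function bucketize(actual, predicted) {
  const buckets = {good: 0, ok: 0, terrible: 0};
  actual.forEach((value, i) => {
    const error = Math.abs(value - predicted[i]) / value;
    if (error < 0.1) buckets.good++;
    else if (error < 0.25) buckets.ok++;
    else buckets.terrible++;
  });
  return buckets;
}
```

A statement like "80% of results are Good" is much easier to judge against a success criterion than "MAPE is 23%", which is the motivation for the switch.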