
core(lantern): always use flexible network ordering #14612

Merged · 15 commits · Apr 19, 2024
Conversation

brendankenny (Member) commented Dec 14, 2022

Remove `flexibleOrdering` from the simulator and always start requests as soon as they are able to start and a connection is available. Part of #14166

Simplifies simulator code and improves lantern accuracy (with a small change to the TTI and SI coefficients)

[image] (initial image from original PR - see above instead)


The lantern simulator has two modes for connection reuse, controlled by the boolean `flexibleOrdering` (see the sketch after this list):

  • if `flexibleOrdering: false` (the default), any network request that was observed on a warm connection (`connectionReused: true`) can only start once that connection is warm in the simulation, and the `NetworkNode` is kept in the queue until whichever request was first on that connection in the unthrottled real load has completed in the simulator
  • if `flexibleOrdering: true`, `connectionReused` is not considered when deciding when network requests should start
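
A minimal sketch of what that check amounts to, assuming hypothetical helper names (`canStartRequest`, `connectionPool`, etc.) rather than the actual simulator API:

```js
/**
 * Illustrative only: whether a network request may start under each ordering
 * mode. `connectionPool` and its methods are assumed names for this sketch.
 */
function canStartRequest(networkNode, connectionPool, flexibleOrdering) {
  const record = networkNode.record;

  // Inflexible ordering: a request observed on a reused connection must wait
  // until the request that originally warmed that connection has completed in
  // the simulation.
  if (!flexibleOrdering && record.connectionReused &&
      !connectionPool.isWarm(record.connectionId)) {
    return false;
  }

  // In either mode, the request still needs a free connection to its origin.
  return connectionPool.acquireAvailableConnection(record) !== null;
}
```

With `flexibleOrdering: true` (the only behavior after this PR), `connectionReused` is ignored and a request starts as soon as it has no pending dependencies and a connection to its origin is free.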

Originally the observed connection-reuse ordering was required; `flexibleOrdering` came later, when we explicitly wanted to make requests earlier than they had occurred, as if they had been preloaded.

However, the default `flexibleOrdering: false` is a little bit weird today. It's basically an additional dependency constraint on the graph, though one not represented through the main dependency mechanism. In an ideal world, if there needs to be a dependency relationship between two requests for improved accuracy, we should make it explicit.

We're also simulating the whole page load on a different-speed machine; there's no reason to idle a connection waiting for the request that was observed to be first on that connection when other requests could be made sooner (that works against the whole reason for simulating). That's the ideal world again, though, and we may be missing key simulator features that the inflexible ordering has been making up for.

When testing to see what additional work it would take to remove inflexible ordering, I was prepared for accuracy to drop and additional dependency-graph relationships to be needed to counteract that. However, the lantern tests actually showed small improvements in accuracy in multiple audits—highest in p90 SI—except a small decrease in accuracy in p50 TTI and p95 SI.

The lantern coefficients for SI and TTI are super old, so I re-computed them from the full 9-run lantern dataset, then scaled back the changes considerably because I'm worried about overfitting to the test data. Looking at the goals laid out in #5120, TTI stays at `optimistic + pessimistic === 1` while favoring the pessimistic estimate slightly more now, and SI moves very slightly toward a total of 1 but, more importantly, can finally use an intercept of 0 (the last of the metrics to do so).
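
For reference, a lantern metric's final estimate is a blend of its optimistic and pessimistic simulations; roughly like this, with placeholder coefficient values rather than the ones landed here:

```js
// Sketch of the optimistic/pessimistic blend used by the lantern metrics.
// These coefficient values are illustrative placeholders only.
const COEFFICIENTS = {intercept: 0, optimistic: 0.4, pessimistic: 0.6};

function blendEstimates(optimisticTimeInMs, pessimisticTimeInMs, coefficients = COEFFICIENTS) {
  return coefficients.intercept +
    coefficients.optimistic * optimisticTimeInMs +
    coefficients.pessimistic * pessimisticTimeInMs;
}

// e.g. blendEstimates(3000, 5000) === 0 + 0.4 * 3000 + 0.6 * 5000 === 4200
```

Keeping `optimistic + pessimistic` at (or near) 1 with an intercept of 0 means the blended estimate is essentially a weighted average of the two simulated times rather than relying on a fitted offset.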

Combined, these two changes make major improvements to the SI accuracy. Most of that is due to the coefficient/intercept change, but it means we can simplify the simulator code and remove an artificial constraint while maintaining or improving simulator accuracy.

connorjclark (Collaborator) commented Dec 14, 2022

Nice work!

> However, the lantern tests actually showed small improvements in accuracy in multiple audits—highest in p90 SI—except a small decrease in accuracy in p50 TTI and p95 SI.

I take it that this statement is limited to metrics, without saying anything about other audits, correct?

I'm curious how you assess the changes in audits that went from inflexible to flexible. Since inflexible was the default value, from just looking at the PR diff we can only see the no-op changes from `uses-http2` and `preload-lcp-image`; nothing should have changed there. For the non-metric audits whose simulation behavior did change (are there any?), can we assess how this change impacted the results? Or do we not particularly care, and should we just limit our scope to the metrics when assessing changes to the simulator?

brendankenny (Member, Author) commented

Yeah, it's limited to the metrics. In theory, if the opportunity is the difference between the LCP of the original graph and the LCP of a modified graph, then any improvement in LCP accuracy should improve that estimate. But that's not necessarily true in practice (the dbw failure in this run looks like a good example; I need to compare the before/after to see if we were relying on coincidence before or if there's a real regression).
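
As a rough sketch of that relationship (not the audits' actual implementation; `simulator.simulate()` returning `{timeInMs}` is assumed context here):

```js
// Illustrative only: an opportunity's wasted-ms estimate as the difference
// between simulating the observed graph and simulating a graph with the
// optimization applied, clamped at zero.
function estimateWastedMs(simulator, observedGraph, optimizedGraph) {
  const observedLcpMs = simulator.simulate(observedGraph).timeInMs;
  const optimizedLcpMs = simulator.simulate(optimizedGraph).timeInMs;
  return Math.max(observedLcpMs - optimizedLcpMs, 0);
}
```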

We don't have any automated method for evaluating opportunity accuracy, but we should consider how we could build one in 2023.

We should also update the lantern test data. Somehow it's already been three years since you updated it last.

connorjclark (Collaborator) commented Dec 14, 2022

> We should also update the lantern test data. Somehow it's already been three years since you updated it last.

Ooooh boy that will be a fun exercise! 3 years of software and product churn, and the cloud, what could go wrong?

connorjclark (Collaborator) commented

We can add this for 10.0, just need to update the branch. Sorry it didn't get merged earlier.

connorjclark added the 11.0 cranked up to eleven label Jul 10, 2023
connorjclark removed the 11.0 cranked up to eleven label Jul 26, 2023
paulirish added this to the v12.0 milestone Oct 5, 2023
paulirish (Member) commented

huh we still want this right?

connorjclark (Collaborator) commented

Yeah, it needs some TLC though (conflicts).

I wanted to have the lantern traces updated before merging, but an analysis of this PR's effects could be done post hoc, so no need to wait on that.

```diff
@@ -82,7 +82,7 @@ describe('Render blocking resources audit', () => {
     const settings = {throttlingMethod: 'simulate', throttling: mobileSlow4G};
     const computedCache = new Map();
     const result = await RenderBlockingResourcesAudit.audit(artifacts, {settings, computedCache});
-    expect(result.numericValue).toEqual(316);
+    expect(result.numericValue).toEqual(0);
```
Collaborator commented:


TODO: verify this or change test.

Collaborator commented:


`const {nodeTimings} = simulator.simulate(fcpGraph);` in render-blocking-resources.js's `estimateSavingsWithGraphs` gives 2884 for script.js, whereas before it was 2100. AFAICT no other numbers in this method varied. This is enough to make the "before inline" FCP estimate always be higher than the "after inline" estimate for this trace.

The test does otherwise fail if we comment out the amp-specific handling in render-blocking-resources.js, and I think the change in the FCP estimate is to be expected from this PR, so updating the expectation to 0 here seems like the right approach.
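
For anyone re-verifying that, a hypothetical snippet along the lines of the check described above (property names such as `node.record` are assumptions about the simulation result's shape, not necessarily the exact API):

```js
// Sketch: inspect the simulated end time of script.js in the FCP graph,
// reported above as ~2100 ms before this PR and ~2884 ms after.
function logScriptTiming(simulator, fcpGraph) {
  const {nodeTimings} = simulator.simulate(fcpGraph);
  for (const [node, timing] of nodeTimings) {
    if (node.type === 'network' && node.record.url.endsWith('script.js')) {
      console.log(node.record.url, timing.startTime, timing.endTime);
    }
  }
}
```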

connorjclark (Collaborator) commented

Last remaining item is the smoke failure that only occurs in the bundled test (odd): `yarn test-bundle byte-efficiency`
