
FCP may be impacted by whether the load event fires under simulated throttling #9627

Closed
kkdev163 opened this issue Aug 30, 2019 · 10 comments · Fixed by #9922

Comments

@kkdev163

kkdev163 commented Aug 30, 2019

Provide the steps to reproduce

  1. Run LH on https://st.music.163.com/c/wapm-lighthouse-test/id

My test results can be downloaded from https://webapm.nos-jd.163yun.com/lighthouse%2F%2Ftmplh3.zip; the archive contains 6 test runs (JSON, HTML, and trace files). The first run's FCP is 0.8s; the others are 1.6s or more.

What is the current behavior?

We built a page monitoring system based on the Lighthouse CLI. The system tests each page at a 20-minute interval, and recently we found that some pages' metric data is not stable: sometimes the FCP value is double the normal one.

I chose a very simple page (the one above) to reproduce this behavior, and found that sometimes the FCP is 0.8s and sometimes it is 1.6s. I was very curious about the reason, so I also saved the trace files. I found that when the FCP is 1.6s, some network requests in the trace are not finished and the load event is not fired normally.

Because we use simulated throttling, the FCP is affected both by the FCP observed in the trace and by the page's dependency graph of nodes. I suspect the main reason the FCP varies so much is that network requests sometimes do not complete, so the computation over the network dependency nodes varies.

I also changed the throttling method to 'devtools'; I believe this method uses the real observed FCP rather than a simulated result. I hoped this would be fine, but when I switched to devtools throttling the page could not be measured within 10s, and sometimes the run also hit a NO_DCL error.

What is the expected behavior?

I'm curious why some network requests do not complete within 10s; they are static CDN resources and should finish very quickly.

Also, should the FCP simulation algorithm be adjusted to reduce the impact of the load event?

I think this problem is mainly caused by the fact that we are testing inside Docker images; we use this Docker environment for testing.

Can you give me some advice on keeping the metrics from varying so widely?

Environment Information

  • Affected Channels: CLI
  • Lighthouse version: 5.2.0
  • Node.js version: 11
  • Operating System: Docker Linux Image
  • Chrome version: 69.0.3497.100


@kkdev163
Author

kkdev163 commented Aug 30, 2019

Update: this may be caused by an uncaught error on the page.

@patrickhulce
Collaborator

Thank you very much for filing with such detailed trace data and investigation @kkdev163! Much appreciated!

You are very, very close in your estimation, and you've found a clever bug in simulation! The issue comes down to our optimization that skips very small tasks, combined with the fact that we always include the first ParseHTML event in the FCP estimate. This ends up accidentally pulling in a JS file that shouldn't be necessary for the first paint (common.<hash>.js) when it happens to run in the same task as a later ParseHTML event.

```js
// Skip all trace events that aren't schedulable tasks with sizable duration
if (
  !TracingProcessor.isScheduleableTask(evt) ||
  !evt.dur ||
  evt.dur < minimumEvtDur
) {
  i++;
  continue;
}
```

```js
// The first layout, first paint, and first ParseHTML are almost always necessary for first paint,
// so we always include those CPU nodes.
const firstLayout = cpuNodes.find(node => node.didPerformLayout());
if (firstLayout) blockingCpuNodeIds.add(firstLayout.id);
const firstPaint = cpuNodes.find(node => node.childEvents.some(e => e.name === 'Paint'));
if (firstPaint) blockingCpuNodeIds.add(firstPaint.id);
const firstParse = cpuNodes.find(node => node.childEvents.some(e => e.name === 'ParseHTML'));
if (firstParse) blockingCpuNodeIds.add(firstParse.id);
```

Anyone on core feel confident enough after our lantern sesh yesterday to take a stab at fixing this one? ;) I think we just want to always keep the tasks containing the first ParseHTML/Layout/Paint events regardless of how long they are. Alternatively, we could just remove our optimization on task length and see how slow things get.
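
As a rough illustration of the first option, here is a minimal sketch (not the actual fix): exempt a short task from the minimum-duration filter when it contains ParseHTML, Layout, or Paint child events, so the task holding the first ParseHTML/Layout/Paint can never be dropped. The `taskContainsCriticalEvent` helper and its containment check are assumptions for illustration, not names from the Lighthouse source.

```js
// Illustrative only: keep short tasks that contain ParseHTML/Layout/Paint events.
// This deliberately keeps *every* such task rather than only the first one, which
// is a safe over-approximation (it can only add nodes to the graph).
const CRITICAL_EVENT_NAMES = new Set(['ParseHTML', 'Layout', 'Paint']);

/** Hypothetical helper: does any critical event start inside this task's window? */
function taskContainsCriticalEvent(evt, traceEvents) {
  return traceEvents.some(e =>
    CRITICAL_EVENT_NAMES.has(e.name) &&
    e.ts >= evt.ts &&
    e.ts <= evt.ts + evt.dur
  );
}

// Inside the existing loop over trace events, the duration filter would then
// apply only to tasks without any critical child event:
if (
  !TracingProcessor.isScheduleableTask(evt) ||
  !evt.dur ||
  (evt.dur < minimumEvtDur && !taskContainsCriticalEvent(evt, traceEvents))
) {
  i++;
  continue;
}
```

The fix that actually landed is in #9922; the sketch above is only meant to make the shape of the change concrete.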

FCP graph when FCP is 800ms: [screenshot]

FCP graph when FCP is 1600ms: [screenshot]

Source trace when FCP is 800ms: [screenshot]

Source trace when FCP is 1600ms: [screenshot]

@kkdev163
Author

kkdev163 commented Sep 27, 2019

Hi @patrickhulce, I adopted your advice to 'just remove the optimization on task length'. Generally speaking, after adopting it, some pages' FCP, FMP, and Speed Index dropped and the rate of normal test results increased, but there are still some bad cases like the one below:
[screenshot]

It also genuinely increases the test wait time. We run with --onlyCategory=performance and --max-wait-for-load=10000; before this change our pages finished testing within 30s, but afterwards some pages' test time increased to 50s.

Since this advice helped a lot, we adopted it in production, and we also added a safety strategy to reduce the bad-case rate: when we get a test result, we compare it with the historical tests, and if it increased by more than 50% we discard the result and test again; if we get a bad case more than 3 times in a period, we consider that the page's real performance may have a problem and save the bad case to the database. After adopting this strategy, our metrics are more stable (a sketch of the strategy follows the screenshot below):
[screenshot]
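
For concreteness, a minimal sketch of that safety strategy as a Node helper; `runOnce`, `history`, and `saveBadCase` are hypothetical stand-ins for the monitoring system's own functions, and the 'more than 3 bad cases in a period' rule is collapsed into a per-measurement retry limit for brevity.

```js
// Sketch only: accept a result if it is within 50% of the historical baseline,
// otherwise retry; after repeated bad cases, record it as a possible real regression.
function median(values) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.floor(sorted.length / 2)];
}

async function measureWithRetry(url, runOnce, history, saveBadCase, maxBadCases = 3) {
  const baseline = median(history); // e.g. recent FCP values for this page
  for (let attempt = 1; attempt <= maxBadCases; attempt++) {
    const value = await runOnce(url); // one Lighthouse run, returning a metric value
    const isBadCase = baseline !== null && value > baseline * 1.5;
    if (!isBadCase) return value;
    if (attempt === maxBadCases) {
      await saveBadCase(url, value); // repeated bad cases: may be a real regression
      return value;
    }
    // Otherwise treat the result as an outlier and run the test again.
  }
}
```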

But today I found another bad case in the database, one with a very high performance score; the series of test results is shown below:
[screenshot]

I also saved these tests' HTML, JSON, trace, and DevTools log files; you can download them from this link. I hope this helps you fix the bug.

@patrickhulce
Collaborator

Thanks very much for the thorough investigation and data @kkdev163!

Glad to hear you were able to address a few of the problems; we'll look into the files you provided 👍

@patrickhulce
Collaborator

Also @brendankenny, it looks like there's some bad news for totally removing the short-task optimization:

"before this change our pages finished testing within 30s, but afterwards some pages' test time increased to 50s"

@brendankenny
Member

ah, yeah, though I guess not unexpected

@patrickhulce
Collaborator

@kkdev163 it looks like that 93 is the result of the trace ending prematurely. The main JS bundle request took 7.1s to load on that run, which is far too long for the default network and CPU quiet thresholds to catch. If you've coupled this with a max-wait-for-load of 10s, then we'll give up before the page has actually finished loading, so we'll evaluate the performance of what is basically a loading-spinner app shell. There's no bug here for us to tackle; it's mostly up to the user to adjust maxWaitForLoad, networkQuietThresholdMs, and cpuQuietThresholdMs based on their specific page, load environment, and network conditions.
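
For reference, here is a sketch of a custom config that raises those thresholds; the values are illustrative, and the exact placement of the pass-level options can vary between Lighthouse versions, so check the default config shipped with your release.

```js
// custom-config.js: illustrative values only; tune them to your page and network.
module.exports = {
  extends: 'lighthouse:default',
  settings: {
    onlyCategories: ['performance'],
    maxWaitForLoad: 60 * 1000, // wait up to 60s instead of giving up on slow bundles
  },
  passes: [{
    passName: 'defaultPass',
    networkQuietThresholdMs: 5000, // require a longer network-quiet window
    cpuQuietThresholdMs: 5000, // and a longer CPU-quiet window
  }],
};
```

A config like this can be passed to the CLI with --config-path=./custom-config.js.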

FWIW, there seems to be some crazy-high variance in download times for that page's resources, which will inevitably translate into some performance-score variance. The main app bundle takes 31ms to download on some runs and 5+ seconds on others. I'm not sure where these tests are being run, but they don't seem to have very reliable network capacity.

@kkdev163
Author

kkdev163 commented Nov 1, 2019

@patrickhulce These tests all run in independent Docker containers hosted on a Linux machine. Every test runs in a new, clean container, and we cap the maximum number of parallel tests at 8 (the number of CPU cores). Do you think the network is affected by the parallelism or by the Docker environment?

@patrickhulce
Collaborator

Wait, does this mean it's running 8 parallel tests with only 8 cores? I would definitely not do that. At absolute most 1 run per 2 cores, but even then I'd expect some additional introduced variance. For our stability tests, we only ran 1 parallel run per ~4-8 cores, preferring physically separate machines when possible. We also don't run in Docker containers, so we don't fully understand the characteristics there, but we suspect they could introduce these sorts of differences.

@kkdev163
Author

kkdev163 commented Nov 2, 2019

@patrickhulce yes, we run 8 parallel tests with only 8 cores. I will adjust my config according to your advice, thanks.
