
FCP may be impacted by whether the load event fires under simulated throttling #9627

Closed
kkdev163 opened this issue Aug 30, 2019 · 10 comments · Fixed by #9922

Comments

@kkdev163

kkdev163 commented Aug 30, 2019

Provide the steps to reproduce

  1. Run LH on https://st.music.163.com/c/wapm-lighthouse-test/id

My test results can be downloaded from https://webapm.nos-jd.163yun.com/lighthouse%2F%2Ftmplh3.zip; the archive contains 6 test runs (JSON, HTML, and trace files). The first run's FCP is 0.8s; the others are 1.6s or more.

What is the current behavior?

We built a page monitoring system based on the Lighthouse CLI. The system tests each page at a 20-minute interval, and recently we found that some pages' metric data is not stable: sometimes the FCP value is double the normal one.

I chose a very simple page (the one above) to reproduce this behavior, and found that sometimes the FCP is 0.8s and sometimes it is 1.6s. I was very curious about the reason, so I also saved the trace files. I found that when the FCP is 1.6s, some network requests in the trace are not finished and the load event is not fired normally.

Because we use simulated throttling, the FCP is affected both by the FCP observed in the trace and by the page's dependency graph of nodes. I suspect the main reason the FCP varies so much is that network requests sometimes do not complete, so the computation over the network dependency nodes varies.

I also changed the throttling method to 'devtools'; I believe this method uses the real observed FCP rather than a simulated result. I hoped this would be fine, but when I switched to devtools throttling the page could not be measured within 10s, and sometimes the run also hit a NO_DCL error.

What is the expected behavior?

I'm curious why some network requests do not complete within 10s; they are static CDN resources and should finish very quickly.

Also, should the FCP simulation algorithm be adjusted to reduce the impact of the load event?

I think this problem is mainly caused by the fact that we are testing inside Docker images; we use this Docker environment for testing.

Can you give me some advice on keeping the metrics from varying so widely?

Environment Information

  • Affected Channels: CLI
  • Lighthouse version: 5.2.0
  • Node.js version: 11
  • Operating System: Docker Linux Image
  • Chrome version: 69.0.3497.100


@kkdev163
Author

kkdev163 commented Aug 30, 2019

Update: this may be caused by an uncaught error on the page.

@patrickhulce
Collaborator

Thank you very much for filing with such detailed trace data and investigation @kkdev163! Much appreciated!

You are very, very close in your estimation, and you've found a clever bug in simulation! The issue comes down to our optimization that skips very small tasks, combined with the fact that we always include the first ParseHTML event in the FCP estimate. This ends up accidentally pulling in a JS file that shouldn't be necessary for the first paint (common.<hash>.js) when it happens to run in the same task as a later ParseHTML event.

```js
// Skip all trace events that aren't schedulable tasks with sizable duration
if (
  !TracingProcessor.isScheduleableTask(evt) ||
  !evt.dur ||
  evt.dur < minimumEvtDur
) {
  i++;
  continue;
}
```

```js
// The first layout, first paint, and first ParseHTML are almost always necessary for first paint,
// so we always include those CPU nodes.
const firstLayout = cpuNodes.find(node => node.didPerformLayout());
if (firstLayout) blockingCpuNodeIds.add(firstLayout.id);
const firstPaint = cpuNodes.find(node => node.childEvents.some(e => e.name === 'Paint'));
if (firstPaint) blockingCpuNodeIds.add(firstPaint.id);
const firstParse = cpuNodes.find(node => node.childEvents.some(e => e.name === 'ParseHTML'));
if (firstParse) blockingCpuNodeIds.add(firstParse.id);
```

Anyone on core feel confident enough after our lantern sesh yesterday to take a stab at fixing this one? ;) I think we just want to always keep the tasks containing the first ParseHTML/Layout/Paint events regardless of how long they are. Alternatively, we could just remove our optimization on task length and see how slow things get.
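
As a rough illustration of the first option, here is a minimal sketch (not the actual fix): exempt a short task from the minimum-duration filter when it contains ParseHTML, Layout, or Paint child events, so the task holding the first ParseHTML/Layout/Paint can never be dropped. The `taskContainsCriticalEvent` helper and its containment check are assumptions for illustration, not names from the Lighthouse source.

```js
// Illustrative only: keep short tasks that contain ParseHTML/Layout/Paint events.
// This deliberately keeps *every* such task rather than only the first one, which
// is a safe over-approximation (it can only add nodes to the graph).
const CRITICAL_EVENT_NAMES = new Set(['ParseHTML', 'Layout', 'Paint']);

/** Hypothetical helper: does any critical event start inside this task's window? */
function taskContainsCriticalEvent(evt, traceEvents) {
  return traceEvents.some(e =>
    CRITICAL_EVENT_NAMES.has(e.name) &&
    e.ts >= evt.ts &&
    e.ts <= evt.ts + evt.dur
  );
}

// Inside the existing loop over trace events, the duration filter would then
// apply only to tasks without any critical child event:
if (
  !TracingProcessor.isScheduleableTask(evt) ||
  !evt.dur ||
  (evt.dur < minimumEvtDur && !taskContainsCriticalEvent(evt, traceEvents))
) {
  i++;
  continue;
}
```

The fix that actually landed is in #9922; the sketch above is only meant to make the shape of the change concrete.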

FCP graph when FCP is 800ms: [screenshot]

FCP graph when FCP is 1600ms: [screenshot]

Source trace when FCP is 800ms: [screenshot]

Source trace when FCP is 1600ms: [screenshot]

@kkdev163
Author

kkdev163 commented Sep 27, 2019

Hi @patrickhulce, I adopted your advice to 'just remove the optimization on task length'. Generally speaking, after adopting it, some pages' FCP, FMP, and Speed Index dropped and the rate of normal test results increased, but there are still some bad cases like the one below:
[screenshot]

It also genuinely increases the test wait time. We run with --onlyCategory=performance and --max-wait-for-load=10000; before this change our pages finished testing within 30s, but afterwards some pages' test time increased to 50s.

Since this advice helped a lot, we adopted it in production, and we also added a safety strategy to reduce the bad-case rate: when we get a test result, we compare it with the historical tests, and if it increased by more than 50% we discard the result and test again; if we get a bad case more than 3 times in a period, we consider that the page's real performance may have a problem and save the bad case to the database. After adopting this strategy, our metrics are more stable (a sketch of the strategy follows the screenshot below):
[screenshot]
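
For concreteness, a minimal sketch of that safety strategy as a Node helper; `runOnce`, `history`, and `saveBadCase` are hypothetical stand-ins for the monitoring system's own functions, and the 'more than 3 bad cases in a period' rule is collapsed into a per-measurement retry limit for brevity.

```js
// Sketch only: accept a result if it is within 50% of the historical baseline,
// otherwise retry; after repeated bad cases, record it as a possible real regression.
function median(values) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.floor(sorted.length / 2)];
}

async function measureWithRetry(url, runOnce, history, saveBadCase, maxBadCases = 3) {
  const baseline = median(history); // e.g. recent FCP values for this page
  for (let attempt = 1; attempt <= maxBadCases; attempt++) {
    const value = await runOnce(url); // one Lighthouse run, returning a metric value
    const isBadCase = baseline !== null && value > baseline * 1.5;
    if (!isBadCase) return value;
    if (attempt === maxBadCases) {
      await saveBadCase(url, value); // repeated bad cases: may be a real regression
      return value;
    }
    // Otherwise treat the result as an outlier and run the test again.
  }
}
```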

But today I found another bad case in the database, one with a very high performance score; the series of test results is shown below:
[screenshot]

I also saved these tests' HTML, JSON, trace, and DevTools log files; you can download them from this link. I hope this helps you fix the bug.

@patrickhulce
Collaborator

Thanks very much for the thorough investigation and data @kkdev163!

Glad to hear you were able to address a few of the problems; we'll look into the files you provided 👍

@patrickhulce
Collaborator

Also @brendankenny, it looks like there's some bad news for totally removing the short-task optimization:

"before this change our pages finished testing within 30s, but afterwards some pages' test time increased to 50s"

@brendankenny
Member

ah, yeah, though I guess not unexpected

@patrickhulce
Collaborator

@kkdev163 it looks like that 93 is the result of the trace ending prematurely. The main JS bundle request took 7.1s to load on that run, which is far too long for the default network and CPU quiet thresholds to catch. If you've coupled this with a max-wait-for-load of 10s, then we'll give up before the page has actually finished loading, so we'll evaluate the performance of what is basically a loading-spinner app shell. There's no bug here for us to tackle; it's mostly up to the user to adjust maxWaitForLoad, networkQuietThresholdMs, and cpuQuietThresholdMs based on their specific page, load environment, and network conditions.
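
For reference, here is a sketch of a custom config that raises those thresholds; the values are illustrative, and the exact placement of the pass-level options can vary between Lighthouse versions, so check the default config shipped with your release.

```js
// custom-config.js: illustrative values only; tune them to your page and network.
module.exports = {
  extends: 'lighthouse:default',
  settings: {
    onlyCategories: ['performance'],
    maxWaitForLoad: 60 * 1000, // wait up to 60s instead of giving up on slow bundles
  },
  passes: [{
    passName: 'defaultPass',
    networkQuietThresholdMs: 5000, // require a longer network-quiet window
    cpuQuietThresholdMs: 5000, // and a longer CPU-quiet window
  }],
};
```

A config like this can be passed to the CLI with --config-path=./custom-config.js.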

FWIW, there seems to be some crazy-high variance in download times for that page's resources, which will inevitably translate into some performance-score variance. The main app bundle takes 31ms to download on some runs and 5+ seconds on others. I'm not sure where these tests are being run, but they don't seem to have very reliable network capacity.

@kkdev163
Author

kkdev163 commented Nov 1, 2019

@patrickhulce These tests all run in independent Docker containers hosted on a Linux machine. Every test runs in a new, clean container, and we cap the maximum number of parallel tests at 8 (the number of CPU cores). Do you think the network is affected by the parallelism or by the Docker environment?

@patrickhulce
Collaborator

Wait, does this mean it's running 8 parallel tests with only 8 cores? I would definitely not do that. At absolute most 1 run per 2 cores, but even then I'd expect some additional introduced variance. For our stability tests, we only ran 1 parallel run per ~4-8 cores, preferring physically separate machines when possible. We also don't run in Docker containers, so we don't fully understand the characteristics there, but we suspect they could introduce these sorts of differences.

@kkdev163
Author

kkdev163 commented Nov 2, 2019

@patrickhulce yes, we run 8 parallel tests with only 8 cores. I will adjust my config according to your advice, thanks.
