Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Performance] Improve Android Cold Boot TTI (Time To Interactive) #4656

Closed
1 of 5 tasks
marcaaron opened this issue Aug 13, 2021 · 61 comments
Closed
1 of 5 tasks

[Performance] Improve Android Cold Boot TTI (Time To Interactive) #4656

marcaaron opened this issue Aug 13, 2021 · 61 comments
Assignees
Labels
Bug Something is broken. Auto assigns a BugZero manager. Daily KSv2 Engineering Internal Requires API changes or must be handled by Expensify staff

Comments

@marcaaron
Copy link
Contributor

marcaaron commented Aug 13, 2021

If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!


What performance issue do we need to solve?

  • App launch and time to interactive
  • Issue feels likely to be in the JS layer

What is the impact of this on end-users?

The app takes a long time to become interactive on Android.
We know that the number of reports a user has can impact the severity.

List any benchmarks that show the severity of the issue

We can test this in an Android release build by using the CAPTURE_METRICS=true option in .env file and looking at the timeToInteractive metric after booting up the app (must be previously signed in).

Follow Instructions here:
https://github.com/Expensify/App/blob/main/PERFORMANCE.md#performance-metrics-opt-in-on-local-release-builds

2021-08-13_06-55-18

Android app startup JS sampling profile to inspect in Chrome Dev Tools (Taken from a Galaxy S8):

sampling-profiler-trace4670664525343774149-converted.json.zip

There's also a lot of context in this issue.

Proposed solution (if any)

N/A

List any benchmarks after implementing the changes to show impacts of the proposed solution (if any)

N/A

Platform:

Where is this issue occurring?

  • Web
  • iOS
  • Android
  • Desktop App
  • Mobile Web

Version Number:
Logs: https://stackoverflow.com/c/expensify/questions/4856
Notes/Photos/Videos: Any additional supporting documentation
Expensify/Expensify Issue URL:

View all open jobs on Upwork


Additional Proposal Notes

  • Please refer to the linked issue for more context to avoid suggesting things we've already done
  • We will be accepting multiple proposals so there is no one right answer
  • Proposals will continue until the TTI is acceptable
  • Proposals should only be accepted if they can improve the TTI using the metric provided
  • Must use actual Android hardware and run a release build to verify that your solution works
@marcaaron marcaaron added Engineering Daily KSv2 External Added to denote the issue can be worked on by a contributor labels Aug 13, 2021
@MelvinBot
Copy link

Triggered auto assignment to @bfitzexpensify (External), see https://stackoverflow.com/c/expensify/questions/8582 for more details.

@MelvinBot MelvinBot added the Help Wanted Apply this label when an issue is open to proposals by contributors label Aug 16, 2021
@MelvinBot
Copy link

Triggered auto assignment to @Luke9389 (Exported), see https://stackoverflow.com/c/expensify/questions/7972 for more details.

@bfitzexpensify
Copy link
Contributor

Upwork posting

@MelvinBot MelvinBot removed the Overdue label Aug 16, 2021
@kidroca
Copy link
Contributor

kidroca commented Aug 16, 2021

I've managed to hook the flipper plugin to print this information instead of the Alert https://github.com/oblador/react-native-performance/blob/master/packages/flipper-plugin-performance/README.md

  • the newer plugin is flipper-plugin-performance (not flipper-plugin-react-native-performance (deprecated))

Screenshot 2021-08-16 at 22 33 15

You can mark more events and they would appear on the timeline as well e.g. report selected -> report rendered

This also allows exporting and importing files to Flipper
FlipperExport.flipper.zip

@marcaaron
Copy link
Contributor Author

@kidroca is there some way to use Flipper on a release build? I've found that these metrics are much different on dev builds which is why the alert method is used here.

@kidroca
Copy link
Contributor

kidroca commented Aug 16, 2021

Any updates made on the ReportScreen or it's children doesn't seem to affect this metric as the end mark for TTI happens before the ReportScreen mounts (or shortly after it starts mounting)

@kidroca is there some way to use Flipper on a release build? I've found that these metrics are much different on dev builds which is why the alert method is used here.

I'm not sure how, but it's probably possible, I remember I've done iOS prod builds where the dev menu is enabled
For Android you can switch between PROD and DEV js bundle from the menu and it makes a big difference
image

I've added marks for ReportActionsView.
Marks: LHN click -> RAV mount -> RAV layout, here's how they look using PROD js
image

  • the last 2 switches taking less time - these are switching back to one of the previous two reports

And here's the same for a DEV build
image

  • here I can't capture all interactions on a single screen, but they are similar to the one on the left 2.1sec, the last 2 are again switch backs

@marcaaron
Copy link
Contributor Author

For Android you can switch between PROD and DEV js bundle from the menu and it makes a big difference

It's good to have the bundle optimized for release, but it takes time to download via the metro server. This is important when we are timing a cold start from app icon tap to interactive so release builds have been most reliable for me.

That said, I don't mind if we enable this flipper plugin or add additional metrics if we think that will be helpful.

@marcaaron
Copy link
Contributor Author

marcaaron commented Aug 16, 2021

@bfitzexpensify @Luke9389 just a heads up that we can accept multiple proposals for this one. I don't think there will be a single solution to this one so mainly looking to accept any proposal that can improve this metric (which could mean the issue has multiple assignees).

@kidroca
Copy link
Contributor

kidroca commented Aug 17, 2021

Captured some more insights as to where we're spending time during init

This is a PROD js build (from the dev menu - not a release build)
image

I'm 99% sure the red 🔴 needles are showing initial script parse and execution. Though I couldn't find documentation confirming this.
Everything before the 1st 🔴 needle is probably something we can ignore as we can't really affect it by changing the JS.
That's also mentioned inside the plugin code:

// The bundle download trace can be very long but has no real impact on
// perf so we try to alter the marks to act as if it wasn't there

We're seeing so far before the 1st needle because of the measures we've made, otherwise the timeline would start from the first needle

Slightly before the 2nd 🔴 needle are the first Onyx calls, they are part of the initial script execution - Onyx.init is not called inside a componentDidMount but a as a top level call, so it's considered a part of the initial script execution (Maybe we can try to move it inside a component lifecycle call to see if this gains anything)

700ms past the 2nd 🔴 needle is the TTI event
If we measure from First onyx call (at 7.700) to TTI (at 9.100) it's about 1.4sec

It's really easy to update the Onyx metrics to be captured using react-native-performance, visualized along some other interactions it provides an interesting insight - should I open a PR about this?
I've also enabled the networking metrics

Here's the export from flipper, you'll need to have the Performance plugin installed to be able to see it
FlipperExport.zip

@kidroca
Copy link
Contributor

kidroca commented Aug 17, 2021

From Marc's screen in the description the runJsBundle time is a lot less 353ms, compared to mine 4200ms
This can vary from device to device, but maybe the huge diff is because there are still some debugging tools running

IMO we can use the debug build to test theories and then use the release build (before/after) to prove them

@marcaaron
Copy link
Contributor Author

I like the idea to capture Onyx metrics with the Flipper plugin. I haven't relied on them very much tbh so maybe having them enabled when CAPTURE_METRICS is true will make them more discoverable for anyone investigating performance issues.

From Marc's screen in the description the runJsBundle time is a lot less 353ms, compared to mine 4200ms

I think it's because the JS source is precompiled into bytecode for the release build when using Hermes engine? Which perhaps does not happen when we build the debug app. But unsure tbh.

IMO we can use the debug build to test theories and then use the release build (before/after) to prove them

I agree with this, but at the end of the day the release build never lies 😄

@bfitzexpensify
Copy link
Contributor

@bfitzexpensify @Luke9389 just a heads up that we can accept multiple proposals for this one. I don't think there will be a single solution to this one so mainly looking to accept any proposal that can improve this metric (which could mean the issue has multiple assignees).

Sure thing — just let me know who we want to hire and I'll keep the posting open (same to you @Luke9389)

@kidroca
Copy link
Contributor

kidroca commented Aug 18, 2021

I think it's because the JS source is precompiled into bytecode for the release build when using Hermes engine?

Correct.

To see the benefits of Hermes, try making a release build/deployment of your app
This will compile JavaScript to bytecode during build time which will improve your app's startup speed
https://reactnative.dev/docs/next/hermes#confirming-hermes-is-in-use

0.35 vs 4.2 is a big improvement.

I'll check if there's some way to use Flipper to see the performance results of a release build

An alternative is that a debug build can be configured to precompile JS the same way a release does. Saw that in one of the perf capturing tutorials, there's a gradle config about it. This would keep Flipper available, and skip having to create certificates to sign release builds

I wonder why on iOS a dev build still seems significantly faster in every aspect
We try to pump everything possible out of Android doing a release build and still can't catch up to iOS on a dev build

@kidroca
Copy link
Contributor

kidroca commented Aug 18, 2021

Application bundling can be updated to use RAM bundles to improve TTI in general
This means extracting non essential parts of the application to separate chunks that are lazy loaded on demand

Related Article: https://reactnative.dev/docs/ram-bundles-inline-requires

We load all the screens and navigators even though only 1 is used during init, and some might not be used at all in a single session
Using RAM bundles and inline requires we can optimize the app to load such components the first time we navigate to a given route

My idea is to 1) make a crude check on potential improvements by removing all navigators but the MainDrawerNavigator
and if it indeed is beneficial 2) Make a LazyNavigator component that would encapsulate the inline require logic and would be applied over existing navigators

During 1) We can also capture memory usage with/without navigators as besides TTI this should potentially improve memory by offloading unused screens

@marcaaron
Copy link
Contributor Author

I did some digging on RAM bundles at one point and the discovered they aren't compatible when using Hermes. Something something mmap...

Screen Shot 2021-08-18 at 6 53 47 AM

Inline requires should still work with Hermes (I know I read this somewhere but didn't think of any great ways to implement) so that's something we could try. I'd definitely be curious to know if there are some large libraries or things doing a lot of work that could be lazy loaded or deferred.

As for react-navigation, I read something somewhere (SO or Reddit hearsay I think) that Android is "not as efficient" as iOS at handling the huge tree of components that react-navigation necessarily creates. I followed some suggestions to use the native stack navigator. Which seemed to help marginally, but also limited the flexibility of screen animations. Cutting react-navigation out of the picture entirely (and just rendering the sidebar on init without anything else) did yield about a ~400 ms reduction in TTI.

@kidroca
Copy link
Contributor

kidroca commented Aug 18, 2021

I've submit a PR here to review the Onyx metrics inside Flipper: Expensify/react-native-onyx#101
I'll submit another one that adds the Flipper Performance plugin and captures the ReportActionsView layout in Expensify/App

@kidroca
Copy link
Contributor

kidroca commented Aug 19, 2021

It's not possible to use Flipper with a release build (or at least without tweaking too many things)

"Question: Can I use flipper on a release build on Android/iOS?" - facebook/flipper#1488

It's relatively easy to include a compiled bundle (bytecode) in the dev build (it's still not the same as running a release build, but comes close)
image
image

There's a large piece of time where we just see the splash screen. The reason is on a debug build the app tries to connect to the metro server - from my observations the app waits for a connection for about 5-6 sec then continues to load the bundled bytecode

  • On one of my previous shots you can see the red needle around the 3.2-4s mark (after a 2sec download time for js)
  • On the one here it's on the 6.4s mark (with no js download)

Something else we can see is where the runJsBundle takes place on the timeline
Other than that the rest of the JS timings seem to be the same

To run a debug build with precompiled JS bytecode

  1. Update app/build.gradle with bundleInDebug: true
project.ext.react = [
    enableHermes: true,  // clean and rebuild if changing
    bundleInDebug: true,
]
  1. Run the app with npm run android but stop the metro dev server

@kidroca
Copy link
Contributor

kidroca commented Aug 19, 2021

I'd definitely be curious to know if there are some large libraries or things doing a lot of work that could be lazy loaded or deferred.

This tool can help with tracking any large JS bundles, and it can be run with npx react-native-bundle-visualizer
https://www.npmjs.com/package/react-native-bundle-visualizer

I've run it on Expensify and here's what I've found out

  • total size 3.66MB (though it's not very accurate as it's actually 4.05MB)
  • external libraries (from node_modules) - 2.7MB (61% of the source)
  • src - 813kB - 18% of the source
  • react-native - 586kB 13%
  • react-native-reanimated - 321kB 7%
  • react-native-svg - 204kB 4.5%
  • @formatJs - 199kB 4.4%
  • @react-navigation - 185kB 4.1%

A tool like this usually helps with tracking packages that are added 2x. or not tree shaken properly

  • you can see expensify-common being added 2x
  • there's also underscore 3x (expensify-common, onyx, and the app seem to use different versions of it)
  • for some reason the whole lodash is loaded (100kB) even though we only use and import specific methods from it

To track heavy work we can use the Hermes profiling flow, or capture more metrics with react-native-performance

@Luke9389 Luke9389 removed their assignment Sep 14, 2022
@marcaaron marcaaron added Internal Requires API changes or must be handled by Expensify staff and removed External Added to denote the issue can be worked on by a contributor labels Sep 14, 2022
@marcaaron
Copy link
Contributor Author

Let's maybe see if anyone wants to lead this up internally. There's a PR out that improved the TTI on Android by like 800ms atm. So unsure how much faster we are looking to make things beyond that.

@marcaaron marcaaron added Weekly KSv2 and removed Exported Monthly KSv2 labels Sep 14, 2022
@Luke9389
Copy link
Contributor

I know @AndrewGable has been talking with callstack so perhaps he'd be interested in taking this.

@JmillsExpensify
Copy link

JmillsExpensify commented Sep 16, 2022

I also just highlighted this issue in Slack as a performance issue in need or a volunteer.

@trjExpensify
Copy link
Contributor

@marcaaron - which PR is that? #4656 (comment)

@marcaaron
Copy link
Contributor Author

This one here to lazy load navigation screens -> #10986

@melvin-bot

This comment was marked as resolved.

@marcaaron
Copy link
Contributor Author

Noticing that this issue is eroding a bit and not getting much love. Maybe it's not clear what we are trying to achieve?

Ideas for how to take action on this...

  • It would be helpful if someone could work on getting some baseline measurements together by building release versions of Android and iOS apps
  • Update the description with where we are at on those platforms
  • Land on some kind of realistic target together for how fast the app should open

@JmillsExpensify JmillsExpensify added the AutoAssignerTriage Auto assign issues for triage to an available triage team member label Oct 18, 2022
@melvin-bot
Copy link

melvin-bot bot commented Oct 18, 2022

Triggered auto assignment to @abekkala (AutoAssignerTriage), see https://stackoverflow.com/c/expensify/questions/4749 for more details.

@melvin-bot melvin-bot bot added Daily KSv2 and removed AutoAssignerTriage Auto assign issues for triage to an available triage team member Weekly KSv2 labels Oct 18, 2022
@roryabraham
Copy link
Contributor

roryabraham commented Oct 18, 2022

Just throwing this out there ... we have this issue to enable the new architecture. Part of that architecture, maybe less-talked-about than the new Fabric renderer, is called TurboModules. With the traditional React Native architecture all native modules had to be loaded up-front before the app would load and the JavaScript would start running. With TurboModules, native modules are lazy-loaded – that should bring forth some "free" reductions in TTI on cold boots

@mountiny
Copy link
Contributor

@roryabraham I feel like this issue live kind of in vacuum. Should we either make this a tracking issue for PRs/issues which are aimed to improve the app loading time or I think we might want to close this one out as Margelo already addressed some boot up changes and we might want to go in a direction of creating a specific issues for specific problems (similar as you have linked above - enable fabric to speed things up)

This issue is from august 2021 so I feel like it is a bit out of context at the moment. I think we could close this out, if you agree, feel free to close this one out.

@puneetlath puneetlath added the Bug Something is broken. Auto assigns a BugZero manager. label Oct 19, 2022
@JmillsExpensify
Copy link

@mountiny Agreed. I'll go ahead and link this one to the existing tracking issue.

@roryabraham
Copy link
Contributor

I agree with @mountiny here. We should close this out and focus on items with clearer deliverables. As @mountiny stated, this issue is old and our boot time has come a long way since its creation. Furthermore, Improve Android Cold Boot TTI is a moving target

@marcaaron
Copy link
Contributor Author

Idea for clearer deliverable ->

Land on some kind of realistic target together for how fast the app should open

Part of the problem with this issue is it's not clear how much faster things should be and an audit of where things are at would be a good place to start.

@roryabraham
Copy link
Contributor

I agree @marcaaron, and think that this issue could be a great starting place.

@kidroca
Copy link
Contributor

kidroca commented Oct 26, 2022

I think we talked about multiget on a few occasions but we never really gave it a chance and it might boost performance

A brief summary of the idea is

  1. We pretty much read all (or almost all) Onyx finite ("small") keys in memory at launch (or soon after)
  2. Instead of creating parallel read calls for each such key - we can request them with one multiGet operation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something is broken. Auto assigns a BugZero manager. Daily KSv2 Engineering Internal Requires API changes or must be handled by Expensify staff
Projects
None yet
Development

No branches or pull requests