Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Performance] JSI Storage implementation POC #4492

Closed
kidroca opened this issue Aug 7, 2021 · 34 comments
Closed

[Performance] JSI Storage implementation POC #4492

kidroca opened this issue Aug 7, 2021 · 34 comments

Comments

@kidroca
Copy link
Contributor

kidroca commented Aug 7, 2021

If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!


This issue was created as a result of the conversation here: https://expensify.slack.com/archives/C01GTK53T8Q/p1628181643026200?thread_ts=1628106063.148700&cid=C01GTK53T8Q

What performance issue do we need to solve?

  • storage read/write times
  • React native bridge concerns

What is the impact of this on end-users?

  • app boot time
  • chat switch times

List any benchmarks that show the severity of the issue

Setup

Flows

Init - relaunch the app and use Onyx.printMetrics after everything is rendered
Chat switch - go back to LHN, use Onyx.resetMetrics now open a different Chat. Use Onyx.printMetrics after everything is rendered

Results Android (dev)

Flow init chat switch
Onyx Total 70.73sec 12.46sec
App Render total 7.02sec 4.56sec
App last render at 20.45sec 10.06sec
Onyx:get total 39.80sec 7.02sec
Onyx:get avg 205.934ms 292.693ms
Onyx:get last call at 15.41sec 3.74sec
Onyx:set total 8.74sec 0.84sec
Onyx:set last call at 15.80sec 3.74sec

Proposed solution (if any)

Create a Proof Of Concept by switching Onyx to use a JSI storage library like

1. Extract a storage provider interface from the existing AsyncStorage usages in Onyx

Onyx interface with the native storage library is pretty simple get, set ,merge etc
We can make Onyx work with different providers of this interface

  • use Tencent on Android (and maybe iOS)
  • fallback to AsyncStorage for everything else

2. Work with different storage providers

To make different storage providers like AsyncStorage or MMKV work with Onyx we make a thin wrapper
implementing the storage interface in provider.native.js or provider.web.js - whatever is needed
Then in Onyx we import provider from './provider' and use the methods from the interface - get, set, merge

List any benchmarks after implementing the changes to show impacts of the proposed solution (if any)

Note: These should be the same as the benchmarks collected before any changes.

Flow init chat switch
Onyx Total 50.16sec 13.49sec
App Render total 6.89sec 5.11sec
App last render at 20.24sec 10.97sec
Onyx:get total 32.15sec 7.82sec
Onyx:get avg 166.902ms 325.948ms
Onyx:get last call at 15.54sec 3.80sec
Onyx:set total 3.92sec 0.81sec
Onyx:set last call at 16.01sec 3.82sec

Platform:

Where is this issue occurring?

  • Web
  • iOS
  • Android
  • Desktop App
  • Mobile Web

Version Number: 1.0.83-0
Logs: https://stackoverflow.com/c/expensify/questions/4856
Notes/Photos/Videos: Any additional supporting documentation
Expensify/Expensify Issue URL:

View all open jobs on Upwork

@MelvinBot
Copy link

Triggered auto assignment to @yuwenmemon (Engineering), see https://stackoverflow.com/c/expensify/questions/4319 for more details.

@quinthar
Copy link
Contributor

quinthar commented Aug 7, 2021 via email

@marcaaron
Copy link
Contributor

To make different storage providers like AsyncStorage or MMKV work with Onyx we make a thin wrapper implementing the storage interface in provider.native.js or provider.web.js - whatever is needed. Then in Onyx we import provider from './provider' and use the methods from the interface - get, set, merge

Not to be a party pooper, but this feels a bit complicated. Can we maybe try to do this the simplest way possible and create a version of AsyncStorage that uses MMKV instead of calling the native modules?

@kidroca
Copy link
Contributor Author

kidroca commented Aug 9, 2021

I think you should just take a look at the PR: Expensify/react-native-onyx#95

Picking a provider like this

storage/index.js
// index.native uses MMKV_Provider
import StorageProvider from './providers/AsyncStorageProvider';

const instance = new StorageProvider();

export default instance;

And then in Onyx.js use StorageProvider.get, StorageProvider.set and the rest of the methods is much simpler than modifying AsyncStorage internals.

Can we maybe try to do this the simplest way possible and create a version of AsyncStorage that uses MMKV instead of calling the native modules?

How is that the simplest way?
Creating a version of AsyncStorage that uses MMKV and JSI is a rewrite from scratch, and there is already 2 libraries that provide it

Implementing my proposal allows switching to Onyx to any storage library in a matter of minutes

@marcaaron
Copy link
Contributor

marcaaron commented Aug 9, 2021

If the point of this exercise is to prove that MMKV is valuable then we don't need the provider stuff for that. We could have just made MMKV_Provider.js a local replacement for AsyncStorage.

While we're on the subject though... how do I implement this to help test? There are no instructions on the linked PR.

Edit: Just seeing the comment now (maybe add it to the description?)

@kidroca
Copy link
Contributor Author

kidroca commented Aug 10, 2021

We've captured benchmark data, did manual tests and it seems we're not getting any benefit from the JSI MMKV storage implementation ATM

In terms of "time to interactive" and "chat switch time" there's no obvious improvement

What we can continue with:

  • verify that the MMKV JSI implementation is faster than AsyncStorage [Performance] JSI Storage implementation react-native-onyx#95 (comment). We need to decide how to test this.
  • try a different MMKV JSI storage implementation (Very likely to have the same result)
  • revise with the requested changes in the PR, but perhaps save it in a different branch for reference or future tests with other storages
  • something else or just close ?

@marcaaron
Copy link
Contributor

marcaaron commented Aug 11, 2021

Does everyone agree that it feels like we are missing a bottleneck that has nothing to do with storage and until we solve that we won't see the benefits of *improved storage?

Proving that there is some benefit in isolation seems like it will tell us that things could be better, but we still need to fix "something" else about our Onyx design.

I'm not 100% sure what @quinthar had in mind here but it sounds like maybe we want to benchmark various parts of Onyx in isolation with different scenarios to see if there are any places where MMKV pulls out ahead of AsyncStorage and where there's no difference at all. And maybe in the latter cases we can rethink the design so that it can take advantage if MMKV's "faster storage".

@quinthar
Copy link
Contributor

quinthar commented Aug 11, 2021 via email

@kidroca
Copy link
Contributor Author

kidroca commented Aug 11, 2021

Does everyone agree that it feels like we are missing a bottleneck that has nothing to do with storage and until we solve that we won't see the benefits of *improved storage?

Yes, We've accumulated multiple clues about this. The problem doesn't seem to be storage access speed, but:

  1. relaying storage data and updates to React
  2. too much scripting (main thread time) spent by Onyx
  3. Something else that chokes that app and everything going on Android

1. Relaying updates

We have plans like fill initial state sync from cache
One unexplored path ATM is "too many HOCs" could be causing an issue - each HOCs adds additional component and React lifecycle, it's own state slice (copy) etc...

2. Too much scripting

The infrastructure that Onyx creates to manage the pub/sub might be taking too much main thread time or doing something that causes React to be slow.
We've already discovered and addressed the issue with "too many connections for the same key", but another thing that happens often is the keyChanged call (might actually be the most frequently called func) - it iterates the full list of connections instead of just the connections for the key that changed - that seems so plausible and unnecessary that I'll just open a ticket

3. Something else

The JSI storage had pretty much the same 2+ sec calls on the same places where AsyncStorage had those, but MMKV and JSI work differently
I think the "init" pattern we use might be too eager to load as much as possible in parallel and everything is getting in the way of everything else

Let's start with a bare-bones benchmark that proves Tencent is faster than AsyncStorage in anything

Ok I'm down for that, but it's still undecided how exactly we should go about this, I've already proposed the standalone app, but no one commented on that and so it's probably not the way to go

bit by bit add Onyx step by step until we figure out where Onyx introduces the bottleneck.

If Onyx is used in a isolated environment - at a small scale - it does not exhibit any problems.
Examples are Login/Register - on these screens the rest of the app is not loaded at all and they load fast and storage calls are super fast as well

One of my past proposals was to strip everything from the Authenticated app and start to put it back piece by piece and see how components affect TTI and interactions

@kidroca
Copy link
Contributor Author

kidroca commented Aug 11, 2021

The only thing related to the current ticket is standalone benchmarking MMKV
BTW there's already a benchmark made by the authors: https://github.com/mrousavy/react-native-mmkv#benchmark
benchmark

@marcaaron
Copy link
Contributor

The infrastructure that Onyx creates to manage the pub/sub might be taking too much main thread time or doing something that causes React to be slow.

Just speculating - but I wonder if the Onyx.connect() used outside of React components could be causing issues? Too many subscriptions in general could be affecting performance (pointless callbacks stored in memory, updating values for no clear reason, etc).

At the moment, a key change updates a value immediately instead of when it's needed. There are probably numerous places (mainly in the non React code but also in React components) where a store getter could replace a subscriber that updates a local variable. e.g. if we were working with redux there are cases for store.subscribe() and cases for store.getState() respectively.

Not too sure how to go about testing this. I think it might require an audit of all Onyx.connect() and many changes before we'd have any idea if it helps.

it iterates the full list of connections instead of just the connections for the key that changed - that seems so plausible and unnecessary that I'll just open a ticket

I agree this does seem unnecessary. I'd also kind of expect it would happen very fast unless there were thousands of connections, but not too sure.

I think the "init" pattern we use might be too eager to load as much as possible in parallel and everything is getting in the way of everything else

This feels really likely. Especially since we are not deferring any writes either. We try to read, write, and display UI all at once when the app inits. I wonder if it would help to take a less aggressive strategy when it comes to writing to storage? We've looked at batching reads - but what about deferring any writes until we're done reading and building critical UI? Seems like the work of writing could be batched/throttled by some amount and perhaps that will free the main thread up to do other things?

@kidroca
Copy link
Contributor Author

kidroca commented Aug 11, 2021

I've captured more data regarding withOnyx and I think an easy way to measure how much would initialising state from cache would help. I'm a bit lost if we already have a ticket about this change though, should I create one or there's an existing issue?

@marcaaron
Copy link
Contributor

Some conversation has been happening here. If you want to create a new ticket with what you are thinking we can link it to that one.

@marcaaron
Copy link
Contributor

Did a bit of basic testing with MMKV. It seems quite a bit faster than AsyncStorage for simple operations but only somewhat better (100ms or so improvement) for when you need to do a lot of stuff (e.g. read and write a ton of keys at once like we do when the app inits).

Here's the rough test I tried... I exported my entire localStorage to JSON and then tested how long it would take to write the entire contents w/ MMKV.set(), AsyncStorage (setItem + Promise), and AsyncStorage.multiSet() then performed a similar test for reading.

The results were kind of surprising in that while MMKV outperformed AsyncStorage in just about every category but it really really outperformed it in the case where many individual keys are set with AsyncStorage.setItem(). e.g. using AsyncStorage.setItem() to save each key took 6 seconds in total whereas MMKV performed the same task in 300ms.

Not sure how relevant this is for us since we are using multiMerge/multiSet via mergeCollection() but seemed interesting.

@MelvinBot
Copy link

@kidroca Eep! 4 days overdue now. Issues have feelings too...

@kidroca
Copy link
Contributor Author

kidroca commented Aug 18, 2021

Yes, I've also observed that writing with AsyncStorage tends to be slower, merge and multiset as well
Something to note is that even though a write can be taking more time, the call is not blocking the UI

Something to consider with MMKV are the sync methods - these block the main thread, though it might not be an issue as we're writing to memory

A pro with MMKV is we would be able to drop some or all of the caching logic

  • we would no longer need to capture tasks as calls resolve immediately
  • we might not capture cache for the same reason, though then we'd need to JSON.parse every time we read a key

We've already discovered and addressed the issue with "too many connections for the same key", but another thing that happens often is the keyChanged call (might actually be the most frequently called func) - it iterates the full list of connections instead of just the connections for the key that changed - that seems so plausible and unnecessary that I'll just open a ticket

I've tracked keyChanged calls during app init and report switch
It turned out they aren't happening as often as I thought - about 20-25 for init and 8-10 for chat switching
The ones during init averaged a slower time of about 16ms per call, there's always 1 slow call of about 150ms (on a DEV build though), while for chat switches time never raised above 5ms

I've applied refactoring, where we store connections mapped by key, which allows to iterate less items, but it didn't seem to help much

I've also explored whether the setState calls in key(s)Changed aren't delaying things as they use the callback version, but they aren't


Yep, that's exactly right. Let's start with a bare-bones benchmark that proves Tencent is faster than AsyncStorage in anything, and then bit by bit add Onyx step by step until we figure out where Onyx introduces the bottleneck.

So far it looks like we know MMKV is faster general, but Onyx with MMKV didn't result in any noticeable improvement
If Onyx has a problem it's not tied to the underlying storage implementation but how it's used and how updates are translated to React

One thing left is to explore is rewriting withOnyx to use an aggregated storage provider
Another option is identifying other bottlenecks during app init

@MelvinBot MelvinBot removed the Overdue label Aug 18, 2021
@quinthar
Copy link
Contributor

quinthar commented Aug 19, 2021 via email

@kidroca
Copy link
Contributor Author

kidroca commented Aug 19, 2021

This is pretty much the cache object that we currently have
Disabling cache clearing would resolve every request from cache and never go through the bridge
I've tried this in the past, there wasn't much improvement during init - the improvement is only for the chat switches happening after (we've added LRU cache which has this improvement and allows clearing cache)
To make this exactly as what you suggest we can also disable writing to storage and just interact with cache

This experiment we would need to initialize cache from storage. And this would fall to the batching and the multi-get category optimizations that we wanted to avoid in the first place (for the current issue scope)
We need to read everything from storage and put it in cache as otherwise the app would start in a logged out state and would also have to fetch everything from the network

I don't think the experiment would reveal anything that we don't already know.
There are slow sync calls taking between 200ms to 1000ms that are not interacting with storage at all, it seems to be the result of the main (the JS) thread getting blocked as it consistently happens during init

Kind of crazy idea, what if we...

Kind of crazy idea, what if we take the opposite approach and make a fake storage engine that is entirely in javascript, avoiding the bridge totally. It would just have a large Json object that it is initialized with, and then when requests are made to it it just looks up in response with that. This would eliminate the bridge and the storage engine entirely from testing, and would simply evaluate the overhead of Onyx itself, and all of the promises and so forth. It could be that all of the time lost is actually just in the use of our promises, react components, etc.

@kidroca
Copy link
Contributor Author

kidroca commented Aug 19, 2021

We've made some improvements to App and Onyx and we've also have a new more consistent benchmark process so it might be worth it to redo the JSI benchmark

@kidroca
Copy link
Contributor Author

kidroca commented Aug 24, 2021

Nothing new to report here, I need some internal feedback to proceed

@MelvinBot
Copy link

@kidroca Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!

@MelvinBot
Copy link

@kidroca 6 days overdue. This is scarier than being forced to listen to Vogon poetry!

@kidroca
Copy link
Contributor Author

kidroca commented Aug 31, 2021

@marcaaron
In not sure how to continue here,
We've made the POC, had some observations, tried different items.
At the time the POC didn't bring much improvement
JSI storage can help in other areas like dropping cache altogether as values can be read directly from storage

In the mean time I've worked on file handling in Onyx which uses the same storage provider concept to deliver separate handling between native and web - that work will allow us to add the MMKV storage whenever we want if we decide to

@MelvinBot MelvinBot removed the Overdue label Aug 31, 2021
@marcaaron
Copy link
Contributor

JSI storage can help in other areas like dropping cache altogether as values can be read directly from storage

I like the sound of this. Having a synchronous I/O sounds neat, but also seems like a big change to Onyx for unestablished benefit. Feel free to get more other opinions by pitching these ideas in the Slack channel. Perhaps some others will have ideas on whether this issue is worth keeping open or how to proceed.

@MelvinBot
Copy link

@kidroca Eep! 4 days overdue now. Issues have feelings too...

@MelvinBot
Copy link

@kidroca 6 days overdue. This is scarier than being forced to listen to Vogon poetry!

@MelvinBot
Copy link

@kidroca 8 days overdue is a lot. Should this be a Weekly issue? If so, feel free to change it!

@marcaaron marcaaron added Weekly KSv2 and removed Daily KSv2 labels Sep 10, 2021
@MelvinBot MelvinBot removed the Overdue label Sep 10, 2021
@marcaaron
Copy link
Contributor

Dropping this to weekly as it doesn't seem like there is a clear consensus how to proceed or close.

@kidroca
Copy link
Contributor Author

kidroca commented Sep 11, 2021

I've posted on slack about this ticket, but it didn't get any attention.
I'll try again next week

@quinthar
Copy link
Contributor

quinthar commented Sep 11, 2021 via email

@kidroca
Copy link
Contributor Author

kidroca commented Sep 13, 2021

I thought the idea was to test how much MMKV and JSI would improve things.
In terms of TTI there were no improvements, no noticeable improvement for chat switches as well.
So we started to look elsewhere
Since the benchmark here, there were some updates that improved the app as you've noticed, IMO the ticket can continue to explore another storage library or re-run the benchmarks to see if something changed on that front

The work done here is handy as it allows us to easily swap and try different storage solutions and we're already applying the concept for file handling #3867.
We can relatively easy try different native storage libraries if we want to explore that path.


I thought the bottleneck was storage and particularly the react-native bridge, but even when we skipped it entirely the performance is just slightly faster.

Overall my opinion is that the solution to Android performance is not in the storage library that we use. We have other paths to explore let's do that and come back here


It's seems we don't have a ticket dedicated on Android performance, we have a few [Performance] tickets that cover all platforms (e.g. multiget and batching).
We can open a separate ticket for Android, but wouldn't it be better to handle "leads" in slack and then open specific tickets (Pretty much like how this ticket came to existence)

I can go thought the suggestions that we didn't try here and post them on slack so we can create tickets for the ones that do pickup

...

Yes can you please bump that thread? Android performance is light years
beyond what it was before, which was entirely unusable. But it's still not
necessarily fast.

@quinthar
Copy link
Contributor

quinthar commented Sep 13, 2021 via email

@MelvinBot MelvinBot added Monthly KSv2 and removed Weekly KSv2 labels Oct 6, 2021
@MelvinBot
Copy link

This issue has not been updated in over 15 days. @kidroca eroding to Monthly issue.

P.S. Is everyone reading this sure this is really a near-term priority? Be brave: if you disagree, go ahead and close it out. If someone disagrees, they'll reopen it, and if they don't: one less thing to do!

@MelvinBot
Copy link

@kidroca, this Monthly task hasn't been acted upon in 6 weeks; closing.

If you disagree, feel encouraged to reopen it -- but pick your least important issue to close instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants