Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HOLD for payment 2024-05-06] [Tracking] Metric - Search screen load time (Performance audit by Callstack) #34323

Closed
adhorodyski opened this issue Jan 11, 2024 · 40 comments
Assignees
Labels
Awaiting Payment Auto-added when associated PR is deployed to production Engineering Improvement Item broken or needs improvement. Weekly KSv2

Comments

@adhorodyski
Copy link
Contributor

Tracking issue for the Search screen load time (ms) metric as part of the Performance audit conducted by Callstack.

Part of #33070

@adhorodyski
Copy link
Contributor Author

adhorodyski commented Jan 11, 2024

Metric: Search screen load time (ms)
Phase: Measurements
Commit hash: 320ff54

Workflow

  • open up search screen
  • results list is fully rendered and interactive

Markers

  • open_search: duration between clicking a Search icon and rendering the list in Search page
  • load_search_options: duration of execution the function doing heavy calculations when rendering the list
open_search (ms) load_search_options (ms, avg)
11953.744924001396 3502.3185390010476
12051.73607699573 3480.1985640004277
11719.576307997108 3429.6390256658196
11811.395076997578 3462.9238203316927
11707.00953900069 3428.0835130016007
11849.601617000997 3474.501180000603
11582.046384997666 3387.7818203369775
11760.494769997895 3411.4773593346276
11760.494769997895 3411.4773593346276
11804.747692994773 3460.037820664545

These 10 runs resulted in an average of 11800ms on the open_search measurement.
Additionally, we were able to measure getOptions() calls (load_search_options) as a sidemetric - averaging with 3 calls, each of 3444ms per each run.

Please see this commit for the open_search markers.
Please see this commit for the load_search_options markers.

We advise to merge such markers placement into the main branch to start tracking this metric (open_search) over time with the Performance API.

@TMisiukiewicz
Copy link
Contributor

TMisiukiewicz commented Jan 11, 2024

Metric: Search screen load time

Phase: Analysis

Commit hash: 320ff54

Analysis

Account data Amount
Reports 15378
Personal details 39
  1. getOptions function is called excessively in the SearchPage component. Opening the page results with calling this function 3 times. Based on an analysis of the performance data, each one of the calls is made in different time, with average of 3,5s execution time. Let’s see the detailed data for Measurement 1:
Run number open_search_start open_search_end load_search_options_start load_search_options_end
1 62324497.106713 62336450.851637 62324633.911713 62328011.539021
2 - - 62328414.127329 62332004.100022
3 - - 62332910.445252 62336449.800868

image

The timeline indicates that getOptions method calls are filling most of the time between open_search_start and open_search_end. If possible, it should be called only once, since as you can see, the average time of getOptions execution is around 3500ms.

  1. Based on data from point 1., with total open_search of 11953ms, getOptions fills almost whole time required to render the Search page. Let’s breakdown this function to identify what is possibly causing this:
  • it takes all the reports and personalDetails as parameters. For the Callstack’s heavy account, it’s 15378 reports and 39 personal details
  • it can be configured by additional options, like betas, reportActions, the results can be modified by using additional flags like includeThreads, includeTasks etc, which may have impact on the final time of execution of the function
  • function is looping through all the available reports multiple times: first, filtering out all the reports that shouldn’t be displayed, then sorts reports by most recent ones, reverses the array and then loops through all the reports
  • looping through all the reports takes most of the time of the whole function execution. The most expensive function that is called on each report and personal detail is createOption. For web development build, removing createOption calls reduces the getOptions execution time from ~760ms to 40ms.
    Here are some functions used inside createOption that may slow it down:
  • getSearchText - responsible for looping through personal details list, building search terms and lastly, looping through all search terms to remove the repeating ones. It’s also using splitting and replacing with regexes, which might degrade performance. Removing this function from createOption speeds it up by 300ms on web for 15k reports. It is called for each report, each time getOptions is called, so currently it builds search terms from scratch not only when the screen is opened, but each time user is typing something in the search
  • isChatThread, isDefaultRoom, isPolicyExpenseChat, isExpenseReport, isArchived are called in the createOption and called again in getChatRoomSubtitle, we can pass those values from top to bottom to avoid excessive calls
  • getReportName is getting called in both createOptions and getSearchText. As the logic of this function looks complex, it could be called just once for each report in createOptions and passed to getSearchText as a parameter
  • getChatRoomSubtitle is also called directly in createOption for each report, and then again in getSearchText. We could pass the result of it’s value as a function parameter, to avoid calling it multiple times.

The biggest problem of getOptions function is the fact that it does not scale. The bigger input is, the slower the function gets. To optimize the rendering phase of the list, it was already paginated and limited to rendering only 500 initial items. Additionally, to avoid doing heavy calculations while animating, the function execution was moved after transition end.

Possible solution

To render only 500 items, it still needs to build all the options. At this moment, the option list is created from scratch on each Search Page opening. Since we already know that removing createOption for each report and personal detail reduces the getOptions execution time on web from ~760 to 40ms, we can think of a possible solution where all required options are generated only once (and whenever reports, personalDetails, betas change) and the only thing done when opening the Search page is filtering based on options passed to getOptions. This would affect not only the Search page, but all the lists utilizing OptionsSelector component.

For testing purposes, I’ve created a simple PoC where I did the following things:

  • created OptionsListContext where I am generating arrays of all possible options for reports and personal details using createOption. Once options are created, they are stored in the context.
  • whole app is wrapped with this context, loading it on the app start
  • screens utilizing OptionsSelector component are getting all the options available immediately (since they are already in the context), and they are filtering them using the appropriate function with all the options needed.

With this approach, when I was testing it on a release build on Android emulator, I was able to reduce the open_search time from average ~5500ms to ~42ms (from pressing the search icon to render the list).

The possible drawback of this solution is increased TTI of the app. Since the heavy calculations are moved to the app startup, it might extend the time needed for the app to run. For now the exact numbers are not known, however we did a rough tests on Android device to compare current state of the app, and after moving generating options to app start:

  Before (ms) After (ms)
Measurement 1 25355 36178
Measurement 2 31483 41569
Measurement 3 33236 28406
Measurement 4 31115 37962
Measurement 5 36735 27889
Measurement 6 30828 40791
Measurement 7 31929 29622
Measurement 8 28547 42382
Measurement 9 28547 28481
Measurement 10 25451 29348
Average 30322.6 34262.8

This indicates ~10% increase of TTI for Android app. However, since it’s a rough calculation, and there are no improvements applied to getOptions function, this values may change, it might also vary depending on a platform and size of the input on the account. The results above indicates a wide area of TTI for a single Android device, with 27889ms lowest and 42382ms highest result. Additionally, we have a separate metric for app TTI, so we could try to find potential bottlenecks during this process and possibly mitigate its impact on app startup.

We should also try to refactor some parts of createOption mentioned above and see if there are any possibilities of refactoring that may improve performance. I would suggest to focus on:

  • exploring if there is any more performant way of building search. Since getSearchText is taking 300 out of 700 ms on web, we might look for other ways of building it, e.g. with match-sorter maybe.
  • dive into getReportName and all the nested functions. It seems like removing it is also speeding things up by ~100-150ms. Additionally this function is used again in createOption to get name for each thread, so the amount of calls of this function might be even bigger

Both of these functions seem to work just fine when the amount of data is not big, or in a single run, but the combined time when running it against 15k reports seems to be big. I believe there are accounts with even more data than our testing account.

I’d suggest starting with trying to refactor the search and diving into getReportName. I did a simple PoC where i removed the getSearchText from createOption. Instead, i used match-sorter when typing something in the search input. Additionally i reduced the amount of getOptions calls from 3 to 1. My initial results for Web environment:

Metric Before After
getOptions execution time ~730ms ~415ms
getOptions function calls 3 on init + on each search function call 1 on init
updateOptions execution time (function called when typing in the search input, responsible for calculating the options to show) ~730ms ~110ms

If we analyze those results, they seem very promising. In this particular scenario (15k reports, web environment) there is a ~300ms improvement in getOptions execution time. However, the most important thing is, this is the only point when this function is called. When typing into search, it is not called anymore, and instead the match-sorter is used. It results with 7x faster response time which is a huge improvement. Of course, those values may change, as it is only PoC not covering all the possible cases. I’ve used it to filter recentReports and personalDetails:

const searchReports = matchSorter(searchOptions.recentReports, searchValue, {keys: ['text', 'subtitle', 'alternateText', 'participantsList.0.displayName', 'participantsList.0.firstName', 'participantsList.0.lastName', 'participantsList.0.login']})
const searchPersonalDetails = matchSorter(searchOptions.personalDetails, searchValue, {keys: ['login', 'displayName']})

This solution should offload the createOption and speed up the initial load of the search screen. I tried to compare the outputs for different phrases, and sometimes it differs by a few elements on the list, so I assume this solution just needs a little bit more polishing e.g. removing duplicates from array if they exist.

If the performance improvement is proven by numbers in this case, we can try to separate createOption function from getOptions and move it to app startup, to generate it only once and then share all options across all search pages. We should measure the impact on the TTI carefully with this one, to assess whether this change does not have a negative impact on app startup.

@mountiny
Copy link
Contributor

thats quite a while, at least that will be good to see any improvements 🚀

@roryabraham
Copy link
Contributor

We advise to merge such markers placement into the main branch to start tracking this metric (openSearchPage) over time with the Performance API.

@adhorodyski @TMisiukiewicz can you please open a separate PR to start tracking these metrics?

@TMisiukiewicz
Copy link
Contributor

@roryabraham here it is: #34489. I have updated the naming to match the app convention:
openSearchPage -> open_search
GetOptions -> load_search_options

I've also changed the place where the load_search_optionsmarker is called so we could measure it separately for each search screen if there is any need to do that. This change does not affect the results from the measurements phase.

@adhorodyski can we update those namings in both Measurement and Analysis phases?

@melvin-bot melvin-bot bot added Reviewing Has a PR in review Weekly KSv2 and removed Weekly KSv2 labels Jan 15, 2024
@TMisiukiewicz
Copy link
Contributor

@roryabraham Can I have your opinion on the above analysis and proposed changes?

@adhorodyski
Copy link
Contributor Author

Hi @mountiny @roryabraham , would you guys be able to jump in here in the next couple of days to help Tomek with the review process? I think we can also push #34489 forward with the new markers setup.

@mountiny
Copy link
Contributor

reviewed the PR

@adhorodyski
Copy link
Contributor Author

@mountiny do you feel like reviewing the analysis or should we bring other folks for this matter?

@mountiny
Copy link
Contributor

Sorry @adhorodyski I have missed this and thought it was alreday reviewed by Rory

First of all, wow, what a great analysis ❤️

I think that report pagination will help us get the number of reports in the app to more maintainable numbers but its not removing the inefficiencies you have found. Also I think the number of report actions, policies and transactions can also influence this as some of the pieces in the created options can still subscribe to those too. Also your testing account does not have that many personal details as really user might have which I think could actually make the stuff faster than it would be for normal user.

I believe using context to cache the options is a great idea. But I do not like the idea of doing all the heavy lifting when opening the App. Could we keep that options creation when the Search page is opened for a first time leading to slower opening time the first time but then stuff will be cached and we can load next time faster?

In terms of the specific improvements in the createOption method, I would not block any of this, I think we have been just adding more and more stuff to this structure without thinking that much about its performance so if we can take a fresh look whether this cannot be done much better now knowing where we are and what expectations we have from this page/ component would be handy.

I think you can bring any of the specific smaller proposals to the newdot-performance room and we can take this peacemeal instead of one huge catch all project/ PR

I think any more bigger changes of how the search will be handled with come with report pagination where we will not have all the reports available offline and once we will work on global search but that is different story

@melvin-bot melvin-bot bot added Weekly KSv2 Awaiting Payment Auto-added when associated PR is deployed to production and removed Weekly KSv2 labels Jan 29, 2024
@melvin-bot melvin-bot bot changed the title [Tracking] Metric - Search screen load time (Performance audit by Callstack) [HOLD for payment 2024-02-05] [Tracking] Metric - Search screen load time (Performance audit by Callstack) Jan 29, 2024
Copy link

melvin-bot bot commented Jan 29, 2024

Reviewing label has been removed, please complete the "BugZero Checklist".

@melvin-bot melvin-bot bot removed the Reviewing Has a PR in review label Jan 29, 2024
Copy link

melvin-bot bot commented Jan 29, 2024

The solution for this issue has been 🚀 deployed to production 🚀 in version 1.4.32-5 and is now subject to a 7-day regression period 📆. Here is the list of pull requests that resolve this issue:

If no regressions arise, payment will be issued on 2024-02-05. 🎊

For reference, here are some details about the assignees on this issue:

@TMisiukiewicz
Copy link
Contributor

Following the @mountiny suggestions, I have a plan to split things into smaller pieces and create two proposals to share on newdot-performance channel:

  • implementing match-sorter for search
  • caching options using context when opening the Search page for the 1st time

Those two things are something that will definitely speed up load time of all search screens, and it will also speed up the search mechanism itself. I should start working on both proposals somewhere this week. I'd treat rest of the findings optional for now, as they might not bring any big value in terms of performance now - we can easily verify it once the key changes are implemented.

@mountiny
Copy link
Contributor

@TMisiukiewicz Thank you very much! Really appreciate it. Please try to include some benchmarks in the proposals to highlight the benefits of these changes

@melvin-bot melvin-bot bot removed the Overdue label Mar 4, 2024
@roryabraham
Copy link
Contributor

thanks, I think we can make this a weekly

@roryabraham roryabraham added Weekly KSv2 and removed Daily KSv2 labels Mar 6, 2024
@melvin-bot melvin-bot bot added the Overdue label Mar 14, 2024
@roryabraham
Copy link
Contributor

Review ongoing in #37909

@melvin-bot melvin-bot bot removed the Overdue label Mar 21, 2024
@melvin-bot melvin-bot bot added the Overdue label Apr 1, 2024
@melvin-bot melvin-bot bot added Reviewing Has a PR in review Weekly KSv2 and removed Weekly KSv2 Overdue labels Apr 2, 2024
@melvin-bot melvin-bot bot added Monthly KSv2 and removed Weekly KSv2 labels Apr 25, 2024
Copy link

melvin-bot bot commented Apr 25, 2024

This issue has not been updated in over 15 days. @TMisiukiewicz, @adhorodyski, @roryabraham eroding to Monthly issue.

P.S. Is everyone reading this sure this is really a near-term priority? Be brave: if you disagree, go ahead and close it out. If someone disagrees, they'll reopen it, and if they don't: one less thing to do!

@melvin-bot melvin-bot bot added Weekly KSv2 Awaiting Payment Auto-added when associated PR is deployed to production and removed Monthly KSv2 labels Apr 29, 2024
@melvin-bot melvin-bot bot changed the title [Tracking] Metric - Search screen load time (Performance audit by Callstack) [HOLD for payment 2024-05-06] [Tracking] Metric - Search screen load time (Performance audit by Callstack) Apr 29, 2024
@melvin-bot melvin-bot bot removed the Reviewing Has a PR in review label Apr 29, 2024
Copy link

melvin-bot bot commented Apr 29, 2024

Reviewing label has been removed, please complete the "BugZero Checklist".

Copy link

melvin-bot bot commented Apr 29, 2024

The solution for this issue has been 🚀 deployed to production 🚀 in version 1.4.67-7 and is now subject to a 7-day regression period 📆. Here is the list of pull requests that resolve this issue:

If no regressions arise, payment will be issued on 2024-05-06. 🎊

For reference, here are some details about the assignees on this issue:

@parasharrajat
Copy link
Member

@muttmuure Am I eligible for compensation for PR review?

@mallenexpensify
Copy link
Contributor

mallenexpensify commented May 10, 2024

Contributor+: @parasharrajat due $500 via NewDot

@parasharrajat
Copy link
Member

As per this thread expensify.slack.com/archives/C02NK2DQWUX/p1715272276516799?thread_ts=1715190852.677769&cid=C02NK2DQWUX, the price should be 500 for this issue. @mallenexpensify

Could you please update that on the summary?

@mallenexpensify
Copy link
Contributor

Actually, that's not necessarily accurate, that the 'creation' date is the source of truth. If a price was changed before you were assigned, it should from that date, right? In this case, it looks like @parasharrajat was assigned to review the PR on April 2nd. Which... is 2 days before we lowered the price to $250 so... yes.. you're right @parasharrajat, payment is $500 for you, updating the above. :)

I want to think more about how to best document this....

@parasharrajat
Copy link
Member

Payment requested as per #34323 (comment)

@JmillsExpensify
Copy link

$500 approved for @parasharrajat

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Awaiting Payment Auto-added when associated PR is deployed to production Engineering Improvement Item broken or needs improvement. Weekly KSv2
Projects
Development

No branches or pull requests

8 participants