Handle pagination errors in chats #40610

Merged
merged 4 commits into from
May 13, 2024

@janicduplessis janicduplessis commented Apr 19, 2024

Details

Currently we do not handle errors properly in getOlderActions and getNewerActions. All we do is set the loading state back to false, but that causes the list's onEndReached or onStartReached callback to fire again, which triggers a new network request to load more. If we are near either edge of the chat or, even worse, in a chat with few messages, we trigger these requests in a loop.

To solve this we should add an error state to those pagination methods and display an error UI with an option to retry loading. This will prevent this request loop from happening.
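
As a rough illustration of why an error flag breaks the loop, here is a minimal sketch. It is not the code from this PR: the reducer, the event names, and `shouldLoadOlderOnEndReached` are hypothetical, and only the two flag names are taken from the diff discussed in this conversation.

```typescript
// Hypothetical state shape; only the flag names come from this PR's diff.
type PaginationState = {
    isLoadingOlderReportActions: boolean;
    hasLoadingOlderReportActionsError: boolean;
};

type PaginationEvent = 'request' | 'success' | 'failure';

// Pure transition function: a failure clears the loading flag AND records the error.
function reducePagination(state: PaginationState, event: PaginationEvent): PaginationState {
    switch (event) {
        case 'request':
            return {isLoadingOlderReportActions: true, hasLoadingOlderReportActionsError: false};
        case 'success':
            return {isLoadingOlderReportActions: false, hasLoadingOlderReportActionsError: false};
        case 'failure':
            return {isLoadingOlderReportActions: false, hasLoadingOlderReportActionsError: true};
        default:
            return state;
    }
}

// Guard for the list's onEndReached handler: without the error check, a failed
// request leaves isLoading=false and the next onEndReached fires a new request.
function shouldLoadOlderOnEndReached(state: PaginationState): boolean {
    return !state.isLoadingOlderReportActions && !state.hasLoadingOlderReportActionsError;
}
```

With this shape, only an explicit retry (a new `'request'` event, for example from a "Try again" button) leaves the error state, so scrolling near the edge of the list no longer re-triggers the request.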

Here's the UI I implemented for this error state:

Top error:

image

Bottom error:

image

Example with comment linking:

Screen.Recording.2024-04-19.at.16.39.20.mov

Fixed Issues

$ #40641
PROPOSAL:

Tests

  • Verify that no errors appear in the JS console
  • Test the error state by failing pagination requests: add the following code in HttpUtils.ts here
if (url.includes('GetNewerActions') || url.includes('GetOlderActions')) {
    return {jsonCode: CONST.JSON_CODE.EXP_ERROR};
}
  • Test comment linking

Offline tests

QA Steps

  • Verify that no errors appear in the JS console
  • Test the error state by failing pagination requests: add the following code in HttpUtils.ts here. It might also be possible to simulate this by turning off the internet connection after loading a chat.
if (url.includes('GetNewerActions') || url.includes('GetOlderActions')) {
    return {jsonCode: CONST.JSON_CODE.EXP_ERROR};
}
  • Test comment linking
  1. Go in a long chat and scroll to an old message
  2. Right click and "Copy link"
  3. Go in another chat and paste the link
  4. Clear the cache (Account Settings -> Troubleshoot -> Clear cache and restart)
  5. Go back to the chat with the link and click it
  6. You will be in the middle of a chat and can test loading older / newer messages

PR Author Checklist

  • I linked the correct issue in the ### Fixed Issues section above
  • I wrote clear testing steps that cover the changes made in this PR
    • I added steps for local testing in the Tests section
    • I added steps for the expected offline behavior in the Offline steps section
    • I added steps for Staging and/or Production testing in the QA steps section
    • I added steps to cover failure scenarios (i.e. verify an input displays the correct error message if the entered data is not correct)
    • I turned off my network connection and tested it while offline to ensure it matches the expected behavior (i.e. verify the default avatar icon is displayed if app is offline)
    • I tested this PR with a High Traffic account against the staging or production API to ensure there are no regressions (e.g. long loading states that impact usability).
  • I included screenshots or videos for tests on all platforms
  • I ran the tests on all platforms & verified they passed on:
    • Android: Native
    • Android: mWeb Chrome
    • iOS: Native
    • iOS: mWeb Safari
    • MacOS: Chrome / Safari
    • MacOS: Desktop
  • I verified there are no console errors (if there's a console error not related to the PR, report it or open an issue for it to be fixed)
  • I followed proper code patterns (see Reviewing the code)
    • I verified that any callback methods that were added or modified are named for what the method does and never what callback they handle (i.e. toggleReport and not onIconClick)
    • I verified that the left part of a conditional rendering a React component is a boolean and NOT a string, e.g. myBool && <MyComponent />.
    • I verified that comments were added to code that is not self explanatory
    • I verified that any new or modified comments were clear, correct English, and explained "why" the code was doing something instead of only explaining "what" the code was doing.
    • I verified any copy / text shown in the product is localized by adding it to src/languages/* files and using the translation method
      • If any non-english text was added/modified, I verified the translation was requested/reviewed in #expensify-open-source and it was approved by an internal Expensify engineer. Link to Slack message:
    • I verified all numbers, amounts, dates and phone numbers shown in the product are using the localization methods
    • I verified any copy / text that was added to the app is grammatically correct in English. It adheres to proper capitalization guidelines (note: only the first word of header/labels should be capitalized), and is either coming verbatim from figma or has been approved by marketing (in order to get marketing approval, ask the Bug Zero team member to add the Waiting for copy label to the issue)
    • I verified proper file naming conventions were followed for any new files or renamed files. All non-platform specific files are named after what they export and are not named "index.js". All platform-specific files are named for the platform the code supports as outlined in the README.
    • I verified the JSDocs style guidelines (in STYLE.md) were followed
  • If a new code pattern is added I verified it was agreed to be used by multiple Expensify engineers
  • I followed the guidelines as stated in the Review Guidelines
  • I tested other components that can be impacted by my changes (i.e. if the PR modifies a shared library or component like Avatar, I verified the components using Avatar are working as expected)
  • I verified all code is DRY (the PR doesn't include any logic written more than once, with the exception of tests)
  • I verified any variables that can be defined as constants (ie. in CONST.js or at the top of the file that uses the constant) are defined as such
  • I verified that if a function's arguments changed that all usages have also been updated correctly
  • If any new file was added I verified that:
    • The file has a description of what it does and/or why is needed at the top of the file if the code is not self explanatory
  • If a new CSS style is added I verified that:
    • A similar style doesn't already exist
    • The style can't be created with an existing StyleUtils function (i.e. StyleUtils.getBackgroundAndBorderStyle(theme.componentBG))
  • If the PR modifies code that runs when editing or sending messages, I tested and verified there is no unexpected behavior for all supported markdown - URLs, single line code, code blocks, quotes, headings, bold, strikethrough, and italic.
  • If the PR modifies a generic component, I tested and verified that those changes do not break usages of that component in the rest of the App (i.e. if a shared library or component like Avatar is modified, I verified that Avatar is working as expected in all cases)
  • If the PR modifies a component related to any of the existing Storybook stories, I tested and verified all stories for that component are still working as expected.
  • If the PR modifies a component or page that can be accessed by a direct deeplink, I verified that the code functions as expected when the deeplink is used - from a logged in and logged out account.
  • If the PR modifies the UI (e.g. new buttons, new UI components, changing the padding/spacing/sizing, moving components, etc) or modifies the form input styles:
    • I verified that all the inputs inside a form are aligned with each other.
    • I added Design label and/or tagged @Expensify/design so the design team can review the changes.
  • If a new page is added, I verified it's using the ScrollView component to make it scrollable when more elements are added to the page.
  • If the main branch was merged into this PR after a review, I tested again and verified the outcome was still expected according to the Test steps.

Screenshots/Videos

Android: Native
Screen.Recording.2024-04-19.at.16.39.20.mov
Android: mWeb Chrome
iOS: Native
iOS: mWeb Safari
MacOS: Chrome / Safari image image
MacOS: Desktop

@janicduplessis janicduplessis requested review from a team as code owners April 19, 2024 19:54
@melvin-bot melvin-bot bot requested review from Gonals and removed request for a team April 19, 2024 19:54

melvin-bot bot commented Apr 19, 2024

@Gonals Please copy/paste the Reviewer Checklist from here into a new comment on this PR and complete it. If you have the K2 extension, you can simply click: [this button]


const loadNewerChats = useCallback(() => {
    if (isLoadingInitialReportActions || isLoadingOlderReportActions || network.isOffline || newestReportAction.pendingAction === CONST.RED_BRICK_ROAD_PENDING_ACTION.DELETE) {
Contributor Author

I noticed this was checking isLoadingOlderReportActions which I assume was an error since this is loadNewerChats, but maybe someone can confirm that this change is ok.

@janicduplessis
Contributor Author

janicduplessis commented Apr 19, 2024

Is someone familiar with this code? I noticed the loading indicator for newer messages never seems to show, even when comment linking into the middle of a chat that has new messages to load. This also prevents the error state from showing, since I'm implementing it in ListBoundaryLoader, which I think is the ideal place.

To test it I am currently bypassing the check.

@janicduplessis
Contributor Author

@mountiny Do you know if there is an issue related to this already? Should I create one?

This is currently coming from this convo on Slack.

@janicduplessis
Contributor Author

Might also need help here with design and texts, not sure who to tag and I can't seem to add the Design label.

@mountiny
Contributor

@janicduplessis I don't think there is one, feel free to create it please.

Then we can add design label there too to ensure the looks are as we expect

@janicduplessis
Contributor Author

janicduplessis commented Apr 20, 2024

@mountiny Thanks! Done #40641

@mountiny
Contributor

@dannymcclain would you be able to help with the design on this PR? thanks!

@dannymcclain
Contributor

Going to cc @Expensify/design in here as well. How do you all feel about the Couldn't load UI?

A couple thoughts from me:

  • I think the message text should be text-supporting color
  • I think I would propose swapping the button for a spinner when you tap Try again instead of swapping out the whole message for the wall of skeleton loaders—it's kinda weird when the whole message disappears, a bunch of skeleton loaders show up, and then it comes back.

image

I realize this is unlikely to happen "in real life" as much as it did during my testing, but still, I think it's pretty weird. I would rather the Try again button just change to a spinner, and then if we still can't load, it changes back to the button. If we CAN load new messages, the new messages just replace the whole error message block.

CleanShot.2024-04-22.at.10.48.07.mp4

@janicduplessis
Contributor Author

@dannymcclain Thanks, I like the suggestion, will look at making those changes this week.

@shawnborton
Contributor

shawnborton commented Apr 22, 2024

Hmm before diving into the UI details, I would be curious to know what exactly causes this? Whatever it is, why can't we solve the root of the problem instead of dressing up the symptoms with nice UI?

That being said, I love everything Danny did. But I am just a bit confused why we would ever need to even show something like this or how it would happen to a real user. I've never seen this happen in any of my other chat apps.

@quinthar since you reported this issue - is this how you would expect to handle it?

@shawnborton
Contributor

Maybe said another way, what exactly does the "Try again" button do to fix the pagination error? Why wouldn't we just do that automatically for the user instead of making them tap a button?

@janicduplessis
Contributor Author

The network requests can fail for various reasons, whether it be a network issue or some outage on our end. This happened recently to both @mountiny and @quinthar. With the current behaviour the requests are retried in a loop, which is why I suggested introducing an error UI. I’m pretty sure I’ve seen something similar in other chat apps before.

@dannymcclain
Contributor

with the current behaviour the requests are retried in a loop

@shawnborton This is shown in the attached video here. I've been seeing it a lot in the product too—it's super disorienting and there's seemingly no way to make it stop other than quitting the app and reopening it. So this Try again UI feels like a big improvement over those crazy flashing skeletons.

@shawnborton
Contributor

Yeah that's fair, but why does the Try again button work and the automatic retry doesn't?

@janicduplessis
Contributor Author

The main goal is to let the user know there is a problem and to avoid spamming our servers with network requests. Currently it just keeps flashing the loading skeleton while immediately retrying the request. If we show some UI, it stops the loop and warns the user about the problem. They can decide to try again, and if it fails again it will show the error state again.

@janicduplessis
Contributor Author

It would be possible to retry the requests a few times, but at some point we need to stop if it keeps failing and let the user know. That is the main goal of this change. If we want to add some automatic retry behaviour it should probably be implemented somewhere else though, more at the networking layer.

@shawnborton
Contributor

So just to make sure I understand correctly, in this PR, if the request fails we'll show you a button to retry. But when you retry, it's likely just going to fail again?

@janicduplessis
Contributor Author

Yes, depending on what the problem was, if it was a random network error it might succeed when retrying, but if it is an outage it probably won’t. At least the user will be warned that an error occurred instead of being stuck in a loading loop.

@janicduplessis
Contributor Author

The current behaviour that causes a loading loop can also be very bad in case of an outage where all clients will start spamming the backend with network requests.

@shawnborton
Contributor

Cool, thanks for explaining. I definitely don't want to block on progress here and agree that this is better than what we currently experience. Thanks for hearing me out!

@janicduplessis
Contributor Author

There is definitely a good case for retrying failed requests automatically, but that would be a different project. It would be more like: whenever a request fails, we check the reason for the failure, and if we think it could work on a second try (let’s say it’s a network error), we re-send the request automatically. Those retries could be limited to only a few attempts, possibly with a delay. If the retries also fail, it would throw an error, which would be handled by the error states of the app (including the one added here).
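
The retry idea described above is explicitly out of scope for this PR, but it could be sketched roughly as follows. Everything here is an assumption made for illustration: `RequestResult`, `getRetryDelayMs`, `requestWithRetry`, the default of three retries, and the exponential delay are hypothetical and do not correspond to anything in the Expensify networking layer.

```typescript
// Hypothetical result type: a failure says whether retrying could plausibly help
// (e.g. a transient network error) or not (e.g. a server-side rejection).
type RequestResult<T> = {ok: true; value: T} | {ok: false; retryable: boolean};

// Exponential delay: 500ms, 1000ms, 2000ms, ... for attempt 0, 1, 2, ...
function getRetryDelayMs(attempt: number, baseDelayMs = 500): number {
    return baseDelayMs * 2 ** attempt;
}

// Sends the request, retrying at most `maxRetries` times when the failure is retryable.
// When retries are exhausted it throws, so the caller's error state can take over.
async function requestWithRetry<T>(
    send: () => Promise<RequestResult<T>>,
    maxRetries = 3,
    wait: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
    for (let attempt = 0; ; attempt++) {
        const result = await send();
        if (result.ok) {
            return result.value;
        }
        if (!result.retryable || attempt >= maxRetries) {
            throw new Error('Request failed');
        }
        await wait(getRetryDelayMs(attempt));
    }
}
```

The thrown error is the hand-off point: once the bounded retries are exhausted, the failure would surface through an error state such as the one added in this PR instead of looping.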

@janicduplessis
Contributor Author

If there are no further design comments I can go ahead and implement the improvements tomorrow and this can be ready for final review.

@Gonals
Contributor

Gonals commented Apr 25, 2024

@janicduplessis, let me know when this is ready for review!

@rojiphil
Contributor

rojiphil commented May 9, 2024

Addressed your review feedback too, let me know if it's good.

@janicduplessis Thanks for the changes. I just tested error condition for GetNewerActions and the error UI came up.
But we have another problem to address. When we retry GetNewerActions, the entire error message UI disappears for a moment and shows up again when the API request fails. As with GetOlderActions, shouldn't the error message UI remain on retry until the API response is known? Here is a video to demonstrate this:

40610-new-action-retry-issue.mp4

@dannymcclain
Contributor

@rojiphil good catch. I think they should behave the same like you're suggesting.

@janicduplessis
Contributor Author

It currently disappears during loading because canShowHeader is false, which I think shouldn’t be. I’m a bit hesitant to add some workaround here since it seems like canShowHeader shouldn’t be false here and there might be a bug with it that could be addressed separately. What do you think?

Contributor

@rojiphil rojiphil left a comment

I’m a bit hesitant to add some workaround here

I agree that we should not implement any workaround around canShowHeader. But we may overcome the problem as mentioned in the review comments.
Would this work?

@@ -1081,6 +1085,7 @@ function getNewerActions(reportID: string, reportActionID: string) {
    key: `${ONYXKEYS.COLLECTION.REPORT_METADATA}${reportID}`,
    value: {
        isLoadingNewerReportActions: true,
        hasLoadingNewerReportActionsError: false,
Contributor

We can prevent setting hasLoadingNewerReportActionsError to false optimistically on retry of the failed request. Since we are forcefully getting newer actions on retry, we can do something like this here:
...(!force ? {hasLoadingNewerReportActionsError: false} : {}),
And when the API request succeeds, we can set it to false in successData
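
To make the suggestion concrete, here is a small sketch of how the conditional spread would shape the optimistic value. The builder functions and the `ReportMetadata` type are hypothetical, and treating `force` as "this call is a user-initiated retry" is an assumption based on the comment above; the real getNewerActions signature may differ.

```typescript
// Hypothetical subset of the report metadata; only the two flag names come from the diff.
type ReportMetadata = {
    isLoadingNewerReportActions?: boolean;
    hasLoadingNewerReportActionsError?: boolean;
};

// Assumption: `force` is true when the user tapped "Try again".
// On a retry the error flag is left out entirely, so a merge keeps its current value.
function buildNewerActionsOptimisticValue(force: boolean): ReportMetadata {
    return {
        isLoadingNewerReportActions: true,
        ...(!force ? {hasLoadingNewerReportActionsError: false} : {}),
    };
}

// On success the error flag is always cleared.
function buildNewerActionsSuccessValue(): ReportMetadata {
    return {isLoadingNewerReportActions: false, hasLoadingNewerReportActionsError: false};
}

// Simulates a shallow merge of the optimistic value into an existing error state.
const existing: ReportMetadata = {isLoadingNewerReportActions: false, hasLoadingNewerReportActionsError: true};
const duringRetry: ReportMetadata = {...existing, ...buildNewerActionsOptimisticValue(true)};
```

In this sketch `duringRetry` still carries the error flag while the retry is in flight, which is what would keep the error UI visible until the response is known.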

@@ -1038,6 +1040,7 @@ function getOlderActions(reportID: string, reportActionID: string) {
    key: `${ONYXKEYS.COLLECTION.REPORT_METADATA}${reportID}`,
    value: {
        isLoadingOlderReportActions: true,
        hasLoadingOlderReportActionsError: false,
Contributor

And we may want to do the same for getOlderActions here

@janicduplessis
Contributor Author

I think it would work, but I am still hesitant about adding more complexity. For example, we do rely on the loading state in a few places, like here, to not trigger additional requests. I am worried that loading data without setting this loading state might cause problems and make it more likely to introduce bugs in the future.

My current idea would be to merge it like this, and investigate the canShowHeader issue separately since currently there are just no loading states for new messages so it seems bugged. If canShowHeader is actually working as expected then maybe we can consider another solution.

@mountiny
Contributor

I agree we should avoid adding too much complexity. If it comes down to the issue of the button disappearing until a new failure comes in, I think it's fine for us to merge. It's a rare case.

@rojiphil can you please continue with the checklist and treat that issues as nab?

Contributor

@rojiphil rojiphil left a comment

LGTM and tests well.

@rojiphil
Contributor

My current idea would be to merge it like this, and investigate the canShowHeader issue separately since currently there are just no loading states for new messages so it seems bugged. If canShowHeader is actually working as expected then maybe we can consider another solution.

Would be happy to work on this when we do.

@rojiphil
Contributor

can you please continue with the checklist and treat that issues as nab?

@mountiny Checklist is done.

Contributor

@mountiny mountiny left a comment

Just a couple of NABs. As this is useful performance- and UX-wise, I am going to merge it as is, but I think we haven't asked marketing for copy on the error, so I will follow up on that.

isTransactionThread: !isEmptyObject(transactionThreadReport),
})}`,
);
const loadOlderChats = useCallback(
Contributor

I feel like these names here are misleading since we are not loading older or newer chats/reports but only messages. NAB

paddingVertical: 15,
paddingHorizontal: 20,
},
listBoundaryErrorText: {
Contributor

Suggested change
listBoundaryErrorText: {
listBoundaryErrorText: {

position: 'absolute',
top: 0,
bottom: 0,
left: 0,
right: 0,
height: CONST.CHAT_HEADER_LOADER_HEIGHT,
},
listBoundaryError: {
Contributor

Suggested change
listBoundaryError: {
listBoundaryError: {

@mountiny mountiny merged commit d796699 into Expensify:main May 13, 2024
18 of 21 checks passed
@OSBotify
Contributor

✋ This PR was not deployed to staging yet because QA is ongoing. It will be automatically deployed to staging after the next production release.

@janicduplessis
Contributor Author

Thanks @mountiny !

@janicduplessis janicduplessis deleted the @janic/pagination-errors branch May 13, 2024 17:40
@OSBotify
Contributor

🚀 Deployed to staging by https://github.com/mountiny in version: 1.4.74-0 🚀

platform result
🤖 android 🤖 success ✅
🖥 desktop 🖥 success ✅
🍎 iOS 🍎 success ✅
🕸 web 🕸 success ✅

@kavimuru

@janicduplessis Can you help us with this step?

Test error state by failing pagination requests adding the following code in HttpUtils.ts here. It might also be possible to simulate this by turning off internet after loading a chat.

@kavimuru

@rojiphil @mountiny could you help with the QA steps.

Test error state by failing pagination requests adding the following code in HttpUtils.ts here. It might also be possible to simulate this by turning off internet after loading a chat.

@kavimuru

@janicduplessis @rojiphil @mountiny Can we run only the following step?

  1. Go in a long chat and scroll to an old message
  2. Right click and "Copy link"
  3. Go in another chat and paste the link
  4. Clear the cache (Account Settings -> Troubleshoot -> Clear cache and restart)
  5. Go back to the chat with the link and click it
  6. You will be in the middle of a chat and can test loading older / newer messages

@rojiphil
Contributor

Can we run only the following step?

Well! Running the mentioned steps alone will not help. If we can purposely fail GetOlderActions and GetNewerActions API requests for a specific long chat report in BE then we can test this with the mentioned steps. But I doubt if we can carry out the tests with any invalid Onyx entry in FE.
@mountiny @Gonals Do you see any better/easier option?

@janicduplessis
Contributor Author

Yea I think this might be hard to test on staging without being able to change the code. I will see if I can think of something.

@kavimuru

kavimuru commented May 16, 2024

@rojiphil @janicduplessis It would be great if you can validate this PR internally then.

@kavimuru

@Beamanator Could this PR be validated internally?

@janicduplessis
Contributor Author

@rojiphil I figured out the issue that was preventing the loading state from showing, fix here #42332

@Beamanator
Contributor

@mountiny @rojiphil @janicduplessis can one of you please test this internally in staging? QA is having trouble 🙏

@janicduplessis janicduplessis mentioned this pull request May 20, 2024
52 tasks
@mountiny
Contributor

@janicduplessis does that PR need to be CPed to staging, or can it go through the normal deploy process?

I am not sure how this can be tested without changing code. I think we could however mark this off the checklist and we can monitor this change in the #newdot-quality room

@janicduplessis
Contributor Author

I think it can just go through the normal deploy process. It is hard to test since it requires specific network failures or an outage to happen. We can monitor it and make sure we do not have reports of infinite loading after network errors.

@OSBotify
Contributor

🚀 Deployed to production by https://github.com/chiragsalian in version: 1.4.74-6 🚀

platform result
🤖 android 🤖 success ✅
🖥 desktop 🖥 success ✅
🍎 iOS 🍎 success ✅
🕸 web 🕸 success ✅

@aldo-expensify
Contributor

Getting old report actions gets stuck in an infinite retry loop. Could it be related to this PR? https://expensify.slack.com/archives/C05LX9D6E07/p1717711999602319?thread_ts=1717462134.645329&cid=C05LX9D6E07
