Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Network: Retry requests that failed in flight #6567

Merged
merged 21 commits into from
Jan 12, 2022

Conversation

kidroca
Copy link
Contributor

@kidroca kidroca commented Dec 2, 2021

Details

Add updates to:

  • Retry requests when something unexpected caused an error in flight
  • Prevent multiple authentication calls running at the same time

Fixed Issues

$ #5952

Tests

We're trying to make a request fail in the middle, like connection loss
The easiest way so far has been using Chrome to "slow down the connection"
On staging it's even enough to cause a timeout (but on localhost I think there are different rules and it doesn't happen)
Check out the web before after video for a thorough demonstration

Web (Chrome) & Desktop

  1. Open dev tools and add a very slow connection profile
  2. Enable the throttled connection you just made
  3. Write a message or two
  • on staging the request should fail in about 15 sec. just wait
  • on localhost you can kill your router connection now to cause the request to fail
    • you'll probably need to also kill the proxy server and restart it
    • also you might need to go to offline and online (in Chrome) to let it know it really don't have internet..
  1. Verify the messages failed (see Report_AddComment) in the network tab
  2. Set the Network speed to "offline" - now we're acting like we're fully offline
  3. Send a few more messages, but do take a note where the message from step 3) are
  4. Now set Network speed to "full" and observe the requests getting retried
  • if you tried this on localhost resume you're real internet connection before setting network speed to full in Chrome
  1. There should be no pending messages stuck on the bottom

Others

We're trying to do the same as above, but without dev helpers
It sometimes possible to trick the device you're still online, like if you unplug your network cable or use a tethered connection and disable "data" on the tethered connection, then the requests should timeout and fail
When you resume connectivity any pending request should be sent

QA Steps

See if you can do the "Tests" above, it should be easier on staging and Chrome, just letting the connections timeout using dev tools and throttle
Otherwise a real world case it to use a device in a spotty connection and notice that you're getting offline or messages are sent too slow
Check for regressions around network related stuff like uploading an attachment
Spending some time in airplane mode then resuming
Use the app for a while and monitor for any indication of a regression that might be caused here

Tested On

  • Web
  • Mobile Web
  • Desktop
  • iOS
  • Android

Screenshots

Web

Before and After (after starts around the 20th second mark)
Retrying.failed.requests.mp4

Mobile Web

image

Desktop

image

iOS

image

Android

Retry.failed.requests.Android.mp4

Retry requests when something unexpected caused an error in flight
handleExpiredAuthToken does not need a then and a catch block
that do the same thing
We don't need to call `addDefaultValuesToParameters` - `Network.post` already does that
…cation calls

Since we're already managing a promise based logic we can just use a "finally" block
It's not needed to merge the data to Onyx if it only serves to update a local variable value
Copy link
Contributor Author

@kidroca kidroca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some inline notes on the changes

src/libs/API.js Outdated Show resolved Hide resolved
src/libs/actions/Session/index.js Outdated Show resolved Hide resolved
tests/unit/NetworkTest.js Outdated Show resolved Hide resolved
tests/unit/NetworkTest.js Outdated Show resolved Hide resolved
@kidroca kidroca marked this pull request as ready for review December 2, 2021 23:57
@kidroca kidroca requested a review from a team as a code owner December 2, 2021 23:57
@botify botify requested a review from johnmlee101 December 2, 2021 23:57
@MelvinBot MelvinBot removed the request for review from a team December 2, 2021 23:57
@kidroca
Copy link
Contributor Author

kidroca commented Dec 9, 2021

Bump @johnmlee101
This one have been ready for a while as well

Copy link
Contributor

@johnmlee101 johnmlee101 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor comment, rest seems good and I'll get another reviewer to double check

src/libs/API.js Outdated Show resolved Hide resolved
@johnmlee101 johnmlee101 requested a review from a team December 9, 2021 19:54
@MelvinBot MelvinBot requested review from johnmlee101 and removed request for a team December 9, 2021 19:54
@johnmlee101
Copy link
Contributor

cc @marcaaron @roryabraham if you want to take a crack at this as well and review in addition to me

@marcaaron
Copy link
Contributor

Ah sorry I don't have time to deep dive into the networking code atm. I mentioned this somewhere else, but I think it would be valuable to improve the existing networking logic with a refactor of some kind before complicating it further with whatever changes we are proposing (it will take me some time to just catch up on what it is we're doing and why).

@kidroca
Copy link
Contributor Author

kidroca commented Dec 9, 2021

This PR untangles some stuff and actually simplifies the netwotking code

Half of the changes are from the added unit tests

A hold on the network issues, would block me on the on persisted file handling as well

src/libs/API.js Outdated
handleExpiredAuthToken(queuedRequest.command, queuedRequest.data, queuedRequest.type)
.then(queuedRequest.resolve)
.catch(queuedRequest.reject);
handleExpiredAuthToken(queuedRequest);
Copy link
Contributor

@roryabraham roryabraham Dec 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we pause the request queue right here rather than in Authenticate? Because we know on line 155 if we're going to try to reauthenticate, so maybe we should immediate pause the request queue so that another request doesn't begin processing before we attempt to reauthenticate?

Copy link
Contributor Author

@kidroca kidroca Dec 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to - it's impossible for another requests to start

Whether the call is now or "a few lines later" inside Authenticate it's still executed as sync code, which makes it impossible for other requests to start

The unit tests are covering this: "Multiple Authenticate calls should be resolved with the same value"
we start 5 requests at the same time - if pausing didn't work as expected it would have let another request to start

Putting the "pause" anywhere else but in Authenticate, only hides details that might be important and scatters the logic around

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting the "pause" anywhere else but in Authenticate, only hides details that might be important and scatters the logic around

more opinion than fact (in my opinion 😄)

Copy link
Contributor Author

@kidroca kidroca Jan 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more opinion than fact (in my opinion 😄)

So you both think we should pause while authenticating, call the pause outside of Authenticate but call the unpause from inside Authenticate?

The original code was pausing the Network queue in handleExpiredAuthToken clearly the intent was to prevent any (other) requests while authentication runs.
Besides running Authenticate would change the token so it's best to pause any requests, otherwise they will happen in the middle of authentication and get rejected, causing a new authenticate and then retry the actual call

Since authenticate and reauthenticate are exported, they can and are triggered from outside (pusher)

src/libs/API.js Outdated Show resolved Hide resolved
Copy link
Contributor Author

@kidroca kidroca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressing open threads

src/libs/API.js Outdated Show resolved Hide resolved
src/libs/API.js Outdated Show resolved Hide resolved
src/libs/API.js Outdated
handleExpiredAuthToken(queuedRequest.command, queuedRequest.data, queuedRequest.type)
.then(queuedRequest.resolve)
.catch(queuedRequest.reject);
handleExpiredAuthToken(queuedRequest);
Copy link
Contributor Author

@kidroca kidroca Dec 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to - it's impossible for another requests to start

Whether the call is now or "a few lines later" inside Authenticate it's still executed as sync code, which makes it impossible for other requests to start

The unit tests are covering this: "Multiple Authenticate calls should be resolved with the same value"
we start 5 requests at the same time - if pausing didn't work as expected it would have let another request to start

Putting the "pause" anywhere else but in Authenticate, only hides details that might be important and scatters the logic around

src/libs/actions/Session/index.js Outdated Show resolved Hide resolved
src/libs/API.js Outdated Show resolved Hide resolved
Copy link
Contributor

@marcaaron marcaaron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at the tests yet but did a deep dive on the changes. Overall, I'm hoping we can maybe scale things back a bit so we are not mixing together refactoring and addressing the main issue (which I think was to retry requests that failed because of some network condition).

src/libs/API.js Outdated Show resolved Hide resolved
src/libs/API.js Outdated Show resolved Hide resolved
src/libs/API.js Outdated Show resolved Hide resolved
src/libs/API.js Outdated
// original request.
Network.post(originalCommand, originalParameters, originalType)
));
reauthenticate(originalRequest.command)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NAB, just rubber ducking... so we don't need to return this promise anymore because we are handling the promise in retryRequest(). Makes sense and looks clean.

However, when we previously successfully called reauthenticate() we were calling addDefaultValuesToParameter() and passing in the original parameters (no authToken present there) and the comment seems to suggest that we are getting a new authToken to use (the one set locally inside reauthenticate())

authToken = response.authToken;

It does look like we will add those parameters here when making the request

? enhanceParameters(queuedRequest.command, requestData)

I'm guessing that this is fine because the request we are passing to this callback here

App/src/libs/Network.js

Lines 221 to 229 in 021ad5d

const finalParameters = _.isFunction(enhanceParameters)
? enhanceParameters(queuedRequest.command, requestData)
: requestData;
onRequest(queuedRequest, finalParameters);
HttpUtils.xhr(queuedRequest.command, finalParameters, queuedRequest.type, queuedRequest.shouldUseSecure)
.then(response => onResponse(queuedRequest, response))
.catch(error => onError(queuedRequest, error));
});

Should never have an authToken in it's requestData...?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR is the existing code here just pointless?

App/src/libs/API.js

Lines 112 to 117 in 021ad5d

return reauthenticate(originalCommand)
.then(() => {
// Now that the API is authenticated, make the original request again with the new authToken
const params = addDefaultValuesToParameters(originalCommand, originalParameters);
return Network.post(originalCommand, params, originalType);
})

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should never have an authToken in it's requestData...?

That makes sense - only add (the latest) token for authenticated requests and don't expect it to be part of requestData

TL;DR is the existing code here just pointless?

Yeah, it unneeded but it also might be a subtle bug calling addDefaultValuesToParameters here. It would add authToken to original params which might get persisted, and might expire, but won't be possible to update:

App/src/libs/API.js

Lines 75 to 77 in 7af06bc

if (isAuthTokenRequired(command) && !parameters.authToken) {
finalParameters.authToken = authToken;
}

It might be best to update the enhancer to always add the latest token

    if (isAuthTokenRequired(command)) {
        finalParameters.authToken = authToken;
    }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Maybe we can address this in a follow up issue if it becomes a problem. Not sure I am seeing the subtle bug yet but does sound like maybe persisted requests should not be saved with the authToken and we should always prefer the most recent one we have.

src/libs/API.js Outdated
@@ -217,7 +216,15 @@ function Authenticate(parameters) {
'partnerUserSecret',
], parameters, commandName);

return Network.post(commandName, {
// Don't run more than one Authentication requests at the same time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind explaining (briefly) the motivation behind moving this logic into Authenticate() instead of handleExpiredAuthToken()? Didn't quite figure it out from looking at the PR.

Could we theoretically leave the pendingAuthTask idea and put this logic back into handleExpiredAuthToken() where it was and achieve the "retry requests that fail in flight" objective?

Feels like isAuthenticating was swapped for pendingAuthTask. Is there a clear reason to prefer one over the other?

Copy link
Contributor Author

@kidroca kidroca Jan 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind explaining (briefly) the motivation behind moving this logic into Authenticate() instead of handleExpiredAuthToken()? Didn't quite figure it out from looking at the PR.

I've actually made a comment about it here:

The original code was pausing the Network queue in handleExpiredAuthToken clearly the intent was to pause (other) requests while authentication runs.
Besides running Authenticate would change the token so it's best to pause any requests, otherwise they will happen in the middle of authentication and get rejected, causing a new authenticate and then retry the actual call


Could we theoretically leave the pendingAuthTask idea and put this logic back into handleExpiredAuthToken() where it was and achieve the "retry requests that fail in flight" objective?

Sure, would you like me to open a separate PR about it?

Feels like isAuthenticating was swapped for pendingAuthTask. Is there a clear reason to prefer one over the other?

Capturing a Promise inside pendingAuthTask allow us to hook other callers to the same Authentication call rather than starting a new call.
If Authentication is already in progress when you call Authenticate you should hook to it, rather than starting a new call and potentially invalidating the result of the one that's already running

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, maybe we can create a new issue for this as well? I'm not quite seeing the path for how this can happen or why we should be concerned about it - but guessing you would be able to document it and we can address it separately if it's concerning?

src/libs/API.js Outdated
return pendingAuthTask;
}

// Make any other requests wait while authentication happens
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean? If we are "pausing" the request queue what requests are being made?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean? If we are "pausing" the request queue what requests are being made?

Pausing the queue doesn't prevent you from making actions that trigger a Network.post
I'm saying anything pending or added to the network queue has to wait until we're authenticated.

src/libs/API.js Outdated
})
.finally(() => {
// The authentication process is finished so the network can be unpaused to continue
// processing requests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is copied from elsewhere but think it would be good to take the opportunity and explain

  • why we are resetting the pendingAuthTask
  • why we unpause the queue here
  • why we are doing this regardless of success or failure of authentication

* @returns {Promise}
* @param {String} [type='post']
* @param {Boolean} [shouldUseSecure=false] - Whether we should use the secure API
* @returns {Promise<{ jsonCode: Number }>}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure that ppl would be open to hearing your thoughts on why these extra docs are useful or we should do it. But most people are not adding them consistently so I think they should be removed.

I'd recommend:

  1. Bringing it up in the Slack channel
  2. Proposing a change to the style guide
  3. Fixing all the JSDocs so they are consistent

src/libs/API.js Outdated
handleExpiredAuthToken(queuedRequest.command, queuedRequest.data, queuedRequest.type)
.then(queuedRequest.resolve)
.catch(queuedRequest.reject);
handleExpiredAuthToken(queuedRequest);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting the "pause" anywhere else but in Authenticate, only hides details that might be important and scatters the logic around

more opinion than fact (in my opinion 😄)

src/libs/API.js Outdated
if (isRequestRetriable(queuedRequest)) {
retryRequest(queuedRequest);
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if wrong, but this here seems to be the main material change?

If we get a network error (like a fetch error) now we just toss the request into the ether instead of retrying it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's correct. I'm sort of confused why we are doing this here instead of when we call onError() here

App/src/libs/Network.js

Lines 226 to 228 in 021ad5d

HttpUtils.xhr(queuedRequest.command, finalParameters, queuedRequest.type, queuedRequest.shouldUseSecure)
.then(response => onResponse(queuedRequest, response))
.catch(error => onError(queuedRequest, error));

Could we just add the request to the array of requests to retry instead?

App/src/libs/Network.js

Lines 243 to 245 in 021ad5d

// We clear the request queue at the end by setting the queue to retryableRequests which will either have some
// requests we want to retry or an empty array
networkRequestQueue = requestsToProcessOnNextRun;

Seems easier to do that and then we won't need onError() to call retryRequest() which calls Network.post() etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we get a network error (like a fetch error) now we just toss the request into the ether instead of retrying it?

The opposite actually. When we get a network/fetch error we'll get right here where we retry if we're supposed to.

If that's correct. I'm sort of confused why we are doing this here instead of when we call onError() here

App/src/libs/Network.js

Lines 226 to 228 in 021ad5d

HttpUtils.xhr(queuedRequest.command, finalParameters, queuedRequest.type, queuedRequest.shouldUseSecure)
.then(response => onResponse(queuedRequest, response))
.catch(error => onError(queuedRequest, error));

It happens inside onError aka the ErrorHanlder because we

  • log about it
  • do "something" with the session
  • check and retry the request if we should
  • or reject to the original caller otherwise

Are you suggesting something like

HttpUtils.xhr(queuedRequest.command, finalParameters, queuedRequest.type, queuedRequest.shouldUseSecure) 
	     .then(response => onResponse(queuedRequest, response)) 
	     .catch(error => {
	        if (canRetryRequest(queuedRequest)) {
	           requestsToProcessOnNextRun.push(queuedRequest);
	        } else {
	          onError(queuedRequest, error);
	        }
	     }); 

If we shouldn't log or call setSessionLoadingAndError we could move the retry to Network. The only problem would be that we have a onError handler that's not handling all the errors

Besides the SuccessHandler also retries the request through Network.post(), but if it were to throw on expired token it would trigger the catch block and reschedule the request through the same logic


One thing to note in advance - the following is not possible:

	     .catch(error => {
                onError(queuedRequest, error);

	        if (canRetryRequest(queuedRequest)) {
	           requestsToProcessOnNextRun.push(queuedRequest);
	        }
	     }); 

onError would reject the underlying promise and it won't be possible to resolve or reject it anymore

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The opposite actually. When we get a network/fetch error we'll get right here where we retry if we're supposed to.

Yes, sorry I am describing the current behavior (without your changes)

I see your point about losing the logs. But we could also:

  • Log that we are retrying because of a network error from within Network
  • Add the request back into the queue right away
  • Not call the onError() because we don't need to reject the promise or call setLoadingAndError()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a plan

Copy link
Contributor Author

@kidroca kidroca Jan 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're already retrying requests on a few places inside API, wouldn't it be best to extract logic that can cover those as well (handleExpiredAuthToken) (And all of those are hooked back to the original requests' promise)

Didn't really get this, but maybe do not understand. We are not losing the original promise context if we put it back into the request queue with the resolve and reject attached. If the request gets a response it will be handled.

I'm saying that we already have a retry logic inside API (calling Network.post)
And we're discussing whether we should add another retry logic in Network (to spare us from calling Network.post, and to reuse canRetry)
Let's keep using Network.post to retry the request like we do in the Response Handler (expired token)
It would still aid refactoring, when we identify exactly how it should happen


On one side (Network) we

  • 🟢 can reuse canRetry
  • ❔we spare putting the request through post again (and chaining to the original promise)
  • 🔴 but we lose the ability to log without without introducing circular dep

On the other

  • 🟢 we can log
  • 🟡 we duplicate a few one liners (lodash.get data.shouldRetry)
  • ❔ we need to call Network.post and chain to the original promise

❔ - is it really good or bad?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a cors error can happen, or an error resulting from calling an url that no longer exists
these are developer errors and would result in a deploy blocker anyway

Are these the only possibilities we should expect to see? And if so and these should be deploy blocking events why would the proposed solution be to retry them? Should we maybe at least queue a log to alert in such cases in addition to retrying over and over?

The requests we retry here would eventually succeed

Sorry, I'm not quite seeing where the eventual success happens.

Let's keep using Network.post to retry the request like we do in the Response Handler (expired token)

As I mentioned already, I don't see a compelling reason to do it this way so it's not worth having this many back and forth exchanges to resolve.

Regardless, it seems more important to figure out how to handle the infinite retries concern.

I'm not sure if there's a way to wrap that concern up quickly. But without a proposal to do so I'm not sure it's a great idea to retry these requests.

Copy link
Contributor Author

@kidroca kidroca Jan 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these the only possibilities we should expect to see? And if so and these should be deploy blocking events why would the proposed solution be to retry them? Should we maybe at least queue a log to alert in such cases in addition to retrying over and over?

I'm not suggesting to retry those, but the error itself doesn't carry information to distinct them - fetch would just call catch with an unspecific Failed to fetch error. Supposedly this is by design to prevent exploits

  • edit: In the end I kinda am in favor of retrying everything

We can't distinct what caused an error. A network error can be caused by:

  • CORS
  • DNS errors
  • TLS negotiation failure
  • others

The most likely reason to be in the catch handler is that we're either offline or we've experienced a fluke and can recover by retrying the request

To your point for prevent infinite loop of requests we could only retry the request if

  1. it failed
  2. we check our current offline state in the error handler. If we're offline that's probably the reason the request failed - we can execute logic to retry it

Let's keep using Network.post to retry the request like we do in the Response Handler (expired token)

As I mentioned already, I don't see a compelling reason to do it this way so it's not worth having this many back and forth exchanges to resolve.

I've edited my last reply with pros / cons, but if that's not making sense just tell me what you need done:

  • have the retry happen in Network.js
  • don't log anything or log to console.debug
  • push the request directly to the queue

And if so and these should be deploy blocking events why would the proposed solution be to retry them?

If the reason for a request to fail is purely network related and not the actual payload we should retry it - same reason we retry it after re-authentication or why we post it after it has been persisted in storage be it a minute or a year
Because we "promise eventual delivery"

Let's say there's a CORS error and posting chat messages stops working - there's nothing you can do on the front end to fix the issue. Instead of commiting to data loss and discarding the requests we can

  1. inform the users that their actions aren't reaching target (but will be sent as soon as the problem is resolved)
  2. keep retrying the requests
  3. if the user decides to quit - any "important" requests were persisted and will get retried

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason to retry the request, besides everything else said
When we fail to reach the API for any reason - retry the request (if it's retriable)

Keep the logic simple we can extend it if it becomes a problem

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tell me what you need done

Sure, sorry for being indirect - that new plan sounds good for now

  • have the retry happen in Network.js
  • don't log anything
  • push the request directly to the queue

It would be good to log when the request happens so maybe next we can solve the circular dependency issues in a new PR.

@marcaaron
Copy link
Contributor

This PR untangles some stuff and actually simplifies the netwotking code

Just want to add some thoughts here that there is a time to simplify and a time to get stuff done. When we choose to refactor complex code that our team has limited knowledge of it draws a significant amount of time and energy from those who do have that knowledge to review carefully.

@kidroca here's some feedback and some things that might be helpful in the future...

  • try to follow our conventions to the best of your ability
  • aim to do a minimum number of changes in PRs where the area of the code is particularly complex or not widely understood
  • avoid fixing multiple issues simultaneously when working in a complex area

Copy link
Contributor Author

@kidroca kidroca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR untangles some stuff and actually simplifies the netwotking code

Just want to add some thoughts here that there is a time to simplify and a time to get stuff done. When we choose to refactor complex code that our team has limited knowledge of it draws a significant amount of time and energy from those who do have that knowledge to review carefully.

@kidroca here's some feedback and some things that might be helpful in the future...

@marcaaron
You making this sound like a bigger problem that it actually is
Why didn't you say so on your initial review?

It's a normal sized PR, considering half of it is automated tests it's actually small
Excuse me if it took you a significant amount of time review, I didn't think that it will!

I'll remove my refactors from this PR, but I wish you or anyone else had brought that up on one of the previous reviews

src/libs/API.js Outdated Show resolved Hide resolved
src/libs/API.js Outdated Show resolved Hide resolved
src/libs/API.js Outdated
// original request.
Network.post(originalCommand, originalParameters, originalType)
));
reauthenticate(originalRequest.command)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should never have an authToken in it's requestData...?

That makes sense - only add (the latest) token for authenticated requests and don't expect it to be part of requestData

TL;DR is the existing code here just pointless?

Yeah, it unneeded but it also might be a subtle bug calling addDefaultValuesToParameters here. It would add authToken to original params which might get persisted, and might expire, but won't be possible to update:

App/src/libs/API.js

Lines 75 to 77 in 7af06bc

if (isAuthTokenRequired(command) && !parameters.authToken) {
finalParameters.authToken = authToken;
}

It might be best to update the enhancer to always add the latest token

    if (isAuthTokenRequired(command)) {
        finalParameters.authToken = authToken;
    }

src/libs/API.js Outdated
handleExpiredAuthToken(queuedRequest.command, queuedRequest.data, queuedRequest.type)
.then(queuedRequest.resolve)
.catch(queuedRequest.reject);
handleExpiredAuthToken(queuedRequest);
Copy link
Contributor Author

@kidroca kidroca Jan 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more opinion than fact (in my opinion 😄)

So you both think we should pause while authenticating, call the pause outside of Authenticate but call the unpause from inside Authenticate?

The original code was pausing the Network queue in handleExpiredAuthToken clearly the intent was to prevent any (other) requests while authentication runs.
Besides running Authenticate would change the token so it's best to pause any requests, otherwise they will happen in the middle of authentication and get rejected, causing a new authenticate and then retry the actual call

Since authenticate and reauthenticate are exported, they can and are triggered from outside (pusher)

src/libs/API.js Outdated
if (isRequestRetriable(queuedRequest)) {
retryRequest(queuedRequest);
return;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we get a network error (like a fetch error) now we just toss the request into the ether instead of retrying it?

The opposite actually. When we get a network/fetch error we'll get right here where we retry if we're supposed to.

If that's correct. I'm sort of confused why we are doing this here instead of when we call onError() here

App/src/libs/Network.js

Lines 226 to 228 in 021ad5d

HttpUtils.xhr(queuedRequest.command, finalParameters, queuedRequest.type, queuedRequest.shouldUseSecure)
.then(response => onResponse(queuedRequest, response))
.catch(error => onError(queuedRequest, error));

It happens inside onError aka the ErrorHanlder because we

  • log about it
  • do "something" with the session
  • check and retry the request if we should
  • or reject to the original caller otherwise

Are you suggesting something like

HttpUtils.xhr(queuedRequest.command, finalParameters, queuedRequest.type, queuedRequest.shouldUseSecure) 
	     .then(response => onResponse(queuedRequest, response)) 
	     .catch(error => {
	        if (canRetryRequest(queuedRequest)) {
	           requestsToProcessOnNextRun.push(queuedRequest);
	        } else {
	          onError(queuedRequest, error);
	        }
	     }); 

If we shouldn't log or call setSessionLoadingAndError we could move the retry to Network. The only problem would be that we have a onError handler that's not handling all the errors

Besides the SuccessHandler also retries the request through Network.post(), but if it were to throw on expired token it would trigger the catch block and reschedule the request through the same logic


One thing to note in advance - the following is not possible:

	     .catch(error => {
                onError(queuedRequest, error);

	        if (canRetryRequest(queuedRequest)) {
	           requestsToProcessOnNextRun.push(queuedRequest);
	        }
	     }); 

onError would reject the underlying promise and it won't be possible to resolve or reject it anymore

src/libs/API.js Outdated Show resolved Hide resolved
src/libs/API.js Outdated
@@ -217,7 +216,15 @@ function Authenticate(parameters) {
'partnerUserSecret',
], parameters, commandName);

return Network.post(commandName, {
// Don't run more than one Authentication requests at the same time
Copy link
Contributor Author

@kidroca kidroca Jan 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind explaining (briefly) the motivation behind moving this logic into Authenticate() instead of handleExpiredAuthToken()? Didn't quite figure it out from looking at the PR.

I've actually made a comment about it here:

The original code was pausing the Network queue in handleExpiredAuthToken clearly the intent was to pause (other) requests while authentication runs.
Besides running Authenticate would change the token so it's best to pause any requests, otherwise they will happen in the middle of authentication and get rejected, causing a new authenticate and then retry the actual call


Could we theoretically leave the pendingAuthTask idea and put this logic back into handleExpiredAuthToken() where it was and achieve the "retry requests that fail in flight" objective?

Sure, would you like me to open a separate PR about it?

Feels like isAuthenticating was swapped for pendingAuthTask. Is there a clear reason to prefer one over the other?

Capturing a Promise inside pendingAuthTask allow us to hook other callers to the same Authentication call rather than starting a new call.
If Authentication is already in progress when you call Authenticate you should hook to it, rather than starting a new call and potentially invalidating the result of the one that's already running

src/libs/API.js Outdated
return pendingAuthTask;
}

// Make any other requests wait while authentication happens
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean? If we are "pausing" the request queue what requests are being made?

Pausing the queue doesn't prevent you from making actions that trigger a Network.post
I'm saying anything pending or added to the network queue has to wait until we're authenticated.

@kidroca
Copy link
Contributor Author

kidroca commented Jan 7, 2022

Removed all the changes but the main thing, if you still want me to move the retry to Network and split error handling there leave a comment

@marcaaron
Copy link
Contributor

You making this sound like a bigger problem that it actually is

To be clear, my intention is only to give you some advice on how to grow and work better with us not to make a big deal or criticize. It's just feedback to keep in mind for the future.

Why didn't you say so on your initial review?

This is my first review. I mentioned that it would take me some time to catch up and that I'd be more comfortable with an initial refactor. I can see how that looks like a missed opportunity to be more direct initially, apologies for that, and thanks for the feedback!

Still, not sure it's productive to assign blame and not sure it would change the feedback that I'm giving. This is a PR. Author proposes changes. Reviewers exchange feedback on how to improve next time. Thanks for hearing me out!

@marcaaron
Copy link
Contributor

if you still want me to move the retry to Network and split error handling there leave a comment

I do think we should try this, but could be convinced otherwise. It just seems it's easier to connect the dots from HttpUtils.xhr() back to the queue in the same Network lib vs. doing onError() -> Network.registerErrorHandler() callback (API.js) -> Network.post().

@kidroca
Copy link
Contributor Author

kidroca commented Jan 12, 2022

Moved logic to Network
Ready for review

@marcaaron marcaaron merged commit 8502cd1 into Expensify:main Jan 12, 2022
@OSBotify
Copy link
Contributor

🚀 Deployed to staging by @marcaaron in version: 1.1.27-5 🚀

platform result
🤖 android 🤖 failure ❌
🖥 desktop 🖥 success ✅
🍎 iOS 🍎 failure ❌
🕸 web 🕸 success ✅

@OSBotify
Copy link
Contributor

🚀 Deployed to staging by @marcaaron in version: 1.1.27-6 🚀

platform result
🤖 android 🤖 failure ❌
🖥 desktop 🖥 success ✅
🍎 iOS 🍎 failure ❌
🕸 web 🕸 success ✅

@mvtglobally
Copy link

mvtglobally commented Jan 13, 2022

@marcaaron @roryabraham @johnmlee101 We are struggling a bit to get very slow Network where QA resources are. 1 tester was able to get low 4G from the basement ;), so it seem to be pass, but we are not able to run all environments properly.

@parasharrajat
Copy link
Member

For controlling the speed on Desktop and Web. You can use https://nshipster.com/network-link-conditioner/ for macOS. @mvtglobally

@Jag96
Copy link
Contributor

Jag96 commented Jan 14, 2022

Tested on Desktop and looks good!

@OSBotify
Copy link
Contributor

🚀 Deployed to production by @Jag96 in version: 1.1.29-5 🚀

platform result
🤖 android 🤖 success ✅
🖥 desktop 🖥 success ✅
🍎 iOS 🍎 success ✅
🕸 web 🕸 success ✅

@kidroca kidroca changed the title Network: Retry requests that failed in flight & prevent multiple auth calls Network: Retry requests that failed in flight Jan 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants