Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Network] Messages are sent twice #4214

Closed
2 of 5 tasks
isagoico opened this issue Jul 24, 2021 · 49 comments · Fixed by #2703
Closed
2 of 5 tasks

[Network] Messages are sent twice #4214

isagoico opened this issue Jul 24, 2021 · 49 comments · Fixed by #2703
Labels
Daily KSv2 Engineering Internal Requires API changes or must be handled by Expensify staff

Comments

@isagoico
Copy link

isagoico commented Jul 24, 2021

If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!


Action Performed:

  1. Log in to New Expensify with an unstable internet connection
  2. Send some messages to an account you have access too. (Preferably you're logged in on another device with)

Expected Result:

Messages sent with an unstable internet connection should be sent without any issues.

Actual Result:

Messages are sent twice when there's unstable internet connection

Workaround:

Unknown.

Platform:

Where is this issue occurring?

  • Web
  • iOS
  • Android
  • Desktop App
  • Mobile Web

Version Number: 1.0.80-0

Logs: https://stackoverflow.com/c/expensify/questions/4856

Notes/Photos/Videos: Any additional supporting documentation

image

image

Expensify/Expensify Issue URL:

View all open jobs on Upwork


From @JmillsExpensify https://expensify.slack.com/archives/C01GTK53T8Q/p1626890348078000

When chatting with someone on a slow wifi connection, you get two of every message they send.

@MelvinBot
Copy link

Triggered auto assignment to @thienlnam (Engineering), see https://stackoverflow.com/c/expensify/questions/4319 for more details.

@thienlnam thienlnam added the External Added to denote the issue can be worked on by a contributor label Jul 26, 2021
@MelvinBot
Copy link

Triggered auto assignment to @JmillsExpensify (External), see https://stackoverflow.com/c/expensify/questions/8582 for more details.

@JmillsExpensify
Copy link

Posted to Upwork here! https://www.upwork.com/jobs/~018c22cb61e93a114c. Make sure to share your technical proposal with us here and a colleague will approve your technical proposal. Thank you!

@MelvinBot MelvinBot added the Help Wanted Apply this label when an issue is open to proposals by contributors label Jul 28, 2021
@Santhosh-Sellavel
Copy link
Collaborator

Santhosh-Sellavel commented Jul 28, 2021

Proposal

The issue happens when the network is offline, retryableRequests are set in ONYX, but not cleared from networkRequestQueue(current queue)

App/src/libs/Network.js

Lines 105 to 117 in ad77386

// NetInfo tells us whether the app is offline
if (isOffline) {
if (!networkRequestQueue.length) {
return;
}
// If we have a request then we need to check if it can be persisted in case we close the tab while offline
const retryableRequests = _.filter(networkRequestQueue, request => (
!request.data.doNotRetry && request.data.persist
));
Onyx.set(ONYXKEYS.NETWORK_REQUEST_QUEUE, retryableRequests);
return;
}

After updating in the ONYX, here queue gets duplicated,

App/src/libs/Network.js

Lines 32 to 45 in ad77386

Onyx.connect({
key: ONYXKEYS.NETWORK_REQUEST_QUEUE,
callback: (persistedRequests) => {
if (didLoadPersistedRequests || !persistedRequests) {
return;
}
// Merge the persisted requests with the requests in memory then clear out the queue as we only need to load
// this once when the app initializes
networkRequestQueue = [...networkRequestQueue, ...persistedRequests];
didLoadPersistedRequests = true;
Onyx.set(ONYXKEYS.NETWORK_REQUEST_QUEUE, []);
},
});

networkRequestQueue = [...networkRequestQueue, ...persistedRequests];
We don't need to append networkRequestQueue to itself, if the network is offline, persistedRequests will have all requests we need.

This line should be revisited,

if (didLoadPersistedRequests || !persistedRequests) {

didLoadPersistedRequests if it gets true once, no more requests will get appended.

@parasharrajat
Copy link
Member

We have an existing issue #2451 which solves this problem as well. PR is stalled for that. I will update you here about the PR status.

@thienlnam
Copy link
Contributor

@Santhosh-Sellavel Thank you for the great proposal! The changes look good but it seems like @parasharrajat is fixing this in another PR from fixing another bug so I'm just going to have him assigned to this issue as well.

@Santhosh-Sellavel
Copy link
Collaborator

@parasharrajat you should be rich already 😂.

@quinthar
Copy link
Contributor

Is there anything about logging a comment that is idempotent? This could be another reason to generate random client-side action IDs, such that we can suppress duplicate uploads. /cc @marcaaron @tgolen

@arielgreen arielgreen removed the Help Wanted Apply this label when an issue is open to proposals by contributors label Jul 29, 2021
@JmillsExpensify
Copy link

JmillsExpensify commented Jul 29, 2021

This happened to me again today. Duplicate messages

Screen Shot 2021-07-29 at 1 09 08 PM

@MelvinBot
Copy link

@JmillsExpensify, @parasharrajat, @thienlnam Eep! 4 days overdue now. Issues have feelings too...

@parasharrajat
Copy link
Member

PR is in review.

@MelvinBot MelvinBot removed the Overdue label Aug 3, 2021
@MelvinBot
Copy link

@JmillsExpensify, @parasharrajat, @thienlnam it looks like no one is assigned to work on this job.
Please double the price or add a comment stating why the job isn't being doubled.

@JmillsExpensify
Copy link

I've been OOO. Hired appropriately in Upwork.

@thienlnam thienlnam added the Reviewing Has a PR in review label Aug 9, 2021
@MelvinBot MelvinBot removed the Overdue label Aug 9, 2021
@isagoico
Copy link
Author

I was able to reproduce this just now on Chrome (Don't have solid repro steps tho 😢)
image

@marcaaron
Copy link
Contributor

It seems like making requests idempotent is a comprehensive solution, everything else is just a partial solution.

Just so this issue stays focused... I think this is where we last left things.

@kidroca can we add this context to a new issue (though it already sounds like we have decided it's not a problem)? @iwiznia maybe you want to comment there on the logging stuff.

@MelvinBot MelvinBot removed the Overdue label Aug 31, 2021
@kidroca
Copy link
Contributor

kidroca commented Aug 31, 2021

@isagoico
To reproduce consistently, go offline / airplane mode for a minute or two
Send a few messages in 2 or 3 different chats, (the more messages/different chats the better chance of happening)
Resume connectivity
You can also notice that sometimes the order of the messages would not be correct

@marcaaron
I can open a separate ticket depending on the Slack conversation as it seems, it's not something to be considered a problem ATM

@marcaaron
Copy link
Contributor

I can open a separate ticket depending on the Slack conversation as it seems, it's not something to be considered a problem ATM

Sounds good, thanks for this investigation btw - I had a feeling we were missing something right under our noses.

Let's see what @quinthar thinks? I'm a bit unsure how to prioritize based on the previous comments.

I think at a minimum we should treat these as separate issues so there is not too much overlap in the conversation here. I've labeled this Internal so we could focus on the BE changes to make comments idempotent in this issue.

@iwiznia
Copy link
Contributor

iwiznia commented Sep 13, 2021

A call to processNetworkQueue happens from inside processNetworkQueue which re-triggers all the requests for the "parent" pipeline

Oh wow, yes. I broke hits inadvertently when adding the logging and the fact that logs are not always sent immediately made the issue harder to pinpoint. Great investigation!

It seems like making requests idempotent is a comprehensive solution, everything else is just a partial solution.

I agree with this, otherwise any other code in the future that does what the logging did will again break things.

@marcaaron
Copy link
Contributor

any other code in the future that does what the logging did will again break things

Looking at this more though @kidroca has shown we are double processing the networking queue whenever we send a log packet.

For this to happen again we would need to add code that does a Network.post() from inside processNetworkRequestQueue(). I'm not sure why we'd expect this to happen again. This feels like a cut and dried 🐛 to me.

Anyway, there's an easy solution to the current problem - just send all logs on the "next tick" of the request queue.

App/src/libs/Network.js

Lines 256 to 257 in 97f8641

// Try to fire off the request as soon as it's queued so we don't add a delay to every queued command
processNetworkRequestQueue();

We can just add a param to the request that prevents this call. The log request will be sent on the next call instead.

Something like

if (!processImmediately) {
    return;
}

processNetworkRequestQueue();

@kidroca
Copy link
Contributor

kidroca commented Oct 25, 2021

That's one way to do it
I think it might be simpler to move it to the next thick like so

function Log(parameters) {
  const commandName = 'Log';
  requireParameters(['message', 'parameters', 'expensifyCashAppVersion'],
        parameters, commandName);

  // Note: We are forcing Log to run since it requires no authToken and should only be queued when we are offline.
  // We're scheduling the call for the next tick so that we don't re-trigger the network queue 
  return Promise.resolve()
    .then(() => Network.post(commandName, {...parameters, forceNetworkRequest: true}));
}
  • this would add the request to the next queue

For this to happen again we would need to add code that does a Network.post() from inside processNetworkRequestQueue(). I'm not sure why we'd expect this to happen again. This feels like a cut and dried 🐛 to me.

I agree it's a rare thing, but it happened for Logging and can happen again, people are unaware that their change can produce such effects

  • catching a change like this in a review is not very easy - we already failed once
  • while the bug is not produced immediately but once enough logs are accumulated which again makes it hard to discover

@marcaaron
Copy link
Contributor

Yeah that could work, also feels like a setTimeout() workaround. We need a note to explain why we are pushing things off to the next tick so maybe we can find a better way.

Granted, there might never be another case where we want to avoid triggering a network request immediately with a shouldSendImmediately: false param, but feels like an approachable concept.

  • catching a change like this in a review is not very easy - we already failed once
  • while the bug is not produced immediately but once enough logs are accumulated which again makes it hard to discover

For what it's worth, I managed to add a failing test for this so we will have some protection if someone decides it would be fun to break the behavior again.

There's also another measure we could take here which is to just completely disallow processsNetworkRequest() to be called if it is running at all (I doubt we'd ever think about this problem again if we do this - famous last words 😅).

let isProcessing;
function processNetworkQueue() {
    if (isProcessing) {
        return;
    }
    // ...process
    isProcessing = false;
}

as long as the queue does not fail to empty in the "...process" part of the function there is no way for duplicate processing to happen and whatever is trying to happen will run on the next tick.

@kidroca
Copy link
Contributor

kidroca commented Oct 25, 2021

Good stuff,
Now that I took another look I realized this is in the post method and it's simpler and clearer 👍

if (!processImmediately) {
    return;
}

The isProcessing idea looks good too...

@botify botify closed this as completed Oct 26, 2021
@mallenexpensify
Copy link
Contributor

@marcaaron , tbc.... once this PR is deployed to production, there shouldn't be double messages any more? If so.. 🎉 🎉 🎉 🎉 🎉

@marcaaron
Copy link
Contributor

That is the hope! We will have to wait and see.

@kavimuru kavimuru reopened this Nov 16, 2021
@kavimuru
Copy link

@marcaaron @isagoico
Issue was able to reproduce on the regression round for 1.1.15-1. So reopening the issue.

Bug5330310_Sent_message.mp4

.

@MelvinBot
Copy link

@marcaaron Whoops! This issue is 2 days overdue. Let's get this updated quick!

@marcaaron
Copy link
Contributor

Hmm there may be other causes such as messing with the cell data in this case. I think perhaps we should create a new issue with the exact reproduction steps for that. The issue with Log should be sorted. @kavimuru please open a new issue if you can and document exactly how to reproduce thanks!

@MelvinBot MelvinBot removed the Overdue label Nov 22, 2021
@marcaaron marcaaron removed their assignment Nov 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Daily KSv2 Engineering Internal Requires API changes or must be handled by Expensify staff
Projects
None yet