Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove SleepTimer as it is error prone and rely on Pusher instead #3892

Closed
marcaaron opened this issue Jul 7, 2021 · 38 comments
Closed

Remove SleepTimer as it is error prone and rely on Pusher instead #3892

marcaaron opened this issue Jul 7, 2021 · 38 comments
Assignees
Labels
Daily KSv2 Engineering Reviewing Has a PR in review

Comments

@marcaaron
Copy link
Contributor

marcaaron commented Jul 7, 2021

If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!


Action Performed:

Normal app usage

I've been online without any changes in network status, and noticed E.cash on desktop web had the "syncing" icon. I looked and saw countless pages of this -- what is it doing?

Expected Result:

The app should only re-fetch data when necessary

Actual Result:

@quinthar noticed that the desktop app was "syncing" for no reason and saw this in the logs:

image (7)

  • These logs mean that the desktop/web app is repeatedly "polling for data" for some reason.

Workaround:

Yes, but this could create a particularly bad situation if this were to happen on mobile for a user with a large number of reports as we'd be firing off many unnecessary API requests.

Platform:

Web
Desktop App

Version Number:
Logs: https://stackoverflow.com/c/expensify/questions/4856
Notes/Photos/Videos: Any additional supporting documentation
Expensify/Expensify Issue URL:

View all open jobs on Upwork

@marcaaron marcaaron added the AutoAssignerTriage Auto assign issues for triage to an available triage team member label Jul 7, 2021
@MelvinBot
Copy link

Triggered auto assignment to @stephanieelliott (AutoAssignerTriage), see https://stackoverflow.com/c/expensify/questions/4749 for more details.

@MelvinBot MelvinBot removed the AutoAssignerTriage Auto assign issues for triage to an available triage team member label Jul 7, 2021
@marcaaron
Copy link
Contributor Author

I think we're still looking for a reproduction on this one.

@marcaaron
Copy link
Contributor Author

I noticed this on dev and turned verbose logging on. Pretty sure there is something wrong with the clock timer logic we have based on this...

Screen Shot 2021-07-07 at 10 50 05 AM

It exists to make sure we refetch reports when re-opening a laptop, but maybe there's a better way to go about it (or something has recently changed about that code - since I don't recall having this problem until now).

@stephanieelliott
Copy link
Contributor

@marcaaron are we able to repro this yet? I've tried doing it myself (unsuccessful) but not too sure what steps to follow.

@marcaaron
Copy link
Contributor Author

Nope, I've seen it pop up while working in E.cash locally - but still unsure what causes it to get into that state.

@stephanieelliott
Copy link
Contributor

OK, I'm gonna hold off on passing this through to Engineering until we can repro this.

Or rather than trying to repro ourselves, another idea could be to post reproducing the problem as a job on Upwork?

@marcaaron
Copy link
Contributor Author

We still don't really have a precedent for that as far as I know, but it's something @mallenexpensify has floated before. It would be nice to get someone working on this.

I did manage to find this Slack thread.

And this older issue that we closed that seems like the same issue.

@mallenexpensify
Copy link
Contributor

I wonder, if for issues like this, where there is an issue reproducing, if we can state at the top of the post need to be able to reproduce' and outline steps before proposing solution, then we just start it like any other issue, and double the price until someone first figures out how to reproduce and proposes a solution. Whatcha think?

@stephanieelliott
Copy link
Contributor

I like that idea @mallenexpensify! I think it could be worth at least trying. Do you think just changing the title and giving it a shout in #expensify-open-source would be the best way to start?

@quinthar
Copy link
Contributor

FYI this is 100% reproducible for me: just leave E.cash open in a tab while you are doing something else, without touching it or doing anything at all, with the console open. After a few minutes the log will be full of these lines:

image

@quinthar
Copy link
Contributor

Again:
image

@marcaaron
Copy link
Contributor Author

So I left my chat open for a bit to see if this was happening and it did, but not with the same frequency and not in an infinite loop or anything.

Looked at the network request logs and noticed a few weird things...

  1. There are some calls to Get&returnValueList=reportStuff with an undefined value for reportIDList (doesn't seem like there is any reason to call this)
  2. We are logging to the console when there is no IOU report for a chat, but this isn't really an error.

Probably we don't need to log this. But also, why are we calling GetIOUReport for every single report?
Since adding Get&returnValueList=reportSummaryList we should be able to tell if a report has an IOU report or not before calling this which seems like a good idea.

Anyways, I don't think that has anything to do with this issue.

I checked my logs and saw lots of these

66efcd6e7a9af4c4-SJC | www4.rno | 2021-07-15 03:11:16 296 | marc@expensifail.com | [info] [NetworkConnection] Firing reconnection callbacks because timer clock skewed ~~ parameter: '[]' userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
66efcf0d2983f4c4-LAX | www3.lax | 2021-07-15 03:12:22 668 | marc@expensifail.com | [info] [NetworkConnection] Firing reconnection callbacks because timer clock skewed ~~ parameter: '[]' userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
66efd06c795bf4c4-LAX | www4.lax | 2021-07-15 03:13:18 866 | marc@expensifail.com | [info] [NetworkConnection] Firing reconnection callbacks because timer clock skewed ~~ parameter: '[]' userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'

So yes, seems like sleep timer stuff is not working correctly and the code for this is here.

@stephanieelliott
Copy link
Contributor

Sounds like we've figured out how to reproduce, I am going to move this through triage on to Engineering!

@MelvinBot
Copy link

Triggered auto assignment to @alex-mechler (Engineering), see https://stackoverflow.com/c/expensify/questions/4319 for more details.

@alex-mechler alex-mechler added the External Added to denote the issue can be worked on by a contributor label Jul 16, 2021
@MelvinBot
Copy link

Triggered auto assignment to @mallenexpensify (External), see https://stackoverflow.com/c/expensify/questions/8582 for more details.

@MelvinBot MelvinBot added Daily KSv2 and removed Weekly KSv2 labels Jul 16, 2021
@alex-mechler alex-mechler removed their assignment Jul 16, 2021
@MelvinBot MelvinBot added the Help Wanted Apply this label when an issue is open to proposals by contributors label Jul 17, 2021
@NikkiWines
Copy link
Contributor

Hi @qiweiii, could you expand upon your proposal a little bit more, including some more specifics about what changes you'd be making?

@marcaaron
Copy link
Contributor Author

The original solution is loosely based on this SO post. And there's another suggestion to put the logic inside a web worker so that might be an option.

@qiweiii
Copy link

qiweiii commented Jul 20, 2021

Hi @NikkiWines , I will create another file under lib/SleepTimer for the web worker, it will listen for 'start' and 'stop' messages, on 'start', it will start the setInterval, on 'stop', it will clearInterval, it also posts 'skewed' message when time is skewed.

This web worker will be used in addClockSkewListener function in index.js, the callback passed in addClockSkewListener will be called on 'skewed' message.

@quinthar
Copy link
Contributor

Out of curiosity, for my own benefit, what is this timer for? What are we doing on a recurring basis -- I would think we'd only do this when we reconnect, and not do anything on a polling cycle?

@marcaaron
Copy link
Contributor Author

It provides a way to detect when you've opened a laptop or come back from sleep. IIRC when it was added we were relying on NetInfo to tell us that you had went from "offline" to "online" (the state change triggers a reconnect event). So you'd be "online" then close your laptop and NetInfo would not register you as "offline". When you open your laptop again we won't register that happening at all because all the app knows is that you were "online" and you're still "online". That was a while back so we could try removing it and testing to see if it's still needed. Or think of a better way to avoid this problem entirely.

@quinthar
Copy link
Contributor

quinthar commented Jul 20, 2021 via email

@marcaaron
Copy link
Contributor Author

Definitely. Here's an explanation + a theory on what is happening:

How the sleep timer works

  1. The timer ticks away and records a time when the computer isn't asleep
  2. If the computer goes to sleep then the timer can't record the the time anymore
  3. When the computer comes back from sleep we compare the last value we have with the next one we are able to record
  4. If it's past the threshold for the tick interval then we know that the computer has awakened and we treat that like a reconnection event.

Now as for why this might not be reliable

JS timers themselves are not perfectly reliable. Things can potentially block the single JS thread and cause the timer to record inaccurate values. This provides a possible explanation for why this is happening inconsistently and only for some people. We don't know for sure if this is happening or if it is what exactly is blocking the JS thread so badly that the timer thinks your laptop was asleep for 8 seconds.

@quinthar
Copy link
Contributor

quinthar commented Jul 20, 2021 via email

@kp17211

This comment has been minimized.

@quinthar
Copy link
Contributor

quinthar commented Jul 21, 2021 via email

@marcaaron
Copy link
Contributor Author

Thanks @kp17211, but I'm going to mark your comment as off-topic to avoid confusion.

To be clear, this has nothing to do with a heartbeat or Pusher.

@marcaaron
Copy link
Contributor Author

I'm not sure why we would trigger a major refresh if Pusher still thinks it's connected?

One thing to keep in mind is that we improved the API logic for this so the refresh that occurs is more efficient than it was previously (this change is still on staging but insanely fast by comparison).

Or is the issue that Pusher thinks it is connected for too long after the laptop is resumed after sleep, so we don't actually trust it?

I'm not positive so this will require investigation, but I believe Pusher might think it's connected before and after we go to sleep, but isn't able to receive updates while we sleep. So there is potential for data loss during that time.

@quinthar
Copy link
Contributor

I believe Pusher might think it's connected before and after we go to sleep, but isn't able to receive updates while we sleep. So there is potential for data loss during that time.

This just feels very unlikely to me -- this is an issue that would affect like 100% of Pusher customers, so it really seems like they would view this as a bug (or architectural flaw) and fix it. Have you confirmed this with them?

@marcaaron
Copy link
Contributor Author

marcaaron commented Jul 21, 2021

My memory is hazy, but I want to say that we were not relying on Pusher's reconnect events when this was created. So, I think the important thing to agree on is the possibility that we may not need this code at all. And whoever picks this up should first verify that then we can look into web workers.

Edit: Or complain to whatever we think is causing the real problem

@RushirajJhala
Copy link

I think issue is due to addClockSkewListener
if any api take more time than 8000ms (set in above function)
due to any reason like, internet connection speed is slow , server is slow or more data or parsing time is more etc..
than it will set isSkewed = true and call fetch api again.

Simple solution is increase waiting time
else we can also put check and make sure it don't call again if any old request is in progress

@marcaaron
Copy link
Contributor Author

Ok, I tested this a bit more thoroughly and it seems that Pusher should handle the case where we are reconnecting when the laptop was closed then open. Let's just remove this code for now that is causing problems. If we run into any issues then we can raise that with Pusher.

@marcaaron marcaaron changed the title Calls to fetchAllActions() are made over and over again Remove SleepTimer as it is error prone and rely on Pusher instead Jul 22, 2021
@marcaaron marcaaron self-assigned this Jul 22, 2021
@MelvinBot MelvinBot removed the Overdue label Jul 22, 2021
@marcaaron marcaaron added Reviewing Has a PR in review and removed Exported External Added to denote the issue can be worked on by a contributor Help Wanted Apply this label when an issue is open to proposals by contributors labels Jul 22, 2021
@NikkiWines NikkiWines removed their assignment Jul 23, 2021
@RushirajJhala
Copy link

RushirajJhala commented Jul 23, 2021 via email

@MelvinBot
Copy link

@mallenexpensify, @marcaaron Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!

@marcaaron
Copy link
Contributor Author

This was deployed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Daily KSv2 Engineering Reviewing Has a PR in review
Projects
None yet
Development

No branches or pull requests

10 participants