Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Heartbeat detection is delayed when timers throttled, and messages can be sent over expired connection #5135

Open
tjenkinson opened this issue Jul 15, 2024 · 3 comments
Labels
to triage Waiting to be triaged by a member of the team

Comments

@tjenkinson
Copy link

Describe the bug

Due to chrome (and maybe other browsers) throttling timers when the device sleeps, the heartbeat detection logic doesn't always run in time, causing messages to be lost before reconnection happens late.

To Reproduce

Socket.IO server version: 4.7.5

Server

import { Server } from "socket.io";

const io = new Server(3000, { pingTimeout: 5_000, pingInterval: 25_000});

io.on("connection", (socket) => {
  console.log(`connect ${socket.id}`);

  socket.on("disconnect", () => {
    console.log(`disconnect ${socket.id}`);
  });
});

Socket.IO client version: 4.7.5

Client

import { io } from "socket.io-client";

// use a tunnel like the ports feature in vscode to make the connection work over the internet and use that host here
const socket = io("ws://<replace with host>:3000/", {});

socket.on("connect", () => {
  console.log(`connect ${socket.id}`, new Date());
});

socket.on("disconnect", () => {
  console.log("disconnect", new Date());
});

Expected behavior

  • Load the page and wait for the connected message
  • Close the laptop with it unplugged so that it goes to sleep
  • Wait 60 seconds or so to make sure it's gone to sleep for a bit
  • Open the laptop and go back to the page
  • The latest connection should not have received a ping in the last 30 seconds, and therefore it should be closed. But because chrome froze the timer that's going to do the disconnection it will take a while longer before that actually happens
  • If a message is sent before the reconnection it will probably go into the void and be lost

Proposed fixes

Check if connection is expired before sending every message

I opened a PR for this here: #5134

Let me know what you think

This doesn't cause the reconnect to happen immediately when the page is woken up, but it does mean attempting to send a message should trigger the reconnection, so outgoing messages shouldn't get lost.

Use setInterval and periodically check if a ping has not been received instead of setTimeout

I think this and storing the Date.now() of the last ping should be more accurate, and cause the reconnection to happen almost immediately when the page is woken up. Can also do the same on a visibilitychange event when the page goes back to visible.

Can look into a PR for this if you think it makes sense?

Platform:

  • Device: Mac
  • Browser: Chrome
  • OS: Mac OS 14.5 (23F79)

Additional context

#3507 (comment) is another fix in the past related to throttled timers

@tjenkinson tjenkinson added the to triage Waiting to be triaged by a member of the team label Jul 15, 2024
@darrachequesne
Copy link
Member

Hi! That's an interesting idea, thanks for the detailed analysis 👍

However, I'm wondering whether handling such a corner case is really worth it. I don't think we will be able to handle all cases where the client fails to send a packet. If one really want a packet to be received by the server, aren't retries sufficient?

Reference: https://socket.io/docs/v4/client-options/#retries

@tjenkinson
Copy link
Author

Hey @darrachequesne

The problem with retries is that it requires that you wait for the ack, and also with retrying you need to make sure you either are ok processing duplicate messages or are including some kind of identifier to make sure you can handle duplicate messages on the server, which is not trivial. I.e. the connection might not have already been dead, it might have died after the message was sent but before the response had chance to come back.

I think the goal of the heartbeat logic is to detect when a socket is broken before a real message goes down it? So this does feel like a problem with that to me. Adding retries feels like it's working around a bug in that system.

I understand without retries you're getting "at most once delivery" and can expect messages to get lost, but if we improve the heartbeat logic this should be a lot more reliable. Before we implemented a workaround in our app (which I'm hoping we can remove if we can get a solution in socket.io at some point), we had reports that people would open there laptop and return to our app, then try and perform an action, which would hang indefinitely.

I think we shouldn't need to require users to move from "at most once" to "at least once" delivery to solve this problem.

This is the workaround we have for now:

// these need to match the config on the server
const pingInterval = 25000;
const pingTimeout = 20000;

function detectSocketDisconnection(socket) {
  let lastPingTime = null
  const onPing = () => {
    lastPingTime = Date.now()
  }

  const checkForDisconnection = () => {
    if (lastPingTime === null) return

    const connectionExpired =
      Date.now() - lastPingTime > pingInterval + pingTimeout

    if (connectionExpired) {
      console.log('Forcing disconnection...')
      lastPingTime = null
      socket.io.engine.close()
    }
  }

  const originalEmit = socket.emit
  socket.emit = (...args) => {
    // This may cause a disconnect, which will then cause the actual emit message to be
    // buffered and sent down the new connection when it's set up
    checkForDisconnection()
    return originalEmit.apply(socket, args)
  }

  setInterval(() => checkForDisconnection(), 5000)

  window.addEventListener('visibilitychange', () => {
    if (document.visibilityState === 'visible') {
      checkForDisconnection()
    }
  })

  socket.io.on('ping', onPing)
  socket.on('connect', onPing)
}

const socket = io()
detectSocketDisconnection(socket)

@darrachequesne
Copy link
Member

@tjenkinson OK, that totally makes sense 👍

I've added a minor suggestion on the PR: #5134

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
to triage Waiting to be triaged by a member of the team
Projects
None yet
Development

No branches or pull requests

2 participants