Running redis mode (on develop) causes high CPU usage #7334

Half-Shot · 2020-04-23T10:24:41Z

The CPU graph for my set of workers:

The GC graph:

Something has changed between 1.12.3 and 2e3b9a0 that has caused this to skyrocket.

Otherwise, the homeserver seems to work in terms of federating and sending messages. It's just obsessed with GCing right now.

Memory usage has remained a bit low:


Half-Shot · 2020-04-23T10:27:25Z

CPU usage by requests:

erikjohnston · 2020-04-23T10:34:43Z

I suspect it's because each process seems to be sending out about 1kHz of REMOTE_SERVER_UP commands. Possibly there is a loop going on?

Half-Shot · 2020-04-23T10:42:49Z

Switching off Redis (but running the same commit) seemed to reduce CPU, and stop spamming REMOTE_SERVER_UP.

erikjohnston · 2020-04-27T09:11:52Z

I believe the cause is this: in the current TCP mode, if a worker detects that a remote server has come back online, it sends a REMOTE_SERVER_UP to master, which then proxies it to the other workers. When running with Redis, the master process still echoes the command back, which leads to an infinite loop, as Redis will echo it back to master again.

For direct TCP connections, we need the master to relay REMOTE_SERVER_UP commands to the other connections so that all instances get notified about it. The old implementation just relayed to all connections, assuming that sending the command back to its original sender was safe. This is not true for Redis, where commands sent get echoed back to the sender, so master was effectively stuck in an infinite loop, sending and then re-receiving the REMOTE_SERVER_UP commands it had sent. The fix is to ensure that we only relay to *other* connections and not to the connection we received the notification from. Fixes #7334.
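To make the loop concrete, here is a toy simulation (not Synapse code). Everything in it is hypothetical and chosen for illustration: `ToyRedis`, `RedisConnection`, `Process` and `simulate` are not Synapse's classes or API. It assumes a pub/sub bus that delivers every published command to all subscribers, including the publisher. The model is simplified so that only master relays, and in Redis mode master has a single connection (to Redis). Relaying to "all connections" therefore re-publishes the command, and master then receives it again. Skipping the originating connection ends the loop.

```python
from collections import deque


class ToyRedis:
    """Hypothetical pub/sub bus: a published command is delivered to *every*
    subscriber, including the connection that published it (the echo)."""

    def __init__(self):
        self.subscribers = []
        self.pending = deque()  # queued (connection, server_name) deliveries

    def publish(self, server_name):
        for conn in self.subscribers:
            self.pending.append((conn, server_name))

    def run(self, limit):
        """Deliver queued commands, stopping after `limit` deliveries."""
        steps = 0
        while self.pending and steps < limit:
            conn, server_name = self.pending.popleft()
            conn.process.on_remote_server_up(conn, server_name)
            steps += 1
        return steps


class RedisConnection:
    """Hypothetical: one process's connection to the shared bus."""

    def __init__(self, bus, process):
        self.bus = bus
        self.process = process
        bus.subscribers.append(self)

    def send(self, server_name):
        self.bus.publish(server_name)


class Process:
    def __init__(self, is_master, exclude_origin):
        self.is_master = is_master
        self.exclude_origin = exclude_origin
        self.connections = []

    def on_remote_server_up(self, origin, server_name):
        # In this toy model only master relays REMOTE_SERVER_UP.
        if not self.is_master:
            return
        for conn in self.connections:
            if self.exclude_origin and conn is origin:
                continue  # the fix: don't relay back to the sending connection
            conn.send(server_name)


def simulate(exclude_origin, limit=1000):
    """Return (deliveries made, whether commands were still queued)."""
    bus = ToyRedis()
    master = Process(is_master=True, exclude_origin=exclude_origin)
    workers = [Process(is_master=False, exclude_origin=exclude_origin)
               for _ in range(2)]
    for proc in [master, *workers]:
        proc.connections.append(RedisConnection(bus, proc))
    # A worker notices a remote server is back and publishes REMOTE_SERVER_UP.
    workers[0].connections[0].send("remote.example.com")
    steps = bus.run(limit)
    return steps, bool(bus.pending)


print(simulate(exclude_origin=False))  # (1000, True): hit the cap, still looping
print(simulate(exclude_origin=True))   # (3, False): one delivery per process
```

With direct TCP, master would hold one connection per worker, so excluding only the origin still forwards the command to every other worker. With Redis, the bus has already delivered it to everyone, so master has nothing left to relay.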

Half-Shot · 2020-04-28T17:45:43Z

I can confirm #7352 fixes the issue for me.


richvdh added the operation gemini label Apr 27, 2020

erikjohnston mentioned this issue Apr 27, 2020

Don't relay REMOTE_SERVER_UP cmds to same conn. #7352

Merged

richvdh closed this as completed Apr 29, 2020

richvdh added the A-Workers Problems related to running Synapse in Worker Mode (or replication) label Feb 16, 2022
