Presence is increasingly heavy #3971
Comments
Hi @spantaleev, thanks for the detailed issue. I think you're probably right that we are too keen to send out presence updates to other servers in the room. I don't have an immediate fix, but one thing you might do is check your extremities per #1760, because the problem is that we can end up doing state resolution for all your rooms to see who's in there, which is expensive if you have lots of extremities.
Thanks for the tip, @neilisfragile! I have checked for forward extremities using: `select room_id, count(*) c from event_forward_extremities group by room_id order by c desc limit 20;` The top result has a count of 2. Maybe the slowness comes from something else?
Agreed, 2 doesn't sound like a lot. I can see some conversation in #SynapseAdmins, though it's not clear to me that it solved your problem. Practically speaking, I can't say we'll look at presence in detail in the short term, though we certainly are looking at state resolution and perf more generally, which is likely to have a knock-on effect. At this point the best I can suggest is that you prune the extremities anyway for the reasons that @richvdh highlighted, and see how that affects presence perf, if at all.
Today's conversation in #SynapseAdmins pointed me back to the extremities issue (#1760). Using the query at the bottom there generates a completely empty result. Interesting.. If there are no extremities to delete, it suggests I'm not affected by the extremities problem, and that presence is slow for me for other reasons.
I just gave it a try with Synapse v1.1.0. Looks like it might be better. It's hard to say, though, because I happen to be on a new and faster server now.

CPU-wise: when presence is disabled, my load15 average stays noticeably lower. When presence is enabled, it seems like the load15 average is around 0.4-0.5 with normal use. CPU usage hits 100% for a significant amount of time when the presence status changes (foregrounding/backgrounding the app). By foregrounding and backgrounding the app periodically in quick succession, I could consistently keep my server's load average above 2.0 (reaching up to 4.0).

Memory-wise: memory usage is definitely much higher when presence is enabled.

Testing method: this doesn't say how memory usage grows over a long period of time (hours, days); it definitely does grow even with presence disabled, but we ignore that here. This is on a 4 GB RAM server. When presence is disabled, after some 10 minutes of use, memory usage for Synapse stays stable at 5%. When presence is enabled, memory usage for Synapse quickly jumps from 5% to ~35% and stays there.

In summary, I guess the situation might have improved, but presence still carries a significant CPU and memory cost on my server.
I have also noticed a significant difference in load on the system when presence is enabled. A lot of load happens if I switch back and forth in the app, same as mentioned above. RAM usage I'm not sure about, but it seems to be higher as well.
One idea I had, because presence being so heavy is bugging me: could presence only be returned if a sync would otherwise succeed, time out, or pass a certain time threshold? It doesn't really make sense to always return just one presence update on `/sync`. In my opinion it should be fine to add up to 30 seconds of delay to a presence update, if that is the only update a user would get. If sync returns earlier, because a message was sent for example, it should be easy enough to flush the presence updates out with it. I know matrix.org doesn't use that feature, but I would really like to be able to use presence without my server and client going up in flames.
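Roughly, the idea would be something like the sketch below. This is purely illustrative; the helper `get_new_events` and the field names are made up and are not Synapse's real internals:

```python
import asyncio
import time


async def wait_for_sync_response(get_new_events, timeout_seconds=30):
    """Illustrative long-poll loop: wake the client early only for non-presence
    events; presence-only updates are held until something else arrives or the
    long-poll times out, at which point they are flushed together."""
    deadline = time.monotonic() + timeout_seconds
    held_presence = []

    while time.monotonic() < deadline:
        events = await get_new_events()  # hypothetical source of new sync data
        held_presence += [e for e in events if e["type"] == "m.presence"]
        other = [e for e in events if e["type"] != "m.presence"]

        if other:
            # Something besides presence happened: flush held presence with it.
            return other + held_presence

        await asyncio.sleep(1)

    # Timed out: return whatever presence accumulated (possibly nothing).
    return held_presence
```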
From personal experience, disabling presence on my homeserver had a significant effect on my overall experience using Matrix. `/sync` requests on mobile went from taking 10+ seconds (sometimes timing out at 30 seconds) to completing instantaneously. Since presence seems to be the main source of slow Synapse performance, does it make sense to have it be opt-in instead of opt-out? I can't count the number of times I've helped people with their server performance by toggling that setting.
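For reference, the setting in question is `use_presence` in `homeserver.yaml` (in the Synapse versions discussed in this thread); disabling it looks like this:

```yaml
# homeserver.yaml
# Disable presence tracking and propagation entirely.
use_presence: false
```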
As a workaround, maybe make the presence-querying rules configurable on a per-room basis? That would allow disabling presence in large rooms like Matrix HQ, Riot-*, etc. This would be much better than globally disabling presence!
Another possible workaround: add an option in Synapse to disable sending presence for all federated users, or a whitelist of servers to which presence is sent, plus rate limits (custom incoming and default outgoing) to protect a homeserver from being DDoSed by other homeservers.
This is something I'd love to see, personally. I want presence on my homeservers and on my friends', but I don't want it at all in the big matrix.org rooms. I would also love to be able to do something like disabling presence for any room with more than X members.
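None of this exists in Synapse; purely to illustrate the requests above, a hypothetical config sketch might look like this (every key below is invented):

```yaml
# Hypothetical sketch only -- none of these are real Synapse options.
presence_rules:
  # Only send presence updates to these remote servers.
  federation_whitelist:
    - friendserver.example.org
  # Skip presence entirely in rooms with more than this many members.
  max_room_members: 100
```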
Also, the presence functions don't lock the process against concurrent changes, and this throws an error.
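Not Synapse's actual code, but a minimal sketch of the kind of per-user locking being asked for here (all names are illustrative):

```python
import asyncio
from collections import defaultdict

# One asyncio.Lock per user, so concurrent presence updates for the same user
# are applied one at a time instead of racing.
_presence_locks = defaultdict(asyncio.Lock)


async def set_presence(user_id, new_state, store_presence):
    """store_presence is a hypothetical coroutine that persists and
    propagates the new presence state."""
    async with _presence_locks[user_id]:
        await store_presence(user_id, new_state)
```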
Yep, the problem is bad on my small dedicated server too. I love the presence feature, but the impact is very high for me as well (master Synapse branch, Python 3.9). I think limiting presence to the local homeserver could solve some problems. It is currently a global feature, so why isn't it configurable as internal-only vs. federated?
Tracking issue with use_presence's performance problem: matrix-org#3971. Signed-off-by: rht <rhtbot@protonmail.com>
We've done a bunch of work in this area over the past few months, so hopefully it's improved a bit. I'm sure there is more work that we can do, but I'm going to close this for now.
@erikjohnston can you please describe those improvements a little, or post links to the PRs?
I'm not sure if #3962 is completely different, but this is something that I've been noticing for a while..
It seems like sending out presence updates whenever my presence changes, would cause 100% CPU usage for a while on my small VPS.
An easy way to trigger it is to use the riot-ios app. Just opening the app and then sending it to the background would cause it to hit `/_matrix/client/r0/presence/{userId}/status` with `PUT` requests (either updating presence to `online` or to `unavailable`.. and after some more inactivity, to `offline` by a Synapse background task, it seems).

Doing that would cause 100% CPU for a while. I imagine Synapse tries to notify many other servers about the presence change. While this (non-important thing) is going on, Synapse would be slow to respond to other requests.
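For reference, the request the client sends is the standard client-server presence API; the homeserver URL, user ID and access token below are placeholders:

```python
import requests

# Placeholders; substitute your own homeserver, user ID and access token.
HOMESERVER = "https://matrix.example.com"
USER_ID = "@alice:example.com"
ACCESS_TOKEN = "..."

resp = requests.put(
    f"{HOMESERVER}/_matrix/client/r0/presence/{USER_ID}/status",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json={"presence": "unavailable"},  # or "online" / "offline"
)
resp.raise_for_status()
```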
If I just keep alternating between backgrounding and foregrounding the riot-ios app, I can effectively keep my homeserver at 100% CPU.
Normally though, a few seconds after backgrounding the app (which sets my presence as `unavailable`), due to a subsequent foregrounding of the app or due to a `/sync` by another client of mine (on desktop or something), my presence status would change back to `online` and cause the same thing once again.

Maybe a few things could be investigated:
- whether riot-ios should try to set presence as `unavailable` at all, especially given that other clients may be syncing at the same time and telling Synapse I'm `online`
- even if a given client says `unavailable`, whether the server should accept that, given that other clients (devices) may be syncing and setting another status at the same time
- whether the server should be so quick to accept and propagate a presence status, when said status might change once again a couple of seconds later. `/sync` is usually called by clients with a long-polling timeout of 30 seconds, so there may usually be something that re-sets the presence status after as little as 30 seconds. Do federated clients care about sub-30-second granularity? Perhaps presence changes can be debounced for 30-60 seconds before actually kicking in (see the sketch after this list)
- whether propagating presence to other servers should be so heavy. Perhaps the code can be optimized or deprioritized, so that it won't disturb other server operations
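A rough sketch of the debouncing idea from the list above, on the federation-sending side (purely illustrative; none of these names are Synapse's real internals):

```python
import asyncio
import time

DEBOUNCE_SECONDS = 45  # assumed debounce window (somewhere in the 30-60s range)


class PresenceDebouncer:
    """Hold on to the latest presence state per user and only push it to
    remote servers once it has been stable for DEBOUNCE_SECONDS."""

    def __init__(self, send_to_federation):
        self._send = send_to_federation  # hypothetical coroutine(user_id, state)
        self._latest = {}  # user_id -> (state, last_changed_at)

    def update(self, user_id, state):
        # Each new update resets the clock, so rapid online/unavailable
        # flapping never reaches federation.
        self._latest[user_id] = (state, time.monotonic())

    async def run(self):
        while True:
            now = time.monotonic()
            for user_id, (state, changed_at) in list(self._latest.items()):
                if now - changed_at >= DEBOUNCE_SECONDS:
                    await self._send(user_id, state)
                    del self._latest[user_id]
            await asyncio.sleep(5)
```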
Using `use_presence = false` eliminates the problem, at the expense of presence not working.

I do like having presence, but I don't care about it being so accurate and so fast to propagate (at the expense of other operations).