synapse 1.47 big jump in load due to remove_hidden_devices_from_device_inbox
#11401
Comments
There is a background job to clean hidden devices from |
Would that be:
|
This job for deleted devices runs after. The job for hidden devices is
|
The last time such a change was published which was expected to produce unexpected load, it was documented, |
2 days later, still have elevated load from this specific process |
remove_hidden_devices_from_device_inbox
we've observed a similar problem on one of the hosts on EMS.
unfortunately this unexpected load is unexpected. |
I had some success by reducing @matrix-org/synapse-core: any reason not to reduce |
The reason for having a limit is to make sure a decent amount of progress is made at each step, with a conscious trade-off that it's better to consume more resources and have the background update finish in a sensible time frame than effectively never have it finish at all. I don't have a particular objection to removing the minimum, but I do worry it's not going to help that much if the underlying queries are sloooooooooow. In this particular case it looks like the query is going slow:
I think due to the DB having very few hidden devices, so it's walking many rows of the |
If we have a slow query which needs optimising, it feels like it's better that it goes slowly (and we can optimise it in the next release) than that it takes out the entire homeserver by forging on and doing 100 rows anyway.
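To make the trade-off in the two comments above concrete, here is a minimal sketch of how a minimum batch size interacts with a slow query. It is not Synapse's actual code: the function names, the target step duration, and the throughput figure are all invented for illustration; only the floor of 100 rows comes from the discussion.

```python
def next_batch_size(rows_per_ms: float, target_ms: float, minimum: int) -> int:
    """Pick how many rows the next background-update step should process.

    The estimate aims for a step lasting roughly `target_ms`, but is never
    allowed to drop below `minimum` (hypothetical helper, for illustration).
    """
    estimated = int(rows_per_ms * target_ms)
    return max(estimated, minimum)


def step_duration_ms(batch_size: int, rows_per_ms: float) -> float:
    """How long one step takes at the observed throughput."""
    return batch_size / rows_per_ms


# Assumed figures: a slow query managing 0.125 rows/ms, and a step that
# should ideally take about 100 ms.
slow_rows_per_ms = 0.125
target_ms = 100

# With a floor of 100 rows the step is forced to do 100 rows anyway.
with_floor = next_batch_size(slow_rows_per_ms, target_ms, minimum=100)
# With the floor effectively removed the step shrinks to fit the target.
without_floor = next_batch_size(slow_rows_per_ms, target_ms, minimum=1)

print(with_floor, step_duration_ms(with_floor, slow_rows_per_ms))
print(without_floor, step_duration_ms(without_floor, slow_rows_per_ms))
```

Under these assumed numbers the floor makes each step run eight times longer than intended, which is the "forging on" behaviour described above. Without the floor the job stays within its time budget but needs many more steps to finish, so the slow query still has to be fixed either way.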
That's a good idea, though I think we might have a similar problem with |
Tasks to do to resolve this: (edit, see PR description) This is with @babolivier; he has the details of ideas for how to achieve them :) |
Ah yeah I meant to close it but forgot, thanks. |
The load finally stopped and I see the queries have been optimized. |
Edit: Tracking expected mitigations in:
remove_{hidden,deleted}_devices_from_device_inbox
#11421
See discussion below for context
Description
After upgrading to synapse-1.47, via https://github.com/spantaleev/matrix-docker-ansible-deploy, my server experienced a huge increase in both CPU and IO load.
Upon examining the synapse prometheus/grafana stats, I found the following entries:
A large increase in master_0_background_updates in DB transactions
And in particular, master-0_remove_hidden_device_from_inbox
It may be related that the HTTP pusher distribution also changed oddly:
Steps to reproduce
Version information
Version: 1.47
Install method: https://github.com/spantaleev/matrix-docker-ansible-deploy
Platform: Ubuntu 20.04.3