
Performance Regression in 1.63 #13331

Closed
davidmehren opened this issue Jul 19, 2022 · 5 comments · Fixed by #13332

Comments

@davidmehren

Description

After updating to Synapse 1.63.0 today, one of my Synapse instances experiences a noticeable performance regression:
[screenshot: Grafana metrics showing the regression]

According to the metrics (snapshot at https://fsr-ops.cs.tu-dortmund.de/dashboard/snapshot/PyUD0nsC3zYGM3AyCOrwWOOWTbdRsXZo?orgId=0), handle_new_client_event and action_for_event_by_user now consume significantly more CPU and database resources.

#13100 and #13078 seem to have touched these functions for 1.63, so maybe they are the culprits?

Steps to reproduce

  • Update to Synapse 1.63.0
  • Observe worse performance in Grafana

Homeserver

fachschaften.org

Synapse Version

1.63.0

Installation Method

Docker (matrixdotorg/synapse)

Platform

Docker on Ubuntu 20.04 in LXC

Relevant log output

I didn't see anything relevant in the log output (and there is far too much to paste everything); the Grafana snapshot is hopefully more helpful than a huge amount of logs.

Anything else that would be useful to know?

No response

@squahtx
Contributor

squahtx commented Jul 19, 2022

I wonder if the default size of the get_user_in_room_with_profile cache is too small. Does the "Top 10 cache misses" chart look any different before and after the upgrade?

@davidmehren
Author

@squahtx that looks like a good guess!
[screenshot: "Top 10 cache misses" chart after the upgrade]

@squahtx
Contributor

squahtx commented Jul 19, 2022

The size of that cache can be controlled by adding an entry to per_cache_factors in the config. The config can then be reloaded by sending SIGHUP to Synapse.

It sounds like the default size for that cache is too small and we should ship with a larger default. I'd be interested in seeing what a good cache factor for your deployment turns out to be.
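
For reference, a minimal sketch of what such an override could look like in homeserver.yaml (the cache name is the one quoted above, and the 2.0 factor is purely illustrative, not a recommendation):

```yaml
# homeserver.yaml -- illustrative excerpt, not a recommended configuration
caches:
  # Multiplier applied to all cache sizes unless a per-cache override is set.
  global_factor: 1.0
  per_cache_factors:
    # Override just this cache; a factor of 2.0 doubles its default size.
    get_user_in_room_with_profile: 2.0
```

After editing the config, the new factors can be picked up without a restart by sending SIGHUP to the Synapse process; for a Docker deployment, something like `docker kill --signal=HUP <container>` should have the same effect, depending on how the container runs Synapse.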

@davidmehren
Author

I doubled the cache factor from our global default of 1 to 2 and will observe the metrics for a bit. Thanks for the prompt help!

@davidmehren
Author

1.63.1 seems to have fixed the problem; load and event send times are back to normal immediately after upgrading.
