Unbounded increases in GC time #11172
Comments
related: #7176 |
Can you zoom in on the Currently allocated objects graph, showing synapse-gen 2? Does synapse-gen 2 trend upwards over time, just like the average GC time? |
I think your 'currently allocated objects' graph is incorrect - it's reporting the number of gen1 GCs since the last gen2 GC, which is why it is cycling in the range 0-10. You should consider updating your Grafana dashboard. Unfortunately we don't expose the total number of active objects as a Prometheus metric (I used a local patch to generate the graph in #7176 (comment). Calculating the count requires the Python runtime to build a list of all the active objects, which is expensive). The easiest way to get the metric we're after here is probably to do
|
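For reference, the distinction above is visible directly in the standard library: gc.get_count() is the cheap counter whose third element is the number of gen-1 collections since the last gen-2 collection (the value that cycles 0-10), while counting the objects that are actually alive means walking every tracked object. A minimal sketch, assuming nothing Synapse-specific:

import gc

# Cheap: returns (roughly) new objects since the last gen-0 collection,
# gen-0 collections since the last gen-1 collection, and
# gen-1 collections since the last gen-2 collection.
# The third value is the counter that cycles in the 0-10 range.
print(gc.get_count())

# Expensive: the runtime builds a list of every tracked object, so this is
# only suitable for ad-hoc debugging, not for a per-scrape metric.
print(len(gc.get_objects()))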
Hello, Richard. I reproduced it in my test environment. I ran gc.set_debug(gc.DEBUG_STATS) and sampled some of the resulting logs:

2021-10-30
2021-10-30 07:59:58,823 - twisted - 258 - ERROR - sentinel- gc: objects in each generation: 1518 5816 31794
2021-10-30 07:59:56,699 - twisted - 258 - ERROR - sentinel- gc: objects in each generation: 1621 5371 31794
2021-10-30 07:59:55,425 - twisted - 258 - ERROR - sentinel- gc: objects in each generation: 1894 4666 31806
2021-10-30 07:59:54,228 - twisted - 258 - ERROR - sentinel- gc: objects in each generation: 1636 3792 32428
2021-10-30 07:59:50,618 - twisted - 258 - ERROR - sentinel- gc: objects in each generation: 1543 2559 33271

2021-10-31
2021-10-31 07:59:59,830 - twisted - 258 - ERROR - sentinel- gc: objects in each generation: 1887 4097 162277
2021-10-31 07:59:57,592 - twisted - 258 - ERROR - sentinel- gc: objects in each generation: 1618 2826 163332
2021-10-31 07:59:55,243 - twisted - 258 - ERROR - sentinel- gc: objects in each generation: 1891 1290 164193
2021-10-31 07:59:52,300 - twisted - 258 - ERROR - sentinel- gc: objects in each generation: 1637 0 165252
2021-10-31 07:59:50,095 - twisted - 258 - ERROR - sentinel- gc: objects in each generation: 9 9189 159962

2021-11-01
2021-11-01 08:00:00,954 - twisted - 258 - ERROR - sentinel- gc: objects in each generation: 980 0 222327
2021-11-01 07:59:59,865 - twisted - 258 - ERROR - sentinel- gc: objects in each generation: 4 0 243758
2021-11-01 07:59:59,635 - twisted - 258 - ERROR - sentinel- gc: objects in each generation: 19 8526 237368
2021-11-01 07:59:59,586 - twisted - 258 - ERROR - sentinel- gc: objects in each generation: 1893 6967 237368
2021-11-01 07:59:57,325 - twisted - 258 - ERROR - sentinel- gc: objects in each generation: 1459 7211 237384

The gen-2 object count increases continuously, so something is not being collected. Is there any way to find out what that something is? Actually, I did some customization in sync.py, so I'll test the official Synapse docker image (1.35.0) to check whether it has the same problem. Thank you. |
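A rough way to find out what is accumulating in gen 2, using only standard-library calls (a sketch, not something prescribed in this thread), is to histogram the live objects by type. Like the object count above, it walks every tracked object, so it should only be run occasionally, e.g. from a manhole or a one-off debug hook:

import gc
from collections import Counter

# Histogram of live, GC-tracked objects keyed by type name. The most
# common types are usually a good hint at what is not being collected.
type_counts = Counter(type(obj).__name__ for obj in gc.get_objects())

for name, count in type_counts.most_common(20):
    print(f"{count:8d}  {name}")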
I wouldn't say 200K objects over a period of several days is a huge number, particularly given your cache configuration settings. Indeed, given your |
Hello, Richard. I changed the cache configuration.
It seems to have an effect: the average GC time is now bounded at about 1.0 secs even though there are so many requests (/sync, /sendToDevice). I will test for another 2~3 days and keep watching the average GC time. Looking through the Synapse code, I am curious about the cache configuration.
event_cache_size
I thought more cache meant more performance if the machine has enough memory, but it seems ... Is that right?
Synapse Cache Factor
I thought extending ... but it seems it is just the request pool for the HTTP client. Almost all outbound HTTP requests from Synapse go to a push server (=sygnal), and pushing happens in the background. So it seems like `global_factor` is for pusher performance. Is that right? // Best Regards |
This sounds right to me. Accessing memory takes time — if you have more of it, then (making simplifications) it will take more time to access it all in order to perform garbage collection. There is probably a sweet spot for your server between 0.5 and 10.0.
|
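As a rough illustration of that trade-off (simplified arithmetic, not Synapse's actual cache code): the global cache factor multiplies the configured cache sizes, so a larger factor means more long-lived cached objects that every full (gen-2) collection has to traverse.

# Simplified sketch, not Synapse's real implementation: effective cache
# capacity scales with the global cache factor, and every extra cached
# entry is another long-lived object for the gen-2 collector to walk.
event_cache_size = 100_000        # value reported in this issue
for global_factor in (0.5, 1.0, 10.0):
    effective_entries = int(event_cache_size * global_factor)
    print(f"factor={global_factor:>4}: ~{effective_entries:,} event-cache entries")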
I got it. I had missed the code below. Now everything is clear. Thank you very much. Best Regards
|
Glad to help! |
Description
After updating to 1.35.0, the average GC time has been rising much more than before, by around 0.5 secs per week.
I'm trying to reproduce this in my test environment.
Do you have any idea how to handle this problem?
Additional Synapse Metrics
Environment Information
What version of Synapse is running?
: 1.35.0
: No worker, single instance
Synapse Cache Factor
: 10.0
Event Cache Size
: 100k
Sync(Long Polling) Timeout & Request Timeout in out client
: 10 secs & 15secs
Synapse Users
: Around 3000
Active User Devices
: 1000 Devices
The top 3 API Endpoints
Platform
Ubuntu
: 20.04
container
: Customized Docker container FROM phusion/baseimage:0.11
: Python 3.6
vm (kvm)
: CPU cores: 24, memory: 61GB