This repository has been archived by the owner on Apr 26, 2024. It is now read-only.
Make sure `synapse_rate_limit_reject_affected_hosts` does what it says it does
#13670
Labels
A-Federation
A-Metrics
metrics, measures, stuff we put in Prometheus
O-Uncommon
Most users are unlikely to come across this or unexpected workflow
S-Minor
Blocks non-critical functionality, workarounds exist.
T-Task
Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.
Spawning from #13541 (comment)
The `synapse_rate_limit_reject_affected_hosts` gauge is always evaluating to `0`. The raw data in Prometheus also shows `0`, for reference.
https://grafana.matrix.org/d/dYoRgTgVz/messages-timing?orgId=1&from=1661855196368&to=1661876796368&viewPanel=220
This is even though we see individual requests being rejected (`synapse_rate_limit_reject_total`), which should mean at least `1` host is affected. But this could be a mismatch in how the gauges were being reported, because we were accidentally registering them twice (#13641).
Now that we fixed the duplicate metric registration issue in #13649 and the fix was deployed to `matrix.org` this morning, we're seeing both at `0`. This could mean that the previous rejections we were seeing were all from the `UsernameAvailabilityRestServlet`, which we are no longer tracking, and that we're not rejecting any requests in the federation servlets. It is a bit suspicious though.
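To make that hypothesis concrete, here is a minimal toy model in Python. It is not Synapse's code: `ToyLimiter`, `collect`, and the `tracked` flag are hypothetical names invented for illustration. It only shows how both metrics can read `0` if every rejection comes from a limiter that is no longer included in the metrics.

```python
class ToyLimiter:
    """Hypothetical stand-in for a rate limiter; not Synapse's real class."""

    def __init__(self, name, tracked):
        self.name = name
        self.tracked = tracked        # whether this limiter feeds the metrics
        self.reject_total = 0         # analogue of a rejection counter
        self.rejected_hosts = set()   # hosts that had at least one rejection

    def reject(self, host):
        self.reject_total += 1
        self.rejected_hosts.add(host)


def collect(limiters):
    """Aggregate metrics over tracked limiters only (an assumption of this toy)."""
    tracked = [limiter for limiter in limiters if limiter.tracked]
    return {
        "reject_total": sum(limiter.reject_total for limiter in tracked),
        "reject_affected_hosts": sum(len(limiter.rejected_hosts) for limiter in tracked),
    }


federation = ToyLimiter("federation", tracked=True)
username = ToyLimiter("username_availability", tracked=False)

# All rejections happen on the untracked limiter.
username.reject("203.0.113.7")
username.reject("203.0.113.7")

metrics = collect([federation, username])
```

In this toy, `metrics` reports zero for both values even though two requests were rejected, which matches what we would see if the federation servlets genuinely never reject.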
How can we know if it's right?
In order to confirm that `synapse_rate_limit_reject_affected_hosts` is working, it would be nice to see a non-zero value. The `reject_limit` is `50`, which I think means there have to be more than 50 requests within the 1 second `federation_rc_window_size` to start rejecting. We do see the rate of slept requests go above 70 sometimes, which I would expect to trigger this 🤔
https://grafana.matrix.org/d/dYoRgTgVz/messages-timing?orgId=1&from=1661873701156&to=1661877301156&viewPanel=223
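One way to reason about what a non-zero value would require is a small simulation of the reading above. This is a sketch under stated assumptions, not the actual limiter: it assumes rejection starts once a single host has more than `reject_limit` requests inside one window, and that limits are applied per host. `ToyWindowLimiter` and its methods are hypothetical names.

```python
from collections import defaultdict, deque


class ToyWindowLimiter:
    """Toy per-host sliding-window limiter; an assumption, not Synapse's logic."""

    def __init__(self, reject_limit=50, window_size_ms=1000):
        self.reject_limit = reject_limit
        self.window_size_ms = window_size_ms
        self._recent = defaultdict(deque)  # host -> timestamps of accepted requests
        self.reject_total = 0
        self.rejected_hosts = set()

    def request(self, host, now_ms):
        """Return True if the request is accepted, False if rejected."""
        window = self._recent[host]
        # Drop timestamps that have fallen out of the window.
        while window and now_ms - window[0] >= self.window_size_ms:
            window.popleft()
        if len(window) >= self.reject_limit:
            self.reject_total += 1
            self.rejected_hosts.add(host)
            return False
        window.append(now_ms)
        return True

    def affected_hosts(self):
        return len(self.rejected_hosts)


# 70 requests from one host within the same window.
single = ToyWindowLimiter()
for t in range(70):
    single.request("one.example", now_ms=t)

# The same 70 requests spread evenly across two hosts.
spread = ToyWindowLimiter()
for t in range(70):
    spread.request("a.example" if t % 2 == 0 else "b.example", now_ms=t)
```

Under these assumptions, `single` rejects the 20 requests beyond the limit and reports one affected host, while `spread` rejects nothing. If the real limiter is also per host, an aggregate rate above 70 would not by itself imply that any single host crossed the `reject_limit`.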
Dev notes
The `synapse_rate_limit_reject_affected_hosts` metric was originally added in #13541 and updated in #13649.