-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Don't reset retry timers on "valid" error codes #16221
Conversation
elif valid_err_code: | ||
# We got a potentially valid error code back. We don't reset the | ||
# timers though, as the other side might actually be down anyway | ||
# (e.g. some deprovisioned servers will always return a 404 or 403, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside: would be nice if decommissioned servers could send us 410 Gone
so that we would never bother to retry sending to them (unless they send us traffic in the future).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside: would be nice if decommissioned servers could send us
410 Gone
so that we would never bother to retry sending to them (unless they send us traffic in the future).
Now this would be awesome best "workaround" for servers that were "pulled the plug unsafely" and have no means/desires to build matrix server again but wants to be good citizen after the fact. Simple idea, helps what it can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guessing you've had the chance to test this on jki.re?
Indeed! It's helped a bunch |
- Add configuration setting for CAS protocol version. Contributed by Aurélien Grimpard. ([\#15816](#15816)) - Suppress notifications from message edits per [MSC3958](matrix-org/matrix-spec-proposals#3958). ([\#16113](#16113)) - Return a `Retry-After` with `M_LIMIT_EXCEEDED` error responses. ([\#16136](#16136)) - Add `last_seen_ts` to the [admin users API](https://matrix-org.github.io/synapse/latest/admin_api/user_admin_api.html). ([\#16218](#16218)) - Improve resource usage when sending data to a large number of remote hosts that are marked as "down". ([\#16223](#16223)) - Fix IPv6-related bugs on SMTP settings, adding groundwork to fix similar issues. Contributed by @evilham and @telmich (ungleich.ch). ([\#16155](#16155)) - Fix a spec compliance issue where requests to the `/publicRooms` federation API would specify `include_all_networks` as a string. ([\#16185](#16185)) - Fix inaccurate error message while attempting to ban or unban a user with the same or higher PL by spliting the conditional statements. Contributed by @leviosacz. ([\#16205](#16205)) - Fix a rare bug that broke looping calls, which could lead to e.g. linearly increasing memory usage. Introduced in v1.90.0. ([\#16210](#16210)) - Fix a long-standing bug where uploading images would fail if we could not generate thumbnails for them. ([\#16211](#16211)) - Fix a long-standing bug where we did not correctly back off from servers that had "gone" if they returned 4xx series error codes. ([\#16221](#16221)) - Update links to the [matrix.org blog](https://matrix.org/blog/). ([\#16008](#16008)) - Document which [admin APIs](https://matrix-org.github.io/synapse/latest/usage/administration/admin_api/index.html) are disabled when experimental [MSC3861](matrix-org/matrix-spec-proposals#3861) support is enabled. ([\#16168](#16168)) - Document [`exclude_rooms_from_sync`](https://matrix-org.github.io/synapse/v1.92/usage/configuration/config_documentation.html#exclude_rooms_from_sync) configuration option. ([\#16178](#16178)) - Prepare unit tests for Python 3.12. ([\#16099](#16099)) - Fix nightly CI jobs. ([\#16121](#16121), [\#16213](#16213)) - Describe which rate limiter was hit in logs. ([\#16135](#16135)) - Simplify presence code when using workers. ([\#16170](#16170)) - Track per-device information in the presence code. ([\#16171](#16171), [\#16172](#16172)) - Stop using the `event_txn_id` table. ([\#16175](#16175)) - Use `AsyncMock` instead of custom code. ([\#16179](#16179), [\#16180](#16180)) - Improve error reporting of invalid data passed to `/_matrix/key/v2/query`. ([\#16183](#16183)) - Task scheduler: add replication notify for new task to launch ASAP. ([\#16184](#16184)) - Improve type hints. ([\#16186](#16186), [\#16188](#16188), [\#16201](#16201)) - Bump black version to 23.7.0. ([\#16187](#16187)) - Log the details of background update failures. ([\#16212](#16212)) - Cache device resync requests over replication. ([\#16241](#16241)) * Bump anyhow from 1.0.72 to 1.0.75. ([\#16141](#16141)) * Bump furo from 2023.7.26 to 2023.8.19. ([\#16238](#16238)) * Bump phonenumbers from 8.13.18 to 8.13.19. ([\#16237](#16237)) * Bump psycopg2 from 2.9.6 to 2.9.7. ([\#16196](#16196)) * Bump regex from 1.9.3 to 1.9.4. ([\#16195](#16195)) * Bump ruff from 0.0.277 to 0.0.286. ([\#16198](#16198)) * Bump sentry-sdk from 1.29.2 to 1.30.0. ([\#16236](#16236)) * Bump serde from 1.0.184 to 1.0.188. ([\#16194](#16194)) * Bump serde_json from 1.0.104 to 1.0.105. ([\#16140](#16140)) * Bump types-psycopg2 from 2.9.21.10 to 2.9.21.11. ([\#16200](#16200)) * Bump types-pyyaml from 6.0.12.10 to 6.0.12.11. ([\#16199](#16199))
Basically, we have a bunch of servers that return 404 or 403 etc to all requests. Those error codes are valid for some APIs, which ends up resetting the retry timers even though the server is still down. We fix this by not resetting the retry timer for those cases. This is fine as if the server has come back up, a subsequent request will get a 200 and reset the timers.