Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Don't reset retry timers on "valid" error codes #16221

Merged
merged 5 commits into from
Sep 4, 2023

Conversation

erikjohnston
Copy link
Member

Basically, we have a bunch of servers that return 404 or 403 etc to all requests. Those error codes are valid for some APIs, which ends up resetting the retry timers even though the server is still down. We fix this by not resetting the retry timer for those cases. This is fine as if the server has come back up, a subsequent request will get a 200 and reset the timers.

elif valid_err_code:
# We got a potentially valid error code back. We don't reset the
# timers though, as the other side might actually be down anyway
# (e.g. some deprovisioned servers will always return a 404 or 403,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside: would be nice if decommissioned servers could send us 410 Gone so that we would never bother to retry sending to them (unless they send us traffic in the future).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside: would be nice if decommissioned servers could send us 410 Gone so that we would never bother to retry sending to them (unless they send us traffic in the future).

Now this would be awesome best "workaround" for servers that were "pulled the plug unsafely" and have no means/desires to build matrix server again but wants to be good citizen after the fact. Simple idea, helps what it can.

Copy link
Contributor

@DMRobertson DMRobertson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guessing you've had the chance to test this on jki.re?

@erikjohnston
Copy link
Member Author

Guessing you've had the chance to test this on jki.re?

Indeed! It's helped a bunch

@erikjohnston erikjohnston merged commit f84baec into develop Sep 4, 2023
37 checks passed
@erikjohnston erikjohnston deleted the erikj/dont_reset_retry_on_403 branch September 4, 2023 13:04
DMRobertson pushed a commit that referenced this pull request Sep 5, 2023
- Add configuration setting for CAS protocol version. Contributed by Aurélien Grimpard. ([\#15816](#15816))
- Suppress notifications from message edits per [MSC3958](matrix-org/matrix-spec-proposals#3958). ([\#16113](#16113))
- Return a `Retry-After` with `M_LIMIT_EXCEEDED` error responses. ([\#16136](#16136))
- Add `last_seen_ts` to the [admin users API](https://matrix-org.github.io/synapse/latest/admin_api/user_admin_api.html). ([\#16218](#16218))
- Improve resource usage when sending data to a large number of remote hosts that are marked as "down". ([\#16223](#16223))

- Fix IPv6-related bugs on SMTP settings, adding groundwork to fix similar issues. Contributed by @evilham and @telmich (ungleich.ch). ([\#16155](#16155))
- Fix a spec compliance issue where requests to the `/publicRooms` federation API would specify `include_all_networks` as a string. ([\#16185](#16185))
- Fix inaccurate error message while attempting to ban or unban a user with the same or higher PL by spliting the conditional statements. Contributed by @leviosacz. ([\#16205](#16205))
- Fix a rare bug that broke looping calls, which could lead to e.g. linearly increasing memory usage. Introduced in v1.90.0. ([\#16210](#16210))
- Fix a long-standing bug where uploading images would fail if we could not generate thumbnails for them. ([\#16211](#16211))
- Fix a long-standing bug where we did not correctly back off from servers that had "gone" if they returned 4xx series error codes. ([\#16221](#16221))

- Update links to the [matrix.org blog](https://matrix.org/blog/). ([\#16008](#16008))
- Document which [admin APIs](https://matrix-org.github.io/synapse/latest/usage/administration/admin_api/index.html) are disabled when experimental [MSC3861](matrix-org/matrix-spec-proposals#3861) support is enabled. ([\#16168](#16168))
- Document [`exclude_rooms_from_sync`](https://matrix-org.github.io/synapse/v1.92/usage/configuration/config_documentation.html#exclude_rooms_from_sync) configuration option. ([\#16178](#16178))

- Prepare unit tests for Python 3.12. ([\#16099](#16099))
- Fix nightly CI jobs. ([\#16121](#16121), [\#16213](#16213))
- Describe which rate limiter was hit in logs. ([\#16135](#16135))
- Simplify presence code when using workers. ([\#16170](#16170))
- Track per-device information in the presence code. ([\#16171](#16171), [\#16172](#16172))
- Stop using the `event_txn_id` table. ([\#16175](#16175))
- Use `AsyncMock` instead of custom code. ([\#16179](#16179), [\#16180](#16180))
- Improve error reporting of invalid data passed to `/_matrix/key/v2/query`. ([\#16183](#16183))
- Task scheduler: add replication notify for new task to launch ASAP. ([\#16184](#16184))
- Improve type hints. ([\#16186](#16186), [\#16188](#16188), [\#16201](#16201))
- Bump black version to 23.7.0. ([\#16187](#16187))
- Log the details of background update failures. ([\#16212](#16212))
- Cache device resync requests over replication. ([\#16241](#16241))

* Bump anyhow from 1.0.72 to 1.0.75. ([\#16141](#16141))
* Bump furo from 2023.7.26 to 2023.8.19. ([\#16238](#16238))
* Bump phonenumbers from 8.13.18 to 8.13.19. ([\#16237](#16237))
* Bump psycopg2 from 2.9.6 to 2.9.7. ([\#16196](#16196))
* Bump regex from 1.9.3 to 1.9.4. ([\#16195](#16195))
* Bump ruff from 0.0.277 to 0.0.286. ([\#16198](#16198))
* Bump sentry-sdk from 1.29.2 to 1.30.0. ([\#16236](#16236))
* Bump serde from 1.0.184 to 1.0.188. ([\#16194](#16194))
* Bump serde_json from 1.0.104 to 1.0.105. ([\#16140](#16140))
* Bump types-psycopg2 from 2.9.21.10 to 2.9.21.11. ([\#16200](#16200))
* Bump types-pyyaml from 6.0.12.10 to 6.0.12.11. ([\#16199](#16199))
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants