Allow configuring maximum interval between AS transaction retries #12685
Signed-off-by: Tadeusz Sośnierz <tadeusz@sosnierz.com>
@tadzik Looks like some of the unit tests are failing here!
Broadly looks good, thanks. Could you add an entry to the config manual for this option, please? The source is https://github.com/matrix-org/synapse/blob/develop/docs/usage/configuration/config_documentation.md
synapse/config/appservice.py (outdated)
# Set to establish maximum backoff (in seconds) between HS -> AS connection attempts.
# Upon failing to push appservice events the homeserver will wait
# an increasing amount of seconds between retries. This sets an upper limit of that (in seconds)
#
#appservice_max_backoff: 60
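To make "an increasing amount of seconds between retries" with "an upper limit" concrete, here is a minimal Python sketch of a capped exponential backoff. It assumes the delay starts at 2 seconds and doubles after each failed push, and it uses the 512-second ceiling mentioned elsewhere in this thread; `retry_delays`, `HARD_CAP_S` and `base_s` are hypothetical names, not Synapse's scheduler code.

```python
from itertools import islice
from typing import Iterator, Optional

HARD_CAP_S = 512  # ceiling quoted in this thread: "about 8.5 minutes"


def retry_delays(max_backoff_s: Optional[int] = None, base_s: int = 2) -> Iterator[int]:
    """Yield successive retry delays (in seconds) after repeated push failures.

    The delay doubles each time, starting at base_s (an assumed value), and is
    clamped to the configured maximum (if any) and to the hard ceiling.
    """
    cap = HARD_CAP_S if max_backoff_s is None else min(max_backoff_s, HARD_CAP_S)
    delay = base_s
    while True:
        yield min(delay, cap)
        delay *= 2


print(list(islice(retry_delays(), 10)))   # [2, 4, 8, 16, 32, 64, 128, 256, 512, 512]
print(list(islice(retry_delays(60), 7)))  # [2, 4, 8, 16, 32, 60, 60]
```

With `appservice_max_backoff: 60` the schedule flattens out at 60 seconds after five failures instead of climbing towards 512.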
Can you explicitly say what the default value/behaviour is, if this config key is not provided?
Done now in 727bbf8
Co-authored-by: David Robertson <david.m.robertson1@gmail.com>
This seems to duplicate information from the sample config: for appservice configs, the text appears to be copied from it almost verbatim. If nobody had told me otherwise, I'd assume it was autogenerated this way :) Is there any plan or effort to generate it, or does it serve some broader purpose? Is it expected/acceptable to just copy and paste the description from the sample config?
I had wondered this too. I don't know what the idea is there; perhaps @H-Shay or @anoadragon453 have a plan? See also #12368.
We're in a bit of an awkward in-between state now. The referenced PR proposes subsequently pulling the documentation out of the config file, so that we're left with a minimal configuration file to which people can add more options at their leisure. #8159 is the original issue for this, and we should take further discussion of it there.
The broader purpose is for the manual to replace the comments in the config, so that the config becomes a more manageable file and the config documentation is indexed, searchable, and easier to read and link to. Right now it's totally fine to copy/paste your description from the sample config into the config manual, as long as you follow the formatting in the manual. If all goes to plan, in the future there will only be the config manual to update, with no comments in the config, so hopefully that will be a little clearer!
docs/sample_config.yaml (outdated)
# Set to establish maximum backoff (in seconds) between HS -> AS connection attempts.
# Upon failing to push appservice events the homeserver will wait
# an increasing amount of seconds between retries. This sets an upper limit of that (in seconds)
#
#appservice_max_backoff: 60
this doesn't seem to say what the default behaviour is?
Note that pretty much every duration in the config file is in milliseconds, so having a setting in seconds is confusing. Consider making it take a suffix like "s" for seconds, for consistency with all the other time period settings.
Please could this PR be updated with a bit more information about why this is a useful thing to have? Generally, we should strive for solutions that work for everyone rather than requiring users to configure everything, so I'd like to understand why that isn't possible here.
Question: should this go in the per-appservice config file rather than the main one? (a) The main config file already has quite enough options, thank you; (b) depending on why this is needed, it might be better to have different settings for different ASes.
Alright, done now in 08f7d32
Otherwise I think this makes sense
#
# Regardless of this setting, the delay will never be longer than 512 seconds (about 8.5 minutes).
#
#appservice_max_backoff_s: 60
Can this take a duration, please? I.e. `appservice_max_backoff: 60s` instead. When reading the config you can use `self.parse_duration(...)`.
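The reviewer points at Synapse's own `self.parse_duration(...)`. The standalone sketch below only illustrates the convention being asked for (a bare number is milliseconds; `s`, `m`, `h`, `d` suffixes scale it) and is not Synapse's implementation. `parse_duration_ms` is a hypothetical name, the explicit `ms` handling is this sketch's own choice, and the `"512s"` fallback simply mirrors the documented ceiling.

```python
from typing import Union

# Milliseconds per unit for each single-character suffix. Bare numbers are
# treated as milliseconds, matching the convention described in the review.
_SUFFIX_MS = {
    "s": 1000,
    "m": 60 * 1000,
    "h": 60 * 60 * 1000,
    "d": 24 * 60 * 60 * 1000,
}


def parse_duration_ms(value: Union[int, str]) -> int:
    """Convert a config duration such as 500, "500ms", "60s" or "1m" to milliseconds."""
    if isinstance(value, int):
        return value  # integers are already milliseconds
    text = value.strip()
    if text.endswith("ms"):  # check before the single-letter "m"/"s" suffixes
        return int(text[:-2])
    if text and text[-1] in _SUFFIX_MS:
        return int(text[:-1]) * _SUFFIX_MS[text[-1]]
    return int(text)  # no suffix: milliseconds


# Hypothetical parsed YAML using the key name suggested in the review.
config = {"appservice_max_backoff": "60s"}
max_backoff_ms = parse_duration_ms(config.get("appservice_max_backoff", "512s"))
print(max_backoff_ms)  # 60000
```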
Config option: `appservice_max_backoff_s`

Set to establish a maximum backoff (in seconds) between HS -> AS connection attempts.
Upon failing to push appservice events, the homeserver will reattempt connection to the
application service after a delay. The delay increases with subsequent retries.
This value sets an upper limit on that delay.

Regardless of this setting, the delay will never be longer than 512 seconds (about 8.5 minutes),
which is the default behaviour if this option is not set.

Example configuration:
```yaml
appservice_max_backoff_s: 1
```
I think you misunderstood my earlier comment. The name of the setting should not have an `s` suffix; rather, the value of the setting should be able to take a suffix (such as `s` for seconds or `m` for minutes), and if no suffix is present, it should be in milliseconds. That will make it consistent with other durations in the config file. See, for example, `redaction_retention_period`.
@@ -2292,6 +2292,21 @@ Example configuration:
track_appservice_user_ips: true
---
Config option: `appservice_max_backoff_s`
not sure I saw an answer on this before, so I'll ask again:
should this go in the per-appservice config file rather than the main one? (a) the main config file already has quite enough options; (b) depending on why this is needed, it might be better to have different settings for different ASes?
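Option (b) amounts to a precedence rule. The sketch below shows one way it could resolve, assuming a hypothetical per-appservice `max_backoff_ms` field that overrides the global setting, with the documented 512-second ceiling applied either way. `AppServiceRegistration`, `max_backoff_ms` and `effective_max_backoff_ms` are illustrative names, not fields or functions that exist in Synapse.

```python
from dataclasses import dataclass
from typing import Optional

HARD_CAP_MS = 512 * 1000  # the documented 512-second ceiling, in milliseconds


@dataclass
class AppServiceRegistration:
    """Hypothetical subset of a per-appservice registration file."""

    as_id: str
    max_backoff_ms: Optional[int] = None  # hypothetical per-AS override


def effective_max_backoff_ms(reg: AppServiceRegistration, global_max_ms: Optional[int]) -> int:
    """Pick the per-AS value first, then the global setting, and never exceed the hard cap."""
    for candidate in (reg.max_backoff_ms, global_max_ms):
        if candidate is not None:
            return min(candidate, HARD_CAP_MS)
    return HARD_CAP_MS


bridge = AppServiceRegistration("irc-bridge", max_backoff_ms=1000)
other = AppServiceRegistration("slack-bridge")
print(effective_max_backoff_ms(bridge, 60000))  # 1000
print(effective_max_backoff_ms(other, 60000))   # 60000
print(effective_max_backoff_ms(other, None))    # 512000
```

Keeping the override in the registration file would leave the main config untouched, which is the concern raised in (a).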
What's the status of this? Is there still merit in having this option? AFAICS the outstanding points are:
(Is there a need for an option at all? Can we do something else which e.g. resets the backoff timer? E.g. a heartbeat/ping-pong mechanism between HS and AS for each to be convinced that the other is alive?)
I think the merit is still there, though in an ideal world I'd expect this to Just Work Reliably, without needing to configure more aggressive polling.
Perhaps that's what we're missing. Most bridges I'm aware of do some variant of pinging Synapse on startup (usually checking whether its user is registered), which should be a clear enough signal for Synapse that the AS is alive and worth talking to. I'm happy to close this in favour of some future AS-request-resets-backoff-timer PR.
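The "AS request resets the backoff timer" idea can be sketched as a small piece of per-appservice state: failures grow the delay, and any incoming request from the AS clears it so the homeserver can retry straight away. `RecovererSketch` and its methods are hypothetical, the 2-second base and 512-second cap are assumptions, and no scheduling or HTTP is involved.

```python
from typing import Optional


class RecovererSketch:
    """Hypothetical per-appservice retry state whose backoff can be reset."""

    def __init__(self, base_s: int = 2, cap_s: int = 512) -> None:
        self.base_s = base_s
        self.cap_s = cap_s
        self.next_delay_s: Optional[int] = None  # None: the AS is not currently failing

    def on_push_failed(self) -> int:
        """Record a failed push and return how long to wait before retrying."""
        if self.next_delay_s is None:
            self.next_delay_s = self.base_s
        else:
            self.next_delay_s = min(self.next_delay_s * 2, self.cap_s)
        return self.next_delay_s

    def on_push_succeeded(self) -> None:
        self.next_delay_s = None

    def on_request_from_as(self) -> bool:
        """Treat any request from the AS as proof of life and clear the backoff.

        Returns True if we were backing off, i.e. the caller should retry now.
        """
        was_backing_off = self.next_delay_s is not None
        self.next_delay_s = None
        return was_backing_off


r = RecovererSketch()
print([r.on_push_failed() for _ in range(5)])  # [2, 4, 8, 16, 32]
print(r.on_request_from_as())                  # True
print(r.on_push_failed())                      # 2
```

A bridge's startup "is my user registered?" call would then be enough to cut a long wait short, without any extra configuration.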
Closing as per my last comment.
Reducing this backoff period comes in handy when the start-up order of the services (Synapse and the appservices) is hard to control, e.g. in k8s environments, where Synapse and the appservices may be (re)started concurrently, often with wildly different startup times. It also helps in integration tests, where the HS and appservices are started up and torn down a lot, and every additional second of backoff adds up to a significant amount of wasted time.
Essentially, I'm looking for a "low-latency" mode. Currently it's very common to see bridges being unresponsive at first because of AS backoff, or integration tests taking forever: Homerunner has to come up before the AS (so that its URL is known and can be passed to the AS), yet it quickly gets "bored" of trying to talk to the AS, so the tests drag on.
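To put a number on the wasted time, the sketch below computes when the first retry that can succeed happens, given the moment the AS becomes reachable. It assumes the first push fails at t=0, retries take no time, and delays double from 2 seconds up to min(configured maximum, 512 s); `first_successful_attempt_s` is a hypothetical helper, not part of Synapse.

```python
from typing import Optional


def first_successful_attempt_s(as_ready_at_s: float, max_backoff_s: Optional[int] = None,
                               base_s: int = 2, hard_cap_s: int = 512) -> float:
    """Return the time of the first retry at or after the AS becomes reachable."""
    cap = hard_cap_s if max_backoff_s is None else min(max_backoff_s, hard_cap_s)
    t = 0.0
    delay = base_s
    while t < as_ready_at_s:  # each loop iteration is one more failed wait
        t += min(delay, cap)
        delay *= 2
    return t


print(first_successful_attempt_s(20))     # 30.0  (retries at 2, 6, 14, 30)
print(first_successful_attempt_s(20, 1))  # 20.0  (retries every second)
print(first_successful_attempt_s(100))    # 126.0
```

Under these assumptions an AS that is ready after 20 seconds waits an extra 10 seconds with the default schedule, and one ready after 100 seconds waits an extra 26; a 1-second cap removes nearly all of that slack.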