Allow configuring maximum interval between AS transaction retries #12685

tadzik · 2022-05-10T10:54:54Z

Reducing this backoff period helps when the spawn order of the services (Synapse and appservices) is hard to control, e.g. in k8s environments, where Synapse and appservices are (re)started concurrently, often with wildly different startup times. It also helps in integration tests, where the HS and appservices are started up and torn down a lot, and every additional second of backoff adds up to a significant amount of wasted time.

Essentially, I'm looking for a "low-latency" mode. Currently it's very common to see bridges being initially unresponsive due to AS backoff, or integration tests taking forever: Homerunner has to come up before the AS (so that its URL is known and can be passed to the AS), and yet it quickly gets "bored" of trying to talk to the AS.

Pull Request Checklist

- Pull request is based on the develop branch
- Pull request includes a changelog file. The entry should:
  - Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
  - Use markdown where necessary, mostly for code blocks.
  - End with either a period (.) or an exclamation mark (!).
  - Start with a capital letter.
  - Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
- Pull request includes a sign off
- Code style is correct (run the linters)

Signed-off-by: Tadeusz Sośnierz <tadeusz@sosnierz.com>

clokep · 2022-05-10T11:47:12Z

@tadzik Looks like some of the unit tests are failing here!

DMRobertson

Broadly looks good, thanks. Could you add an entry to the config manual for this option, please? The source is https://github.com/matrix-org/synapse/blob/develop/docs/usage/configuration/config_documentation.md

synapse/appservice/scheduler.py

DMRobertson · 2022-05-11T10:56:04Z

synapse/config/appservice.py

+
+        # Set to establish maximum backoff (in seconds) between HS -> AS connection attempts.
+        # Upon failing to push appservice events the homeserver will wait
+        # an increasing amount of seconds between retries. This sets an upper limit of that (in seconds)
+        #
+        #appservice_max_backoff: 60


Can you explicitly say what the default value/behaviour is, if this config key is not provided?

Done now in 727bbf8
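
For illustration, the capped backoff described in the config comment above can be sketched as follows. This is a minimal sketch rather than Synapse's scheduler code: the function name `next_retry_delay_s`, the doubling schedule and the 2-second starting delay are assumptions; the 512-second ceiling is the figure quoted in this PR's documentation.

```python
from typing import Optional

# Ceiling quoted in this PR's documentation (2**9 seconds).
HARD_CEILING_S = 512


def next_retry_delay_s(attempt: int, max_backoff_s: Optional[int] = None) -> int:
    """Return the wait, in seconds, before retry number `attempt` (1-based).

    Assumption: delays double on every failed attempt, starting at 2 seconds.
    `max_backoff_s` plays the role of the proposed option; None means "not set".
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    # Clamp the exponent so the uncapped delay never exceeds 2**9 == 512.
    delay = 2 ** min(attempt, 9)
    if max_backoff_s is not None:
        delay = min(delay, max_backoff_s)
    # The configured value can lower the ceiling but never raise it.
    return min(delay, HARD_CEILING_S)
```

With a limit of 60 the sixth retry waits 60 seconds instead of 64, and with no limit the delay settles at 512 seconds.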

synapse/config/appservice.py

Co-authored-by: David Robertson <david.m.robertson1@gmail.com>

tadzik · 2022-05-13T10:10:08Z

> Broadly looks good, thanks. Could you add an entry to the config manual for this option, please? The source is https://github.com/matrix-org/synapse/blob/develop/docs/usage/configuration/config_documentation.md

This seems to be duplicating information from the sample config – for appservice configs it seems to be copied from it almost verbatim. If you didn't tell me otherwise, I'd assume it was autogenerated this way :)

Is there any plan/effort to make it generated, or does it serve some broader purpose? Is it expected/acceptable to just copypaste the description from config sample?

DMRobertson · 2022-05-13T12:02:45Z

> > Broadly looks good, thanks. Could you add an entry to the config manual for this option, please? The source is https://github.com/matrix-org/synapse/blob/develop/docs/usage/configuration/config_documentation.md
>
> This seems to be duplicating information from the sample config – for appservice configs it seems to be copied from it almost verbatim. If you didn't tell me otherwise, I'd assume it was autogenerated this way :)
>
> Is there any plan/effort to make it generated, or does it serve some broader purpose? Is it expected/acceptable to just copypaste the description from config sample?

I had wondered this too. I don't know what the idea is there; perhaps @H-Shay or @anoadragon453 have a plan? See also #12368.

anoadragon453 · 2022-05-13T16:46:12Z

> I had wondered this too. I don't know what the idea is there; perhaps @H-Shay or @anoadragon453 have a plan? See also #12368

We're in a bit of an awkward in-between state now. The referenced PR mentions subsequently pulling the documentation out of the config file, so that we're left with a minimal configuration file to which people can add more options at their leisure.

#8159 is the original issue for this, and we should take further discussion of it there.

H-Shay · 2022-05-13T17:38:32Z

> Is there any plan/effort to make it generated, or does it serve some broader purpose? Is it expected/acceptable to just copypaste the description from config sample?

The broader purpose is for the manual to replace the comments in the config, so that the config becomes a more manageable file and the config documentation is indexed, searchable, and easier to read and link to. Right now it's totally fine to copy/paste your description from the config sample into the config manual as long as you follow the formatting in the manual. If all goes to plan in the future there will only be the config manual to update, no comments in the config so hopefully that will be a little clearer!

richvdh · 2022-05-17T10:18:55Z

docs/sample_config.yaml

+# Set to establish maximum backoff (in seconds) between HS -> AS connection attempts.
+# Upon failing to push appservice events the homeserver will wait
+# an increasing amount of seconds between retries. This sets an upper limit of that (in seconds)
+#
+#appservice_max_backoff: 60


this doesn't seem to say what the default behaviour is?

note that pretty much everything in the config file which is a duration is in milliseconds, so having a setting in seconds is confusing. Consider making it take a suffix like "s" for seconds, for consistency with all the other time period settings.

richvdh · 2022-05-17T10:33:22Z

please could this PR be updated with a bit more information about why this is a useful thing to have? Generally we should strive to have solutions that work for everyone, rather than requiring users to configure everything, so I'd like to understand why that isn't possible.

question: should this go in the per-appservice config file rather than the main one? (a) the main config file already has quite enough options, thank you; (b) depending on why this is needed, it might be better to have different settings for different ASes?
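
One way to picture option (b) is a per-appservice value that falls back to a homeserver-wide one. The sketch below is hypothetical: `effective_max_backoff_s` and both of its inputs are illustrative names, not keys that exist in Synapse's appservice registration files.

```python
from typing import Mapping, Optional


def effective_max_backoff_s(
    as_id: str,
    global_max_s: Optional[int],
    per_as_max_s: Mapping[str, int],
) -> Optional[int]:
    """Pick the backoff limit for one appservice.

    A per-appservice value (hypothetical) wins over the homeserver-wide one;
    None means no limit is configured and the built-in ceiling applies.
    """
    if as_id in per_as_max_s:
        return per_as_max_s[as_id]
    return global_max_s
```

For example, a bridge that is restarted often could be given a short limit while every other appservice keeps the homeserver-wide value.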

tadzik · 2022-05-20T08:16:15Z

> > Is there any plan/effort to make it generated, or does it serve some broader purpose? Is it expected/acceptable to just copypaste the description from config sample?
>
> The broader purpose is for the manual to replace the comments in the config, so that the config becomes a more manageable file and the config documentation is indexed, searchable, and easier to read and link to. Right now it's totally fine to copy/paste your description from the config sample into the config manual as long as you follow the formatting in the manual. If all goes to plan in the future there will only be the config manual to update, no comments in the config so hopefully that will be a little clearer!

Alright – done now in 08f7d32

erikjohnston

Otherwise I think this makes sense

erikjohnston · 2022-05-23T13:14:11Z

docs/sample_config.yaml

+#
+# Regardless of this setting, the delay will never be longer than 512 seconds (about 8.5 minutes).
+#
+#appservice_max_backoff_s: 60


Can this take a duration please? I.e. appservice_max_backoff: 60s instead. When reading the config you can use self.parse_duration(...).
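
As a rough illustration of the duration convention requested here (a suffix such as `s` or `m`, with bare numbers read as milliseconds), here is a standalone sketch. `parse_duration_ms` is a hypothetical stand-in, not the `parse_duration` helper itself, and it only handles the two suffixes named in this review.

```python
import re
from typing import Union

# Assumption: only "s" and "m" are supported here; no suffix means milliseconds.
_UNIT_TO_MS = {"": 1, "s": 1000, "m": 60 * 1000}
_DURATION_RE = re.compile(r"(\d+)([sm]?)")


def parse_duration_ms(value: Union[int, str]) -> int:
    """Convert a config value such as 500, "60s" or "2m" to milliseconds."""
    if isinstance(value, bool):
        raise TypeError("a boolean is not a duration")
    if isinstance(value, int):
        # Bare integers are already milliseconds.
        return value
    match = _DURATION_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_TO_MS[unit]
```

Under this convention `appservice_max_backoff: 60s` and `appservice_max_backoff: 60000` would mean the same thing.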

richvdh · 2022-05-23T11:33:25Z

docs/usage/configuration/config_documentation.md

+Config option: `appservice_max_backoff_s`
+
+Set to establish a maximum backoff (in seconds) between HS -> AS connection attempts.
+Upon failing to push appservice events, the homeserver will reattempt connection to the
+application service after a delay. The delay increases with subsequent retries.
+This value sets an upper limit on that delay.
+
+Regardless of this setting, the delay will never be longer than 512 seconds (about 8.5 minutes),
+which is the default behaviour if this option is not set.
+
+Example configuration:
+```yaml
+appservice_max_backoff_s: 1


I think you misunderstood my earlier comment. The name of the setting should not have an s suffix, rather the value of the setting should be able to take a suffix (such as s for seconds, m for minutes), and if no suffix is present, it should be in milliseconds. That will make it consistent with other durations in the config file.

see for example redaction_retention_period.

richvdh · 2022-05-23T11:34:05Z

docs/usage/configuration/config_documentation.md

@@ -2292,6 +2292,21 @@ Example configuration:
 track_appservice_user_ips: true
 ```
 ---
+Config option: `appservice_max_backoff_s`


not sure I saw an answer on this before, so I'll ask again:

should this go in the per-appservice config file rather than the main one? (a) the main config file already has quite enough options; (b) depending on why this is needed, it might be better to have different settings for different ASes?

DMRobertson · 2022-07-12T15:16:11Z

What's the status of this? Is there still merit in having this option?

AFAICS the outstanding points are:

- #12685 (comment): Should this be Synapse-wide or per-appservice?
- #12685 (comment): needs to parse a duration from the config file
- resolve merge conflicts after the recent config organisation changes

(Is there a need for an option at all? Can we do something else which e.g. resets the backoff timer? E.g. a heartbeat/ping-pong mechanism between HS and AS for each to be convinced that the other is alive?)

tadzik · 2022-07-22T08:34:53Z

> What's the status of this? Is there still merit in having this option?

I think the merit is still there, though in an ideal world I'd expect this to Just Work Reliably, without needing to configure more aggressive polling.

> Can we do something else which e.g. resets the backoff timer?

Perhaps that's what we're missing. Most bridges I'm aware of do some variant of pinging Synapse on startup (usually checking if its user is registered), which should be a clear enough signal for Synapse that the AS is alive and worth talking to.

I'm happy to close this in favour of some future AS-request-resets-backoff-timer PR.
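
A minimal sketch of that idea, assuming per-appservice backoff state that is cleared whenever the AS makes a request to the homeserver. `BackoffTracker` and its methods are hypothetical and do not correspond to Synapse's scheduler; the doubling delays are likewise an assumption.

```python
from typing import Dict


class BackoffTracker:
    """Hypothetical per-appservice retry state (not Synapse's actual code)."""

    def __init__(self, max_exponent: int = 9) -> None:
        self._max_exponent = max_exponent
        # Number of consecutive failed pushes per appservice ID.
        self._exponent: Dict[str, int] = {}

    def record_failure(self, as_id: str) -> int:
        """Note a failed push and return the next delay in seconds."""
        exponent = min(self._exponent.get(as_id, 0) + 1, self._max_exponent)
        self._exponent[as_id] = exponent
        return 2 ** exponent

    def record_inbound_request(self, as_id: str) -> None:
        """The AS just called the homeserver, so treat it as alive again."""
        self._exponent.pop(as_id, None)


tracker = BackoffTracker()
tracker.record_failure("bridge")          # 2 seconds
tracker.record_failure("bridge")          # 4 seconds
tracker.record_inbound_request("bridge")  # e.g. the bridge checks its user on startup
delay_after_reset = tracker.record_failure("bridge")  # back to 2 seconds
```

A real implementation would presumably also need to wake any retry that is already sleeping, rather than only shortening the next delay.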

tadzik · 2022-10-20T14:06:40Z

Closing as per my last comment.

Allow configuring maximum interval between AS transaction retries

d2d1dd9

Signed-off-by: Tadeusz Sośnierz <tadeusz@sosnierz.com>

tadzik requested a review from a team as a code owner May 10, 2022 10:54

tadzik added 2 commits May 10, 2022 13:06

Update sample config

9c578f0

Signed-off-by: Tadeusz Sośnierz <tadeusz@sosnierz.com>

Add a changelog entry for #12685

a7981b6

Signed-off-by: Tadeusz Sośnierz <tadeusz@sosnierz.com>

clokep removed the request for review from a team May 10, 2022 11:47

Fix appservice.scheduler tests

88f97de

clokep requested a review from a team May 10, 2022 14:01

DMRobertson self-assigned this May 11, 2022

DMRobertson suggested changes May 11, 2022

tadzik and others added 4 commits May 13, 2022 11:54

Use Optional instead of a Union

04f1c9e

Co-authored-by: David Robertson <david.m.robertson1@gmail.com>

Improve wording for appservice_max_backoff

49cc2b7

Co-authored-by: David Robertson <david.m.robertson1@gmail.com>

Fix linting in appservice.scheduler

cf22683

Document the default behaviour of appservice_max_backoff

727bbf8

DMRobertson removed their assignment May 13, 2022

richvdh suggested changes May 17, 2022

tadzik added 3 commits May 20, 2022 09:58

Update sample config with appservice_max_backoff wording changes

bf6b698

Rename (appservice_)max_backoff to max_backoff_s

548778a

Document appservice_max_backoff_s in config documentation

08f7d32

tadzik requested review from richvdh and DMRobertson May 20, 2022 08:17

richvdh requested review from a team and removed request for richvdh May 23, 2022 11:01

erikjohnston reviewed May 23, 2022

richvdh reviewed May 23, 2022

DMRobertson added the X-Awaiting-Changes A contributed PR which needs changes and re-review before it can be merged label Jun 6, 2022

DMRobertson removed their request for review July 12, 2022 15:10

DMRobertson mentioned this pull request Oct 20, 2022

We need a way to wake up an AppService push task manually #14240

Open

tadzik closed this Oct 20, 2022
