Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Modernize configure_workers_and_start.py bootstrapping script for Dockerfile-workers. #14294

Merged

Conversation

realtyem
Copy link
Contributor

@realtyem realtyem commented Oct 26, 2022

Trying to help solve

It appears that since Feb 25th, 2020, all workers are generic workers, full stop. Some stragglers might have caused problems at that time, but everything seems to work as it's supposed to now. Let's modernize configure_workers_and_start.py to reflect this. Documentation should be a separate pull request. This file is used frequently by curious people to setup their own servers. Although there are further issues to explore, this PR will only focus on configure_workers_and_start.py.

Checklist of existing known 'non'-generic_workers to switch over here:

  • synapse.app.federation_sender
  • synapse.app.media_repository (See Internal Issue 3 below)
  • synapse.app.pusher
  • synapse.app.frontend_proxy

Details:

  1. Lines 432 - 444 in generic_worker.py list all the historical apps by name. This is reflected in the actual directory synapse/app. All of those worker python files have the same content loading generic_worker.py instead. So, all workers are generic workers and can be labeled such in worker configuration yaml files. This has been true for synapse.app.federation_sender and synapse.app.pusher since 1.29.0 on Clean up ShardedWorkerHandlingConfig #9466

  2. There is a comment that references using '*' to declare all workers. That doesn't seem to have worked since Apr 14, 2021 when the logic was removed. I like the idea of this functionality, but I think it should be a string instead of an asterisk. Perhaps "full" or "all". Until this is decided, delete the comment as it can be confusing when it doesn't work.

  3. For some reason, the setting of media_instance_running_background_jobs was orphaned at the bottom of a subroutine. There is logic for handling this and including it in the main WORKER_CONFIG template near the top of the file. Made this consistent with other settings like this, for instance background_worker's run_background_tasks_on.

Internal Issues:

  1. For Details 1: Using synapse.app.generic_worker instead of synapse.app.federation_sender means the send_federation configuration option is ignored, as written in workers.py. The way forward appears to be to use the special "Map" of federation_sender_instances which is placed into shared or homeserver config(if using the main configuration as the shared config). This appears to be necessary if there are any federation_sender's. See comment where I discover this and go into it with psuedo-logic to try and rationalize. Since then, I've read into it more, leading here. I'm not sure that was the intent, but it seems to be the result. It works and it appears cleaner to my eyes. See comment in source where it's said this is the way that is preferred.

  2. For Details 1: Same as Issue 1 for synapse.app.pusher, start_pushers and pusher_instances, respectively. See line 271 of workers.py. The logic appears to be largely the same as for federation senders.

  3. For Details 1: synapse.app.media_repository is a synapse.app.generic_worker, however because of this line, enable_media_repo won't work with synapse.app.generic_worker. As such, this won't be changed now. I feel like this logic should be in the workers.py file with similar type options(like run_background_tasks_on, notify_appservices_from_worker and update_user_directory_from_worker), and possibly adapted to use _should_this_worker_perform_duty(). Ideally, this should all be on a map instance that hashes by worker_name in the shared config, similar to pusher_instances and federation_sender_instances.

External Issues(and Questions) to explore further:

  1. There are a lot of duplicate values in worker config files that can be accessed by the server config directly, like the worker_main_http_uri, worker_replication_host and worker_replication_http_port. Can these be referenced directly, instead of being in an actual file?
  2. Tests which contain the names of deprecated synapse.app.*
  1. At this point(or rather since 2020), it should be possible to remove needing to declare the app type at all in any config since they are all the same(exception being synapse.app.media_repository, see above Internal Issue 3), thereby removing the need for worker 'types'. Need someone who knows more to double check this is accurate. Perhaps a conditional that if app isn't declared it will automatically default to synapse.app.generic_worker would be sufficient? I also feel like there are more things like this that can be done.

Notes:

  1. Documentation details to be updated:
  • account_data endpoints listed twice, once in stream writers and once for generic_workers. Which is correct since it is a stream writer? Needs clarification so this isn't accidently duplicated in reverse proxy configuration.
  • receipts, same as above.
  • presence, same as above.

Pull Request Checklist

Signed-off-by: Jason Little realtyem@gmail.com

@realtyem realtyem marked this pull request as ready for review October 26, 2022 01:04
@realtyem realtyem requested a review from a team as a code owner October 26, 2022 01:04
@dklimpel
Copy link
Contributor

  1. There are a lot of duplicate values in worker config files that can be accessed by the server config directly, like the worker_main_http_uri, worker_replication_host and worker_replication_http_port. Can these be referenced directly, instead of being in an actual file?

I've had that thought too. Why do not move this to the shared homeserver.yaml?

@realtyem
Copy link
Contributor Author

  1. There are a lot of duplicate values in worker config files that can be accessed by the server config directly, like the worker_main_http_uri, worker_replication_host and worker_replication_http_port. Can these be referenced directly, instead of being in an actual file?

I've had that thought too. Why do not move this to the shared homeserver.yaml?

I've got something better than that coming through the thought pipeline. How about getting rid of it altogether? Obviously, in a different PR than this.

@reivilibre
Copy link
Contributor

  1. There are a lot of duplicate values in worker config files that can be accessed by the server config directly, like the worker_main_http_uri, worker_replication_host and worker_replication_http_port. Can these be referenced directly, instead of being in an actual file?

I've had that thought too. Why do not move this to the shared homeserver.yaml?

This would be completely reasonable and surprised me too. Possibly it has come about because matrix.org is across multiple machines and so the address you use for contacting the master can depend based on which machine the workers are on (?). Who knows. It should be simplified for the docs though.

Copy link
Contributor

@reivilibre reivilibre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable; ping me again to check when the underlying PR has merged first

@reivilibre reivilibre self-requested a review November 8, 2022 12:36
@reivilibre
Copy link
Contributor

This now has conflicts :(

@realtyem
Copy link
Contributor Author

realtyem commented Nov 9, 2022

Should be fixed now.

@realtyem
Copy link
Contributor Author

realtyem commented Nov 9, 2022

Incorrectly requesting a worker type will now error and exit. Change requested by @reivilibre out-of-band.

@reivilibre reivilibre enabled auto-merge (squash) November 9, 2022 11:35
@reivilibre
Copy link
Contributor

Thanks!! :)

@reivilibre reivilibre merged commit e9cbddc into matrix-org:develop Nov 9, 2022
@realtyem realtyem deleted the modernize-config-script-for-workers branch November 9, 2022 12:18
bradtgmurray added a commit to beeper/synapse-legacy-fork that referenced this pull request Nov 22, 2022
Synapse 1.72.0 (2022-11-22)
===========================

Please note that Synapse now only supports PostgreSQL 11+, because PostgreSQL 10 has reached end-of-life, c.f. our [Deprecation Policy](https://github.com/matrix-org/synapse/blob/develop/docs/deprecation_policy.md).

Bugfixes
--------

- Update forgotten references to legacy metrics in the included Grafana dashboard. ([\matrix-org#14477](matrix-org#14477))

Synapse 1.72.0rc1 (2022-11-16)
==============================

Features
--------

- Add experimental support for [MSC3912](matrix-org/matrix-spec-proposals#3912): Relation-based redactions. ([\matrix-org#14260](matrix-org#14260))
- Build Debian packages for Ubuntu 22.10 (Kinetic Kudu). ([\matrix-org#14396](matrix-org#14396))
- Add an [Admin API](https://matrix-org.github.io/synapse/latest/usage/administration/admin_api/index.html) endpoint for user lookup based on third-party ID (3PID). Contributed by @ashfame. ([\matrix-org#14405](matrix-org#14405))
- Faster joins: include heroes' membership events in the partial join response, for rooms without a name or canonical alias. ([\matrix-org#14442](matrix-org#14442))

Bugfixes
--------

- Faster joins: do not block creation of or queries for room aliases during the resync. ([\matrix-org#14292](matrix-org#14292))
- Fix a bug introduced in Synapse 1.64.0rc1 which could cause log spam when fetching events from other homeservers. ([\matrix-org#14347](matrix-org#14347))
- Fix a bug introduced in 1.66 which would not send certain pushrules to clients. Contributed by Nico. ([\matrix-org#14356](matrix-org#14356))
- Fix a bug introduced in v1.71.0rc1 where the power level event was incorrectly created during initial room creation. ([\matrix-org#14361](matrix-org#14361))
- Fix the refresh token endpoint to be under /r0 and /v3 instead of /v1. Contributed by Tulir @ Beeper. ([\matrix-org#14364](matrix-org#14364))
- Fix a long-standing bug where Synapse would raise an error when encountering an unrecognised field in a `/sync` filter, instead of ignoring it for forward compatibility. ([\matrix-org#14369](matrix-org#14369))
- Fix a background database update, introduced in Synapse 1.64.0, which could cause poor database performance. ([\matrix-org#14374](matrix-org#14374))
- Fix PostgreSQL sometimes using table scans for queries against the `event_search` table, taking a long time and a large amount of IO. ([\matrix-org#14409](matrix-org#14409))
- Fix rendering of some HTML templates (including emails). Introduced in v1.71.0. ([\matrix-org#14448](matrix-org#14448))
- Fix a bug introduced in Synapse 1.70.0 where the background updates to add non-thread unique indexes on receipts could fail when upgrading from 1.67.0 or earlier. ([\matrix-org#14453](matrix-org#14453))

Updates to the Docker image
---------------------------

- Add all Stream Writer worker types to `configure_workers_and_start.py`. ([\matrix-org#14197](matrix-org#14197))
- Remove references to legacy worker types in the multi-worker Dockerfile. ([\matrix-org#14294](matrix-org#14294))

Improved Documentation
----------------------

- Upload documentation PRs to Netlify. ([\matrix-org#12947](matrix-org#12947), [\matrix-org#14370](matrix-org#14370))
- Add addtional TURN server configuration example based on [eturnal](https://github.com/processone/eturnal) and adjust general TURN server doc structure. ([\matrix-org#14293](matrix-org#14293))
- Add example on how to load balance /sync requests. Contributed by [aceArt](https://aceart.de). ([\matrix-org#14297](matrix-org#14297))
- Edit sample Nginx reverse proxy configuration to use HTTP/1.1. Contributed by Brad Jones. ([\matrix-org#14414](matrix-org#14414))

Deprecations and Removals
-------------------------

- Remove support for PostgreSQL 10. ([\matrix-org#14392](matrix-org#14392), [\matrix-org#14397](matrix-org#14397))

Internal Changes
----------------

- Run unit tests against Python 3.11. ([\matrix-org#13812](matrix-org#13812))
- Add TLS support for generic worker endpoints. ([\matrix-org#14128](matrix-org#14128), [\matrix-org#14455](matrix-org#14455))
- Switch to a maintained action for installing Rust in CI. ([\matrix-org#14313](matrix-org#14313))
- Add override ability to `complement.sh` command line script to request certain types of workers. ([\matrix-org#14324](matrix-org#14324))
- Enabling testing of [MSC3874](matrix-org/matrix-spec-proposals#3874) (filtering of `/messages` by relation type) in complement. ([\matrix-org#14339](matrix-org#14339))
- Concisely log a failure to resolve state due to missing `prev_events`. ([\matrix-org#14346](matrix-org#14346))
- Use a maintained Github action to install Rust. ([\matrix-org#14351](matrix-org#14351))
- Cleanup old worker datastore classes. Contributed by Nick @ Beeper (@Fizzadar). ([\matrix-org#14375](matrix-org#14375))
- Test against PostgreSQL 15 in CI. ([\matrix-org#14394](matrix-org#14394))
- Remove unreachable code. ([\matrix-org#14410](matrix-org#14410))
- Clean-up event persistence code. ([\matrix-org#14411](matrix-org#14411))
- Update docstring to clarify that `get_partial_state_events_batch` does not just give you completely arbitrary partial-state events. ([\matrix-org#14417](matrix-org#14417))
- Fix mypy errors introduced by bumping the locked version of `attrs` and `gitpython`. ([\matrix-org#14433](matrix-org#14433))
- Make Dependabot only bump Rust deps in the lock file. ([\matrix-org#14434](matrix-org#14434))
- Fix an incorrect stub return type for `PushRuleEvaluator.run`. ([\matrix-org#14451](matrix-org#14451))
- Improve performance of `/context` in large rooms. ([\matrix-org#14461](matrix-org#14461))
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants