Synapse workers #456
Turns out, even with a [ok, further reading revealed the distinctive logic behind |
· needs documentation; no checks yet for port clashes or typos in worker name
· according to https://github.com/matrix-org/synapse/wiki/Workers-setup-with-nginx#results about 90% of requests go to the synchrotron endpoint
· thus, the synchrotron worker is especially suited to be load-balanced
· most of the other workers are documented to support only a single instance
· https://github.com/matrix-org/synapse/blob/master/docs/workers.md
· 😅 How to keep this in sync with the matrix-synapse documentation?
· regex location matching is expensive
· nginx syntax limit: one location only per block / statement
· thus, lots of duplicate statements in this file
FIXME: horrid duplication in template file
@spantaleev this template file has a lot of duplicate statements.. I had started to factor some of that out into a |
One of the things that grabs my attention is the use of We try to keep roles as separate as possible and to instead pass data along via The easiest way to fix that would probably be by defining a new default like this in the
# List of Synapse workers.
# Example: [{"worker": "federation_reader", "port": 5000}]
matrix_nginx_proxy_synapse_workers_list: []
.. and then connecting it to
matrix_nginx_proxy_synapse_workers_list: "{{ matrix_synapse_workers_enabled_list }}"
We probably don't need to pass I'm guessing when workers are enabled, we don't support I guess making workers work with this would be complicated. I see that you've modified I'm guessing that when workers are enabled, matrix-corporal likely breaks, as it's no longer capturing all traffic. Rather, some traffic gets captured in these new Using matrix-corporal + Synapse workers is some advanced use-case that will be difficult to solve. You should ignore it. I'm just mentioning it here for completeness. |
I don't have anything to add here, but just wanted to add this would be massively useful to my organization, and workers are a REALLY important feature. Would love to be able to use the officially supported ansible playbook for this rather than throwing something together. |
( ok thanks for the feedback.. this will need couple more days to get right.. also, we're hitting some issues, |
@eMPee584 any update? smooth sailing? |
Ah, sorry, I was busy with school the recent weeks and the convoluted nginx template also made me avoid finally tackling this. I'll try and get it done in the coming week.. |
Ah, good to hear you're still on it! Looking forward to merging this! |
{% set federation_reader_worker = matrix_synapse_workers_enabled_list|selectattr('worker', 'equalto', 'federation_reader')|first %} | ||
{% if federation_reader_worker %} | ||
{# c.f. https://github.com/matrix-org/synapse/blame/master/docs/workers.md#L160 #} | ||
location ~ ^(/_matrix/federation/v1/event/|/_matrix/federation/v1/state/|/_matrix/federation/v1/state_ids/|/_matrix/federation/v1/backfill/|/_matrix/federation/v1/get_missing_events/|/_matrix/federation/v1/publicRooms|/_matrix/federation/v1/query/|/_matrix/federation/v1/make_join/|/_matrix/federation/v1/make_leave/|/_matrix/federation/v1/send_join/|/_matrix/federation/v2/send_join/|/_matrix/federation/v1/send_leave/|/_matrix/federation/v2/send_leave/|/_matrix/federation/v1/invite/|/_matrix/federation/v2/invite/|/_matrix/federation/v1/query_auth/|/_matrix/federation/v1/event_auth/|/_matrix/federation/v1/exchange_third_party_invite/|/_matrix/federation/v1/user/devices/|/_matrix/federation/v1/send/|/_matrix/federation/v1/get_groups_publicised$|/_matrix/key/v2/query|/_matrix/federation/v1/groups/) { |
/_matrix/federation/v1/groups/ should only be allowed GET requests; this configuration will cause federated community invites to fail. reference here: https://github.com/matrix-org/synapse/blob/master/docs/workers.md#synapseappfederation_reader
Thanks for the feedback.. Is this theoretical or have you experienced it on an actual system?
I think there probably is a different mistake in the config to cause this.. we are also having some minor troubles with our setup.
I've just created an awk script to parse the endpoints directly from the upstream synapse workers documentation.. but it remains kinda nightmarish to work with.. 🤔
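For anyone wanting to try the same idea, here is a minimal Python sketch of that kind of extraction. It is not the awk script mentioned above, which isn't shown in this thread. It assumes the upstream workers documentation lists each endpoint on its own (possibly indented) line starting with `^/_matrix`, under a heading that names the worker as `synapse.app.<name>`; the function name `extract_endpoints` and the `SAMPLE` text are made up for illustration.

```python
import re
from collections import defaultdict

# Matches a worker app name such as "synapse.app.federation_reader".
APP_RE = re.compile(r"synapse\.app\.(\w+)")


def extract_endpoints(doc_text):
    """Group endpoint regex lines by the worker heading that precedes them."""
    endpoints = defaultdict(list)
    current = None
    for line in doc_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("^/_matrix"):
            # Endpoint line: attach it to the most recent worker heading, if any.
            if current is not None:
                endpoints[current].append(stripped)
            continue
        match = APP_RE.search(stripped)
        if match and stripped.startswith("#"):
            # Markdown heading naming a worker app (assumed layout).
            current = match.group(1)
    return dict(endpoints)


# Illustrative input only; the real document is longer and may be laid out differently.
SAMPLE = """\
### `synapse.app.federation_reader`

    ^/_matrix/federation/v1/event/
    ^/_matrix/federation/v1/state/

### `synapse.app.client_reader`

    ^/_matrix/client/versions$
"""

print(extract_endpoints(SAMPLE))
```

The resulting dictionary could then feed a template variable instead of hand-maintained regex lists, though notes such as "GET requests only" in the upstream text would still need separate handling.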
Yes, I experienced it myself when trying to set up a fresh homeserver so I just removed the groups endpoint so the main synapse thread is handling it instead for now.
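To make the point of this thread concrete, here is a rough Python model of the routing decision being asked for: a path alone is not enough, the request method matters too. This is only a sketch of the logic, not nginx configuration and not what the playbook does; the routing table, its two entries, and the function name `pick_backend` are assumptions for illustration.

```python
import re

# Hypothetical routing table for the federation_reader worker:
# (path regex, allowed methods); None means any method is accepted.
FEDERATION_READER_ROUTES = [
    (re.compile(r"^/_matrix/federation/v1/send_join/"), None),
    (re.compile(r"^/_matrix/federation/v1/groups/"), {"GET"}),
]


def pick_backend(method, path):
    """Return "federation_reader" if a route accepts this request, else "main"."""
    for pattern, methods in FEDERATION_READER_ROUTES:
        if pattern.search(path) and (methods is None or method in methods):
            return "federation_reader"
    # Anything the worker cannot handle falls through to the main process.
    return "main"


print(pick_backend("GET", "/_matrix/federation/v1/groups/abc"))
print(pick_backend("POST", "/_matrix/federation/v1/groups/abc"))
```

With a path-only match, the second request would also land on the worker, which is the failure described above.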
{% set client_reader_worker = matrix_synapse_workers_enabled_list|selectattr('worker', 'equalto', 'client_reader')|first %} | ||
{% if client_reader_worker %} | ||
{# c.f. https://github.com/matrix-org/synapse/blame/master/docs/workers.md#L252 #} | ||
location ^/_matrix/client/(versions$|(api/v1|r0|unstable)/(publicRooms$|rooms/.*/joined_me|rooms/.*/context/.|rooms/.*/members$|rooms/.*/messages$|rooms/.*/state$|login$|account/3pid$|keys/query$|keys/changes$|voip/turnServer$|joined_groups$|publicised_groups$|publicised_groups/|pushrules/.*$|groups/.*$|register$|auth/.*/fallback/web$)) { |
Two of the endpoints here can only handle GET requests, and there are also two more not included here. The following list is GET-only for the client_reader worker:
^/_matrix/client/(api/v1|r0|unstable)/pushrules/.*$
^/_matrix/client/(api/v1|r0|unstable)/groups/.*$
^/_matrix/client/(api/v1|r0|unstable)/user/[^/]*/account_data/
^/_matrix/client/(api/v1|r0|unstable)/user/[^/]*/rooms/[^/]*/account_data/
The "location ^/..." here needs to be "location ~ ^..." for a regex match, doesn't it?
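One way to avoid both the missing `~` and the hand-written mega-regex is to generate the directive from a list of endpoint patterns. The following is a minimal sketch under assumptions: `render_location` is a hypothetical helper, the upstream address `127.0.0.1:18101` is invented, and Python's `re` is only used as a sanity check (nginx uses PCRE, so the two engines are not guaranteed to agree on every pattern).

```python
import re


def render_location(patterns, upstream):
    """Render one nginx regex location from several anchored endpoint patterns."""
    # Drop each pattern's leading "^" and re-anchor the whole alternation once.
    alternation = "|".join(p[1:] if p.startswith("^") else p for p in patterns)
    regex = "^(" + alternation + ")"
    re.compile(regex)  # fail early if the combined pattern is not a valid regex
    # The "~" modifier is what makes nginx treat the argument as a regex.
    return "location ~ %s {\n    proxy_pass http://%s;\n}\n" % (regex, upstream)


# Two patterns taken from the client_reader list above; the port is hypothetical.
CLIENT_READER = [
    "^/_matrix/client/versions$",
    "^/_matrix/client/(api/v1|r0|unstable)/publicRooms$",
]
block = render_location(CLIENT_READER, "127.0.0.1:18101")
print(block)
```

Generating the block this way keeps the `~` modifier in exactly one place, so it cannot be forgotten for an individual worker.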
roles/matrix-synapse/templates/synapse/systemd/matrix-synapse-worker@.service.j2
The workers in the current synapse version now communicate via Redis, so we should set up Redis as well to add worker support. |
`::` leads to errors like: > socket.gaierror: [Errno -9] Address family for hostname not supported
This leads to much easier management and potential safety features (validation). In the future, we could try to avoid port conflicts as well, but it didn't seem worth the effort to do it now. Our port ranges seem large enough. This can also pave the way for a "presets" feature (similar to `matrix_nginx_proxy_ssl_presets`) which makes it even easier for people to configure worker counts.
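As a rough illustration of what generating the list from counts could look like, and of the port-conflict check that is deferred here, consider the Python sketch below. Everything in it is an assumption for illustration: the field names (`type`, `index`, `port`), the port bases, and both function names. The playbook itself does this in Ansible/Jinja2, not Python.

```python
def build_worker_list(counts, port_bases):
    """Expand per-type worker counts into one entry per worker instance."""
    workers = []
    for worker_type, count in counts.items():
        base = port_bases.get(worker_type)  # None: this worker exposes no port
        for index in range(count):
            port = 0 if base is None else base + index
            workers.append({"type": worker_type, "index": index, "port": port})
    return workers


def find_port_clashes(workers):
    """Return the sorted list of non-zero ports used by more than one worker."""
    seen, clashes = set(), set()
    for worker in workers:
        port = worker["port"]
        if port == 0:
            continue  # port 0 means "no listener", so it cannot clash
        if port in seen:
            clashes.add(port)
        seen.add(port)
    return sorted(clashes)


# Hypothetical counts and port ranges.
workers = build_worker_list({"synchrotron": 2, "pusher": 1}, {"synchrotron": 18000})
print(workers)
print(find_port_clashes(workers))
```

A check like `find_port_clashes` only becomes necessary once the ranges of two worker types can overlap, which matches the reasoning above that large, separate ranges are enough for now.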
85a05f3 added support for dynamically generating I think having some presets might be a useful feature that can be added easily now. Most people would then be able to just switch a single other variable ( It'd be interesting to brainstorm what some good presets would be. I guess it could be like this:
Note that we currently don't support running multiple People that need more workers than these 3 presets can go for a hybrid approach: start with a preset (say "one of each") and then increase the number of a certain worker that they need more of (e.g. Another question is, which should be the default preset? I think Do you have other ideas for useful presets? Or comments about these ones? Does having a preset option sound useful at all or should we just go for "one of each" by default and ask people to tweak the worker counts? |
It would be great to get (voluntary) feedback on what actual worker counts end up being popular. One could start with one, or no, presets and add more as patterns are identified. |
This gives us the possibility to run multiple instances of workers that don't expose a port. Right now, we don't support that, but in the future we could run multiple `federation_sender` or `pusher` workers, without them fighting over naming (previously, they'd all be named something like `matrix-synapse-worker-pusher-0`, because they'd all define `port` as `0`).
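The naming collision is easy to see in a small Python sketch. The old port-based name follows the example in the message above; the new index-based format and the `index` field are assumptions made for illustration, not necessarily the exact scheme used in the commit.

```python
def name_by_port(worker):
    """Old scheme: the name is derived from the worker's port."""
    return "matrix-synapse-worker-%s-%d" % (worker["type"], worker["port"])


def name_by_index(worker):
    """Sketch of the new scheme: derive the name from a per-type instance index."""
    return "matrix-synapse-worker-%s-%d" % (worker["type"], worker["index"])


# Two pusher workers; neither exposes a port, so both have port 0.
pushers = [
    {"type": "pusher", "index": 0, "port": 0},
    {"type": "pusher", "index": 1, "port": 0},
]

print([name_by_port(w) for w in pushers])   # both names are identical
print([name_by_index(w) for w in pushers])  # names are distinct
```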
I've made a breaking change to the format of People that define that manually will need to add an In any case, it's better to refrain from defining |
We're talking about a webserver running on the same machine, which imports the configuration files generated by the `matrix-nginx-proxy` in the `/matrix/nginx-proxy/conf.d` directory. Users who run an nginx webserver on some other machine will need to do something different.
Adding more presets in the future would be nice.
I think we have a cause for celebration! This is now complete and merged to Huge thanks to everyone who has worked on making this happen! There are minor things that could be improved. For example, Another thing that may be improved is metrics. Since #838, we do support metrics collection. When workers are enabled, graphs become somewhat sparse. I'm guessing we'd need to collect data from the various workers when workers are enabled. Maybe we need some different graphs (splitting them per worker or something?). |
Congratulations to those who worked on this! This saved my butt. We have 1k users only, but it seems this was one necessary step. I have two other questions:
They are not handled by this configuration of workers, I suppose. But, if I tune postgres to use "max_parallel_workers" as proposed here #532 and if I use more nginx worker connections by tampering with On the other hand: would it be difficult to let the nginx-config (at least) be managed by this workers-configuration or is this all about synapse only and therefore it should be a separate configuration? |
This probably got lost somehow in all the work that happened in #456
We can adjust both of these (nginx workers; Postgres max connections) automatically when Synapse workers are enabled via We can define variables for these things in their respective roles and then override them from |
Just to get an idea, how many resources/workers are needed at that scale? |
From what I remember, very few.. a couple (3-6) synchrotron workers and the resultant parallelization from using just a single worker for the others already had a sufficient impact compared to a monolithic synapse process. |
· first version: grokking this was a bit of 🤯 .. and there are probably still faults in the wiring⚠️
· not tested impact on UX yet
· update 2020/04/21: live testing on our 7k+ uni matrix instance 😅 💦
· update 2020/06/30: works very well to spread load, now with 10k+ users
· update 2020/09/09: currently working on expanding the patch queue to cover recent changes to synapse workers.. not much left but it might need another week.
· update 2020/09/29: about to merge with Max's contributions in #642 PR ✊♻️
· update 2020/10/14: ok I've merged and mostly reconciled Max's & my branch (this PR) & with MDAD* mainline, now working to get the frigging PIDs from the worker processes and fixing the removal of leftovers
· update 2020/10/28: nearly finished, only missing support for setups not using inbuilt nginx.. REVIEWS, PLEASE
· update 2021/01/08: yes, still nearly finished 😁