Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Move Complement test matrix jobs definition to match Sytest and Trial. #14153

Conversation

realtyem
Copy link
Contributor

@realtyem realtyem commented Oct 12, 2022

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Pull request includes a sign off
  • Code style is correct
    (run the linters)

In preparation for Faster joins, and further worker mode development, let's move configuration of the test options to the same place as Sytest and Trial and avoid configuration fragmentation.

Currently, the list of workers used by Complement for testing is defined in start_for_complement.sh which is the entrypoint for that docker image. I propose to locate this list in calculate_jobs.py instead. At this time, this is a drop-in replacement for existing code, no workers are removed or added.

Created a backwards compatibility mode for running complement.sh on the command-line so existing functionality is preserved. Using WORKERS=1 and not setting WORKER_TYPES to anything still uses the values originally defined in start_for_complement.sh
Signed-off-by: Jason Little realtyem@gmail.com

…ate_jobs.py

Keep the set of workers defined as-is. Use string concatenation to form the string to not be messy.
Perhaps alphabetize later? Make sure to accommodate monolith mode and database types.
While there, make use of env to reduce an obnoxiously long commandline and do a little housecleaning.
SYNAPSE_WORKER_TYPES is now passed through, so don't need that. Don't need to set it as an empty
string if not running with workers, as that's done earlier.

Add a little extra logging at the top to make sure it actually makes it all way through.
@@ -140,7 +140,7 @@ if [[ -n "$WORKERS" ]]; then
export PASS_SYNAPSE_COMPLEMENT_USE_WORKERS=true

# Pass through the workers defined. If none, it will be an empty string
export PASS_SYNAPSE_WORKER_TYPES="$SYNAPSE_WORKER_TYPES"
export PASS_SYNAPSE_WORKER_TYPES="$WORKER_TYPES"
Copy link
Contributor Author

@realtyem realtyem Oct 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way it matches with the variables being declared on the commandline and environment at the same time. Standardization is good, redundancy is bad.

@realtyem
Copy link
Contributor Author

realtyem commented Oct 16, 2022

As part of this, I was looking at the part of Synapse that parses the configuration for workers and does some sanity checking before setting up the worker for operation. It looks like the things that are capable of being sharded,

  • Federation Senders
  • Pushers

need to be added to their special maps rather there are more than one or not. This could be a left-over relic of not being migrated to generic workers app.

  • pushers need pusher_instances
  • federation senders need federation_sender_instances
    IIRC, there are more than one way to declare if master should handle these tasks or not. If I'm following this correctly(using federation senders as an example; pushers is virtually identical logic)
* set global send_federation to true if it doesn't exist
* if federation_sender_instances has content
  *  use that
* else federation_sender_instances is empty
  * if global send_federation is true then use master
  * if using the deprecated worker_app 'synapse.app.federation_sender'
    * if send_federation is true error out
    * set worker_name into federation_sender_instances
* set worker's send_federation to true if this worker's name is in federation_sender_instances

So basically, don't need to set send_federation(or start_pushers) at all any more if you are setting the worker instance into it's special map. I guess should mark those as deprecated but backward compatible?
Event persisters are special in that they don't have a special map. As long as they are part of stream_writers and instance_map they need no other configuration. It also appears that you can only have a single stream writer of each kind no matter how large your homeserver is. How does matrix.org handle this? Surely that's overwhelming?

I'll be testing this out shortly.

-Edit: Moving this thought/question into a separate PR

realtyem added a commit to realtyem/synapse-old that referenced this pull request Oct 17, 2022
realtyem added a commit to realtyem/synapse-old that referenced this pull request Oct 17, 2022
@realtyem realtyem mentioned this pull request Oct 17, 2022
4 tasks
@realtyem realtyem marked this pull request as ready for review October 17, 2022 06:44
@realtyem realtyem requested a review from a team as a code owner October 17, 2022 06:44
@realtyem
Copy link
Contributor Author

Please sanity check me

Copy link
Contributor

@DMRobertson DMRobertson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this one, and I don't fully understand the motivation here.

Firstly: we moved the sytest and trial job definitions to a python script because we wanted to run a limited subset of them on PRs, but the full set on develop and release branches. We haven't made a similar change yet for complement (and this change doesn't seem to do so).

Secondly: I expect changing the complement configuration to spawn more individual workers is going to make it slower for complement tests to run. It's already the longest CI job as it is---I'm not sure if we can afford this. (At least, not on every PR.)

Thirdly: this PR is tricky to review. It changes two things at once: moving the matrix definition; then adding new jobs to that matrix. It's sometimes okay to change multiple things in one PR, but if so it's best to keep the commit history is a sequence of independent, self-contained changes---that's not the case here.

(One can clean up commit history after the fact using an interactive rebase---but we ask that contributers don't do this after requesting review, as it conflicts with Github's review tools.)

I've left some thoughts, but I think it would be best to proceed by breaking this up into smaller PRs.

@@ -323,7 +323,8 @@ COMPLEMENT_DIR=../complement ./scripts-dev/complement.sh -run TestImportHistoric
The above will run a monolithic (single-process) Synapse with SQLite as the database. For other configurations, try:

- Passing `POSTGRES=1` as an environment variable to use the Postgres database instead.
- Passing `WORKERS=1` as an environment variable to use a workerised setup instead. This option implies the use of Postgres.
- Passing `WORKERS=1` as an environment variable to use a set of workers that mirrors what is used in Sytest. This option implies the use of Postgres.
- If setting `WORKERS=1`, optionally set `WORKER_TYPES=` to declare which worker types you wish to test. A simple comma-delimited string containing the worker types defined from the template in [here](https://github.com/matrix-org/synapse/blob/develop/docker/configure_workers_and_start.py). A safe example would be `WORKER_TYPES="federation_inbound,federation_sender,synchrotron,"`. See the [worker documentation](https://matrix-org.github.io/synapse/latest/workers.html) for additional information on workers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you don't want to have a trailing comma after synchrotron?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, that's right it needs to not be there.

set -o pipefail
COMPLEMENT_DIR=`pwd`/complement synapse/scripts-dev/complement.sh -json 2>&1 | synapse/.ci/scripts/gotestfmt

back-compat:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wouldn't this be included under the complement job above as part of the matrix?

It's also not clear what this is testing backwards compatibility with?

@realtyem
Copy link
Contributor Author

This was one of my first trys at a PR and I've learned a lot since then. For some reason, the commits history is including a completely different PR in the history, which didn't get documented here. This WAS two separate PR's. #14202

Firstly: we moved the sytest and trial job definitions to a python script because we wanted to run a limited subset of them on PRs, but the full set on develop and release branches. We haven't made a similar change yet for complement (and this change doesn't seem to do so).

That is completely fair and is something I didn't know. Now I know how to look up old PR's correctly(for instance I found the one you speak of in #13713 ).

Secondly: I expect changing the complement configuration to spawn more individual workers is going to make it slower for complement tests to run. It's already the longest CI job as it is---I'm not sure if we can afford this. (At least, not on every PR.)

This PR wasn't supposed to spawn more workers than is currently used. I didn't know it would blend the commit history with another PR just by adding a Needs line in the description(which is the only way I can think of that this happened). The functionality here was only as a drop in replacement and no additional workers are spawned.

Thirdly: this PR is tricky to review. It changes two things at once: moving the matrix definition; then adding new jobs to that matrix. It's sometimes okay to change multiple things in one PR, but if so it's best to keep the commit history is a sequence of independent, self-contained changes---that's not the case here.

This is absolutely true and in hind-sight a good idea. I'm thinking I'll break the backwards-compatibility mode into a PR first, so individual workers can be tested as an override. Then the second PR for moving the matrix definition to the calculate_jobs.py. The extra job that is run now is to be reverted(Commit 7528511 is purely a test to show that the backwards compatilibity mode works as intended)

(One can clean up commit history after the fact using an interactive rebase---but we ask that contributers don't do this after requesting review, as it conflicts with Github's review tools.)

I probably won't ever use that, as I didn't know it existed until now 😁

I've left some thoughts, but I think it would be best to proceed by breaking this up into smaller PRs.

Thanks for the time, I completely agree. This got way messier than it was supposed to.

@realtyem realtyem closed this Oct 28, 2022
@realtyem realtyem deleted the realtyem/complement-move-workers-def-to-one-place branch November 27, 2022 08:42
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants