Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Split presence out of master #9820

Merged
merged 9 commits into from
Apr 23, 2021
Merged

Split presence out of master #9820

merged 9 commits into from
Apr 23, 2021

Conversation

erikjohnston
Copy link
Member

Based on #9819.

Reviewable commit be commit.

@erikjohnston
Copy link
Member Author

Annoyingly the CI is failing due to #9864, which is an unrelated bug. Note that the CI is running against the sytest PR that moves presence to the client reader worker.

@erikjohnston
Copy link
Member Author

Annoyingly the CI is failing due to #9864, which is an unrelated bug. Note that the CI is running against the sytest PR that moves presence to the client reader worker.

This is turning into a hard thing to fix, and I don't really want to block landing this on that. I'm therefore tempted to create a new worker on sytest for the various writers (like presence, typing, devices, etc) to bypass the bug. Thoughts?

@richvdh
Copy link
Member

richvdh commented Apr 22, 2021

Annoyingly the CI is failing due to #9864, which is an unrelated bug. Note that the CI is running against the sytest PR that moves presence to the client reader worker.

This is turning into a hard thing to fix, and I don't really want to block landing this on that. I'm therefore tempted to create a new worker on sytest for the various writers (like presence, typing, devices, etc) to bypass the bug. Thoughts?

I have failed thus far to grok exactly what the problem is, and why it is only a problem now. Is it to do with which processes end up on which workers?

@erikjohnston
Copy link
Member Author

Annoyingly the CI is failing due to #9864, which is an unrelated bug. Note that the CI is running against the sytest PR that moves presence to the client reader worker.

This is turning into a hard thing to fix, and I don't really want to block landing this on that. I'm therefore tempted to create a new worker on sytest for the various writers (like presence, typing, devices, etc) to bypass the bug. Thoughts?

I have failed thus far to grok exactly what the problem is, and why it is only a problem now. Is it to do with which processes end up on which workers?

The problem is that with presence on the client reader in tests, the client reader does state resolution and caches the results locally. When we purge history and backfill on the client reader we read from that cache, which returns state groups that have been deleted. This is a bug that exists on develop already if something else causes state res to happen, its just in the tests nothing was before.

@richvdh
Copy link
Member

richvdh commented Apr 22, 2021

Presumably this also affects message retention, in that case?

Brushing the problem under the carpet by shuffling things between workers feels pretty icky to me. But I guess if it's looking like a big job to fix, there isn't much choice.

@erikjohnston
Copy link
Member Author

erikjohnston commented Apr 22, 2021

Yup, message retention is also affected. It's worth noting that the cache expires after ten minutes, at which point backfilling will work again.

It is icky to fudge around it, but yeah, its now at the size where I think it should go on the planning board rather than as ad hoc work for this (plus this PR doesn't make it any worse)

Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some minor nits, generally seems fine

Comment on lines 67 to 70
to_device: The instance that writes to the to_device stream.
account_data: The instance that writes to the account data streams.
receipts: The instance that writes to the receipts stream.
presence: The instance that writes to the presence stream.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like each of these can be a list of instances?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, apparently it's a single-element list, but you should probably say that.

(why are these different to typing ?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(why are these different to typing ?)

Err, good Q. I think there are two options here: 1) make the single item lists just be strings, or 2) make typing be a single item list. I'm tempted to go with the latter so everything is consistent, but the former may be a bit less confusing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

making typing a single-item list sounds fine to me.

synapse/handlers/presence.py Show resolved Hide resolved
else:
# Query master process
# Query presence writer process
update_function = make_http_update_function(hs, self.NAME)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't this need directing to the presence writer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The update_function gets called with the instance to query (derived from the POSITION or RDATA commands that were just received, so will always be the presence writer)

Comment on lines 46 to 48
self._can_persist_presence = (
self._instance_name in hs.config.worker.writers.presence
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are two copies of this. does it need to be conditional?

Comment on lines +16 to +20
CREATE SEQUENCE IF NOT EXISTS presence_stream_sequence;

SELECT setval('presence_stream_sequence', (
SELECT COALESCE(MAX(stream_id), 1) FROM presence_stream
));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to add this to the port script?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gah. I wonder if there is a way to stop us (read: me) from forgetting this.

from tests.unittest import HomeserverTestCase


class FrontendProxyTests(HomeserverTestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless we're actually killing off the frontend proxy (which we totally should, but I don't think we're there yet), perhaps this should stick around as an empty shell?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in, just an empty class FrontendProxyTests? Can do, but I don't really see how it helps with anything?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're probably right. In my head, we would check that a frontend proxy is a thing that can be instantiated - but that's a thing we check in sytest anyway (iirc?). It would also be a place ready for us to add tests for the other things that frontend proxies do, but I'm not sure I rate the chances of that happening any time soon.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also worth noting that a frontend proxy is just a standard generic worker nowadays. But yes, we should ideally have tests for the fake KeyUploadServlet (or do we even need it know we can move encryption/device APIs off master?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we even need it know we can move encryption/device APIs off master?

well, good question. maybe we should try not using it?

Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

synapse/replication/http/_base.py Outdated Show resolved Hide resolved
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
@erikjohnston erikjohnston merged commit 9d25a0a into develop Apr 23, 2021
@erikjohnston erikjohnston deleted the erikj/move_presence branch April 23, 2021 11:21
anoadragon453 added a commit that referenced this pull request Apr 28, 2021
Synapse 1.33.0rc1 (2021-04-28)
==============================

Features
--------

- Update experimental support for [MSC3083](matrix-org/matrix-spec-proposals#3083): restricting room access via group membership. ([\#9800](#9800), [\#9814](#9814))
- Add experimental support for handling presence on a worker. ([\#9819](#9819), [\#9820](#9820), [\#9828](#9828), [\#9850](#9850))
- Return a new template when an user attempts to renew their account multiple times with the same token, stating that their account is set to expire. This replaces the invalid token template that would previously be shown in this case. This change concerns the optional account validity feature. ([\#9832](#9832))

Bugfixes
--------

- Fixes the OIDC SSO flow when using a `public_baseurl` value including a non-root URL path. ([\#9726](#9726))
- Fix thumbnail generation for some sites with non-standard content types. Contributed by @rkfg. ([\#9788](#9788))
- Add some sanity checks to identity server passed to 3PID bind/unbind endpoints. ([\#9802](#9802))
- Limit the size of HTTP responses read over federation. ([\#9833](#9833))
- Fix a bug which could cause Synapse to get stuck in a loop of resyncing device lists. ([\#9867](#9867))
- Fix a long-standing bug where errors from federation did not propagate to the client. ([\#9868](#9868))

Improved Documentation
----------------------

- Add a note to the docker docs mentioning that we mirror upstream's supported Docker platforms. ([\#9801](#9801))

Internal Changes
----------------

- Add a dockerfile for running Synapse in worker-mode under Complement. ([\#9162](#9162))
- Apply `pyupgrade` across the codebase. ([\#9786](#9786))
- Move some replication processing out of `generic_worker`. ([\#9796](#9796))
- Replace `HomeServer.get_config()` with inline references. ([\#9815](#9815))
- Rename some handlers and config modules to not duplicate the top-level module. ([\#9816](#9816))
- Fix a long-standing bug which caused `max_upload_size` to not be correctly enforced. ([\#9817](#9817))
- Reduce CPU usage of the user directory by reusing existing calculated room membership. ([\#9821](#9821))
- Small speed up for joining large remote rooms. ([\#9825](#9825))
- Introduce flake8-bugbear to the test suite and fix some of its lint violations. ([\#9838](#9838))
- Only store the raw data in the in-memory caches, rather than objects that include references to e.g. the data stores. ([\#9845](#9845))
- Limit length of accepted email addresses. ([\#9855](#9855))
- Remove redundant `synapse.types.Collection` type definition. ([\#9856](#9856))
- Handle recently added rate limits correctly when using `--no-rate-limit` with the demo scripts. ([\#9858](#9858))
- Disable invite rate-limiting by default when running the unit tests. ([\#9871](#9871))
- Pass a reactor into `SynapseSite` to make testing easier. ([\#9874](#9874))
- Make `DomainSpecificString` an `attrs` class. ([\#9875](#9875))
- Add type hints to `synapse.api.auth` and `synapse.api.auth_blocking` modules. ([\#9876](#9876))
- Remove redundant `_PushHTTPChannel` test class. ([\#9878](#9878))
- Remove backwards-compatibility code for Python versions < 3.6. ([\#9879](#9879))
- Small performance improvement around handling new local presence updates. ([\#9887](#9887))
@vitlav
Copy link

vitlav commented Jun 3, 2021

synapse/app/admin_cmd.py still using removed storage.presence:
from synapse.replication.slave.storage.presence import SlavedPresenceStore

babolivier added a commit to matrix-org/synapse-dinsic that referenced this pull request Sep 1, 2021
Synapse 1.33.0 (2021-05-05)
===========================

Features
--------

- Build Debian packages for Ubuntu 21.04 (Hirsute Hippo). ([\#9909](matrix-org/synapse#9909))

Synapse 1.33.0rc2 (2021-04-29)
==============================

Bugfixes
--------

- Fix tight loop when handling presence replication when using workers. Introduced in v1.33.0rc1. ([\#9900](matrix-org/synapse#9900))

Synapse 1.33.0rc1 (2021-04-28)
==============================

Features
--------

- Update experimental support for [MSC3083](matrix-org/matrix-spec-proposals#3083): restricting room access via group membership. ([\#9800](matrix-org/synapse#9800), [\#9814](matrix-org/synapse#9814))
- Add experimental support for handling presence on a worker. ([\#9819](matrix-org/synapse#9819), [\#9820](matrix-org/synapse#9820), [\#9828](matrix-org/synapse#9828), [\#9850](matrix-org/synapse#9850))
- Return a new template when an user attempts to renew their account multiple times with the same token, stating that their account is set to expire. This replaces the invalid token template that would previously be shown in this case. This change concerns the optional account validity feature. ([\#9832](matrix-org/synapse#9832))

Bugfixes
--------

- Fixes the OIDC SSO flow when using a `public_baseurl` value including a non-root URL path. ([\#9726](matrix-org/synapse#9726))
- Fix thumbnail generation for some sites with non-standard content types. Contributed by @rkfg. ([\#9788](matrix-org/synapse#9788))
- Add some sanity checks to identity server passed to 3PID bind/unbind endpoints. ([\#9802](matrix-org/synapse#9802))
- Limit the size of HTTP responses read over federation. ([\#9833](matrix-org/synapse#9833))
- Fix a bug which could cause Synapse to get stuck in a loop of resyncing device lists. ([\#9867](matrix-org/synapse#9867))
- Fix a long-standing bug where errors from federation did not propagate to the client. ([\#9868](matrix-org/synapse#9868))

Improved Documentation
----------------------

- Add a note to the docker docs mentioning that we mirror upstream's supported Docker platforms. ([\#9801](matrix-org/synapse#9801))

Internal Changes
----------------

- Add a dockerfile for running Synapse in worker-mode under Complement. ([\#9162](matrix-org/synapse#9162))
- Apply `pyupgrade` across the codebase. ([\#9786](matrix-org/synapse#9786))
- Move some replication processing out of `generic_worker`. ([\#9796](matrix-org/synapse#9796))
- Replace `HomeServer.get_config()` with inline references. ([\#9815](matrix-org/synapse#9815))
- Rename some handlers and config modules to not duplicate the top-level module. ([\#9816](matrix-org/synapse#9816))
- Fix a long-standing bug which caused `max_upload_size` to not be correctly enforced. ([\#9817](matrix-org/synapse#9817))
- Reduce CPU usage of the user directory by reusing existing calculated room membership. ([\#9821](matrix-org/synapse#9821))
- Small speed up for joining large remote rooms. ([\#9825](matrix-org/synapse#9825))
- Introduce flake8-bugbear to the test suite and fix some of its lint violations. ([\#9838](matrix-org/synapse#9838))
- Only store the raw data in the in-memory caches, rather than objects that include references to e.g. the data stores. ([\#9845](matrix-org/synapse#9845))
- Limit length of accepted email addresses. ([\#9855](matrix-org/synapse#9855))
- Remove redundant `synapse.types.Collection` type definition. ([\#9856](matrix-org/synapse#9856))
- Handle recently added rate limits correctly when using `--no-rate-limit` with the demo scripts. ([\#9858](matrix-org/synapse#9858))
- Disable invite rate-limiting by default when running the unit tests. ([\#9871](matrix-org/synapse#9871))
- Pass a reactor into `SynapseSite` to make testing easier. ([\#9874](matrix-org/synapse#9874))
- Make `DomainSpecificString` an `attrs` class. ([\#9875](matrix-org/synapse#9875))
- Add type hints to `synapse.api.auth` and `synapse.api.auth_blocking` modules. ([\#9876](matrix-org/synapse#9876))
- Remove redundant `_PushHTTPChannel` test class. ([\#9878](matrix-org/synapse#9878))
- Remove backwards-compatibility code for Python versions < 3.6. ([\#9879](matrix-org/synapse#9879))
- Small performance improvement around handling new local presence updates. ([\#9887](matrix-org/synapse#9887))
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants