Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve type hints for data store usage #11165

Open
matrixbot opened this issue Dec 19, 2023 · 0 comments
Open

Improve type hints for data store usage #11165

matrixbot opened this issue Dec 19, 2023 · 0 comments

Comments

@matrixbot
Copy link
Collaborator

matrixbot commented Dec 19, 2023

This issue has been migrated from #11165.


Everywhere in Synapse, HomeServer.get_datastore() is annotated as the DataStore god class, which is incorrect for workers and allowed #11163 to slip into matrix.org, causing an outage. It would be a good idea for data store consumers (servlets, handlers, the module API, etc) to declare which aspects of the data store they need, and have CI check that we don't pass them a data store that's missing the required interfaces.

There are three data store classes to consider:

  • DataStore, which is the data store class used on the main process
  • GenericWorkerSlavedStore, which is the data store class used on worker processes
  • AdminCmdSlavedStore, which is the data store class used when running admin commands(?)

DataStore and GenericWorkerSlavedStore overlap but aren't subtypes of each other. AdminCmdSlavedStore is a subset of GenericWorkerSlavedStore functionality-wise, but not through inheritance.

It's been suggested by @reivilibre that we define two or three types for use in type hints:

  • WorkerStore
  • MainStore (a subtype of WorkerStore?)
  • EitherStore = Union[WorkerStore, MainStore], if it turns out that WorkerStore contains functionality not in MainStore

These don't have to be concrete classes and could be Protocols if needed.

We could have more granular store requirements, but it wouldn't catch any extra bugs.

The code is currently structured like this in a lot of places:

class GroupsLocalWorkerHandler:
    def __init__(self, hs: "HomeServer"):
        self.store = hs.get_datastore()

Proposal 1: HomeServer[WorkerStore]

We could add a covariant type parameter to HomeServer to have data store consumers declare which type of data store they need:

class GroupsLocalWorkerHandler:
    def __init__(self, hs: "HomeServer[WorkerStore]"):
        self.store = hs.get_datastore()

HomeServer[WorkerStore] would mean "a homeserver with a data store that supports at least the capabilities of a WorkerStore".

We'd have to do this refactor across the entire codebase at once, otherwise the type hints for data stores would be implicitly degraded to Any everywhere.

Proposal 2: get_worker_datastore()

@clokep suggests that we add new get_*_datastore() methods to HomeServer that raise errors when called on the wrong process:

class GroupsLocalWorkerHandler:
    def __init__(self, hs: "HomeServer"):
        self.store = hs.get_worker_datastore()

mypy would not flag up issues, but any CI stage which spins up workers would fail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant