Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

our storage classes are hella confusing #11733

Open
2 of 5 tasks
richvdh opened this issue Jan 12, 2022 · 12 comments
Open
2 of 5 tasks

our storage classes are hella confusing #11733

richvdh opened this issue Jan 12, 2022 · 12 comments
Labels
T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.

Comments

@richvdh
Copy link
Member

richvdh commented Jan 12, 2022

Store, StateGroupDataStore, DataStore, Storage, DatabasePool, Databases... 🤯

#8033 made some attempts to clean it up, but there's a lot more to do here.

There are basically three layers:

  1. a DatabasePool represents a connection to a physical database. Normally you have exactly one of these, but it's possible to set Synapse up to talk to two separate postgres instances (one of them gets used for state storage (state_groups_state, mainly), and the other for everything else), in which case you have two.

  2. a <foo>Store is a low-level thing containing SQL, and is linked to a single DatabasePool. We basically have two of these (state and main), again with the state/everything else split. state is always a StateGroupDataStore, but main can be any of (DataStore, AdminCmdSlavedStore, GenericWorkerSlavedStore) depending on the synapse app.

    PersistEventsStore seems to be a special case, and is a layer over the main store.

    Databases is a singleton which instantiates the DatabasePools and <foo>Stores, and holds references to them.

  3. a <foo>Storage is a higher-level thing which spans multiple <foo>Stores. Ideally it does not do any SQL itself, but delegates it all to the Stores. We have three of these (PurgeEventsStorage, StateGroupStorage, EventsPersistenceStorage). Note that a lot of non-storage code skips this layer and goes straight to the <foo>Stores.

    Storage itself instantiates and holds references to the <foo>Storages.

    Edit 2022/06/06: These are now known as <foo>StorageControllers, and Storage itself is now known as StorageControllers.


This is all very well, but I seem to be incapable of holding it in my head at once. This is really not helped by there being a lot of inconsistent naming. Examples:

  • Databases should surely be called DataStoreManager or similar. And HomeServer.get_datastores, which returns the Databases, should be renamed.
  • Likewise, Storage -> StorageManager. (superceded by Rename storage classes #12913)
  • the various implementations of the main Store could do with being named as such, for consistency with StateGroupDataStore - eg MasterMainDataStore, WorkerMainDataStore.
  • HomeServer.get_datastore (which returns the main Store) exists only for backwards compatibility. We should replace it with calls to get_datastores().main for consistency with the StateGroupDataStore. Remove HomeServer.get_datastore() #12031
  • All the million places that have a self.store should instead have a self._main_store.
@ShadowJonathan
Copy link
Contributor

Would the (new) naming be worth documenting? (Even though its considered developer use only?)

@H-Shay H-Shay added the T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. label Jan 13, 2022
@DMRobertson
Copy link
Contributor

Would the (new) naming be worth documenting? (Even though its considered developer use only?)

If anyone is going to spend effort on this, I'd rather it be spent on changing the source code

richvdh added a commit that referenced this issue Feb 18, 2022
The presence of this method was confusing, and mostly present for backwards
compatibility. Let's get rid of it.

Part of #11733
richvdh added a commit that referenced this issue Feb 23, 2022
The presence of this method was confusing, and mostly present for backwards
compatibility. Let's get rid of it.

Part of #11733
@DMRobertson
Copy link
Contributor

  1. a DatabasePool represents a connection to a physical database. Normally you have exactly one of these, but it's possible to set Synapse up to talk to two separate postgres instances (one of them gets used for state storage (state_groups_state, mainly), and the other for everything else), in which case you have two.

Do we know of anyone making use of this?

@richvdh
Copy link
Member Author

richvdh commented Apr 27, 2022

Do we know of anyone making use of this?

No. At one point we thought we might use it on matrix.org, but we ended up just building a bigger postgres server, rather than splitting across two.

@richvdh
Copy link
Member Author

richvdh commented May 25, 2022

a <foo>Storage is a higher-level thing which spans multiple <foo>Stores. Ideally it does not do any SQL itself, but delegates it all to the Stores. We have three of these (PurgeEventsStorage, StateGroupStorage, EventsPersistenceStorage). Note that a lot of non-storage code skips this layer and goes straight to the <foo>Stores.

Discussed today: ...Storage does a remarkably poor job of conveying what these things are, and it would be nice to find a better name for them. But naming is hard.

One suggestion was ...StorageManager. (In which case, Storage itself would be a StorageManagerManager ?)

@richvdh
Copy link
Member Author

richvdh commented May 26, 2022

Discussed today: ...Storage does a remarkably poor job of conveying what these things are, and it would be nice to find a better name for them. But naming is hard.

One suggestion was ...StorageManager.

further ideas:

  • StateStorageController
  • StateStorageHelper

(In which case, Storage itself would be a StorageManagerManager ?)

further ideas:

  • Storage*Coordinator

@richvdh
Copy link
Member Author

richvdh commented May 26, 2022

Right, here are my proposals for the least bad names for these things. Speak now or forever hold your peace:

From To
synapse.storage.state.State[Group]Storage synapse.storage.controllers.state.StateStorageController
synapse.storage.purge_events.PurgeEventsStorage synapse.storage.controllers.purge.EventsPurgeStorageController
synapse.storage.persist_events.EventsPersistenceStorage synapse.storage.controllers.persistence.EventsPersistenceStorageController
synapse.storage.Storage synapse.storage.controllers.StorageControllers

🚀 for "this is the best naming convention ever"
👍 for "this is fine"
👎 for "this is no good", but you have to make a better suggestion.

@DMRobertson
Copy link
Contributor

Does the name StorageController imply that there will be some lower level entity roughly called a Storage?

How do StorageControllers relate to SQLBaseStore and its inheritors?

@squahtx
Copy link
Contributor

squahtx commented May 26, 2022

I'm okay with the proposal.

I agree that StorageController isn't great but I'm stuck coming up with an alternative.
What's the distinction between a Handler and a StorageController? To me, they both do higher-level operations involving access to Stores.

@richvdh
Copy link
Member Author

richvdh commented May 27, 2022

Does the name StorageController imply that there will be some lower level entity roughly called a Storage?

No, it's more of an abstract concept: "the storage system".

How do StorageControllers relate to SQLBaseStore and its inheritors?

I've tried to explain this in the opening comment; in short, a <Foo>Storage (now a <Foo>StorageController) is a higher-level thing which spans multiple <foo>Stores (of which SQLBaseStore is one).

What's the distinction between a Handler and a StorageController? To me, they both do higher-level operations involving access to Stores.

Yeah, great question. The distinction is very woolly, which maybe suggests this architecture isn't great. In my head, a Handler is higher level and contains more "business logic", whereas a StorageController is more focussed on orchestrating access to the storage. Handlers can call StorageControllers, but not vice versa.

.oO (hrm, ...StorageOrchestrator?)

@erikjohnston
Copy link
Member

I've implemented this at #12913

TBH, to me this doesn't feel like its made anything clearer (beyond making Storage equivalent class plural), with the disadvantage of long names. However, I'm happy to land it if people do feel like its a step in the right direction.

@DMRobertson
Copy link
Contributor

Related: #11165

(I semiregularly try to find that issue by searching for hella confusing and end up here.)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.
Projects
None yet
Development

No branches or pull requests

6 participants