Rework the storage framework to work nicely with PEP 249 #9295

ShadowJonathan · 2021-02-02T14:09:26Z

This is a "maybe" issue, but some elements of PEP 249 have already been implicitly coded into the codebase, such as synapse.storage.types, however, it could be easier to directly work with PEP 249's DB-API-v2 schema and approach to simplify the storage framework even further, and make (potential) future database engines easier to support.

(Things such as module .threadsafety properties can maybe help futher granularize database pooling mechanisms dynamically.)

The text was updated successfully, but these errors were encountered:

richvdh · 2021-02-02T14:50:05Z

what would this actually involve, in your mind? The codebase assumes database APIs which comply PEP249, as both psycopg2 and sqlite do.

ShadowJonathan · 2021-02-02T15:03:45Z

More directly work with PEP249 interfaces which's quirks are caught in the database engine modules, instead of embedding the quirks all over the place.

Basically: Only consume PEP249 APIs, specialise on database type, and the database driver needs to be invisible.

Ofc only if this is possible with the way the framework is set up.

The pros of this will be that it's incredibly easy to work with multiple database engine types, and possibly multiple SQL database types, but that's far-off.

clokep · 2021-02-02T17:21:25Z

I'm still not sure what concrete changes are proposed here -- can you clarify what "storage framework" means to you? (Is it a particular class? A particular module?)

ShadowJonathan · 2021-02-02T17:48:35Z

Ah sorry, it's quite ambiguous, but i mean the target area of every part of synapse that either;

Holds a reference to an *Engine, Connection, Cursor or Transaction object (to keep those parts of the codebase in mind)
Directly works on aforementioned objects with SQL statements
Needs to use or utilises certain quirks from database drivers and directly or indirectly invokes them by passing values that end up being used by those drivers.

With "storage framework" i meant the entirety of synapse.storage, but i have a feeling even that module-based scope might not be good enough, but we'll see.

richvdh · 2021-02-02T19:27:12Z

can you give us some examples of what would change here? As far as I'm aware almost all of the code you mention already uses PEP-249-compliant interfaces.

ShadowJonathan · 2021-02-02T19:33:25Z

I saw quite a lot of .set_session usage when going over changes needing to be made for #9299, .set_session is a psycopg2-specific method on Connection objects.

It's about the prospect these were used regardless of aforementioned supposed compliance in the codebase, which makes me think there's much more uncareful usage of adapter quirks like that, i think much more subtle usages like these exist such as in MultiWriterIdGenerator or the like, and postgres-specific replication workers (which have had the privilege of never worrying about another database adapter than psycopg2), that's what im referring to.

This is all in anticipation for pg8000, yes, but i think it's also helpful to clean up the codebase in general to prevent bugs from popping up in the future due to a driver-implementation-specific feature, regardless.

erikjohnston · 2021-02-04T11:38:15Z

Right, yes we do use some extensions when using psycopg2 for performance reasons, and those tend to be gated behind a check to make sure we're using the PostgresEngine, rather than the SQLiteEngine. At the moment there isn't a distinction between connecting to a particular database (and thus being able to use non standard SQL features) vs using a particular connector (and thus being able to use particular connector features). It's absolutely possible to split those concepts up, but that obviously adds complexity.

~~Could this issue be better described as adding support for pg8000 library and/or other sqlite and postgres adaptors?~~ Ah, I've just seen #9294, which is basically that.

ShadowJonathan · 2021-02-04T15:02:16Z

It's absolutely possible to split those concepts up, but that obviously adds complexity.

I can counter this with 4 lines from python zen;

Simple is better than complex.
Complex is better than complicated.

Readability counts.

In the face of ambiguity, refuse the temptation to guess.

While i'm not going to hold this up as the SSOT in regards to how we should approach this, it is how I look at most projects, and right now there is both ambiguity with PostgresEngine as both an engine for psycopg2 and "the" postgres engine, so I want to split these two concepts up and make both explicit, so that in the future it is clear what is happening at any one time.

psycopg2-speedups can stay psycopg2-speedups, but it should be clear that they're psycopg2-specific, not postgres-specific, with which there is ambiguity in many places in the codebase at the moment.

ShadowJonathan · 2021-02-04T15:05:08Z

Extra note; Don't just think i'm just doing this for pg8000, I just found asyncpg, an asyncio-based postgres database driver which claims to be 3x faster than psycopg2, which might be worth looking into, so a generic database interface would help wonders maintaining compatibility while testing drivers like these for speedups and extra performance.

callahad · 2021-04-22T15:30:38Z

I had a chance to discuss this with the Synapse platform team today.

We concluded that we are unlikely to take up this proposal: the necessary refactoring would be fairly invasive, and thus it would only be justified if it would address a specific, identified need. Until then, we are unfortunately resigned to conflating "Postgres" with "psycopg2".

We wouldn't immediately reject a pull request which provides better abstraction, but it would face a particularly skeptical review.

Example factors which could affect this conclusion:

The psycopg2 driver entering end-of-life
Explicit measurements showing that the database driver was a performance bottleneck and that another driver would solve this
Element determining that supporting another RDBMS is necessary for its viability as a business

callahad · 2021-04-22T15:34:55Z

(I'm happy to leave this open as a P5, but I'd need a new title which more clearly encapsulates a specific task that this issue represents)

ShadowJonathan · 2021-04-22T15:46:04Z

Explicit measurements showing that the database driver was a performance bottleneck and that another driver would solve this

I currently believe this is the case, as you said in #9294 (comment), before re-opening this - once i have enough time to get around to it - i'll try to prove it, maybe with the load test. Thanks for taking the time to think about this, though.

ShadowJonathan mentioned this issue Feb 2, 2021

Detail typing of Connection and Cursor to further match PEP 249 #9299

Merged

4 tasks

erikjohnston added T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. X-Needs-Discussion and removed X-Needs-Discussion labels Feb 4, 2021

callahad removed the X-Needs-Discussion label Apr 22, 2021

callahad closed this as completed Apr 22, 2021

callahad mentioned this issue Apr 22, 2021

[umbrella] Adding pg8000 as a database driver #9294

Closed

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rework the storage framework to work nicely with PEP 249 #9295

Rework the storage framework to work nicely with PEP 249 #9295

ShadowJonathan commented Feb 2, 2021

richvdh commented Feb 2, 2021

ShadowJonathan commented Feb 2, 2021

clokep commented Feb 2, 2021

ShadowJonathan commented Feb 2, 2021 •

edited

Loading

richvdh commented Feb 2, 2021

ShadowJonathan commented Feb 2, 2021 •

edited

Loading

erikjohnston commented Feb 4, 2021 •

edited

Loading

ShadowJonathan commented Feb 4, 2021 •

edited

Loading

ShadowJonathan commented Feb 4, 2021 •

edited

Loading

callahad commented Apr 22, 2021

callahad commented Apr 22, 2021

ShadowJonathan commented Apr 22, 2021

Rework the storage framework to work nicely with PEP 249 #9295

Rework the storage framework to work nicely with PEP 249 #9295

Comments

ShadowJonathan commented Feb 2, 2021

richvdh commented Feb 2, 2021

ShadowJonathan commented Feb 2, 2021

clokep commented Feb 2, 2021

ShadowJonathan commented Feb 2, 2021 • edited Loading

richvdh commented Feb 2, 2021

ShadowJonathan commented Feb 2, 2021 • edited Loading

erikjohnston commented Feb 4, 2021 • edited Loading

ShadowJonathan commented Feb 4, 2021 • edited Loading

ShadowJonathan commented Feb 4, 2021 • edited Loading

callahad commented Apr 22, 2021

callahad commented Apr 22, 2021

ShadowJonathan commented Apr 22, 2021

ShadowJonathan commented Feb 2, 2021 •

edited

Loading

ShadowJonathan commented Feb 2, 2021 •

edited

Loading

erikjohnston commented Feb 4, 2021 •

edited

Loading

ShadowJonathan commented Feb 4, 2021 •

edited

Loading

ShadowJonathan commented Feb 4, 2021 •

edited

Loading