Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

[umbrella] Adding pg8000 as a database driver #9294

Closed
5 tasks
ShadowJonathan opened this issue Feb 2, 2021 · 3 comments
Closed
5 tasks

[umbrella] Adding pg8000 as a database driver #9294

ShadowJonathan opened this issue Feb 2, 2021 · 3 comments
Labels
T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.

Comments

@ShadowJonathan
Copy link
Contributor

ShadowJonathan commented Feb 2, 2021

Using pg8000, a pure-python postgres database driver, will make it possible to use pypy in a performant manner, as pypy has poor interaction with it's C-API.

This issue is thus a sub-issue of #8888, and tracks progress in integrating pg8000 into the codebase.

TODO

  • Pre: Generalize the current database drivers further to minimise engine-specific behaviour (Generalize the current database drivers further to minimise engine-specific behaviour #9298)
    • Adding in BaseDatabaseEngine.sql_type property to if-else based on "sqlite" or "postgres"
  • Pre: Rename PostgresEngine to PsycoPg2Engine (#XXXX)
  • Add Pg8000Engine (#XXXX)
  • Post: Add "postgres" "virtual" database name to switch to psycopg2 on cpython, and pg8000 on pypy (#XXXX)
  • Post: Add with engine.new_conn() as conn:-esc context manager usage everywhere, to ensure proper closure of cursors, transactions, and connections in pypy (#XXXX)
@erikjohnston
Copy link
Member

erikjohnston commented Feb 4, 2021

Using pg8000, a pure-python postgres database driver, will make it possible to use pypy in a performant manner, as pypy has poor interaction with it's C-API.

I'm slightly surprised that it'd be faster than the other connectors, given its written in python. Do you have benchmarks or blog posts or something to show its the case?

@erikjohnston erikjohnston added T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. X-Needs-Discussion labels Feb 4, 2021
@ShadowJonathan
Copy link
Contributor Author

ShadowJonathan commented Feb 4, 2021

On pypy, pg8000 is faster than psycopg2cffi when placed next to eachother, due to that poor C-API interaction I mentioned earlier.

This comes from some rough tox -e py36 vs tox -e pypy3 test runs i did in #9123:

using psycopg2(cffi): (-j 3)
py36-postgres: Ran 1501 tests in 706.022s
pypy3-postgres: Ran 1501 tests in 2548.308s

using pg8000: (-j 3)
pypy3-postgres: Ran 1501 tests in 763.756s

I don't have blogposts specifically for pg8000, but it's a matter of principle; pypy in general has a JIT in it's central premise, which speeds up pure python to machine code, and this blog post talks a bit about why pypy has poor performance with C APIs in general (it mentions CFFI being "better", but it's not on-par with performance it should be able to achieve, it's still not fully optimised).

@callahad
Copy link
Contributor

Similar to #9295 (comment), right now we're willing to accept that "postgres means pyscopg2" in Synapse, and we're not keen to explore other drivers until there's a clear motivating need.

In this case, I'm pretty surprised by the psycopg2cffi timings, and suspect that our test suite is not producing representative results. As PyPy is technically unsupported by Synapse, we're not inclined to make invasive changes to enable that support.

I'd be curious if psycopg2cffi fared better in something more closely approximating real-world conditions (e.g., maybe a load test using https://github.com/hpi-schul-cloud/synapse-load-test?)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.
Projects
None yet
Development

No branches or pull requests

3 participants