articles/2021/01/17/pool-design/ #4
Comments
Some of the feedback received considers killing the program as the default behaviour too aggressive, and not the pool's role, which I appreciate. In this case, probably, instead of having a …
I have used connection pools in other languages, and psycopg2 for Python, but I've never had any need for connection pools in Python. That said, I'm happy to see this work, and here are my thoughts.

Connection configuration: subclassing feels wrong here. In my own applications there is always a layer that I write between the application and the driver (or pool), because I often find myself wanting to add behavior in different ways: registering connections, pre/post-processing around calls, etc. So I would do MyAdapter.register(conn) (for example) in my own code. Starting down the road of supporting subclassing, or hooks via decorators, seems like a large extension to the interface that needs to be documented, and it's always going to be useless for some people and the wrong abstraction for others.

Exclusive connections: I agree with your conclusion, not needed.

Connection failure: the connection pool cannot insulate an application from connection problems. Even aside from catastrophic database failures, a connection can throw an exception for a variety of reasons, and the application has to be written to deal with it. A temporary failure will typically cause the current transaction to fail, and the application writer should know how to deal with this (replay the logic since the transaction began, ask the user to try again, etc.). The application must also deal with the case that a connection cannot be obtained, e.g. the database is down. Again, I don't see what the pool can do that would be valuable in masking a failure for any amount of time.

API: …
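The adapter-layer approach mentioned above could look roughly like the sketch below. MyAdapter and the SET statement are placeholders, not anything proposed by the article or provided by psycopg3; the point is only that per-connection configuration can live in a layer the application owns, with no pool subclass.

```python
class MyAdapter:
    """A thin layer the application owns, between it and the driver or pool."""

    @staticmethod
    def register(conn):
        # One-time per-connection configuration, e.g. session settings;
        # the statement below is only an example.
        with conn.cursor() as cur:
            cur.execute("SET TIME ZONE 'UTC'")
        return conn


# Hypothetical usage: wrap whatever hands out connections.
# conn = MyAdapter.register(pool.getconn())
```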
Another possible class of failure under connection failures is when some connections are established but others cannot be, e.g. due to exceeding … On the backend, it might be nice to have a way to tune how aggressive it is at trying to get to …
I am aware: I try to avoid API-by-subclassing if possible. However, connection configuration is something you would do at connection creation, not on …
Well, it can't insulate it completely. But in the modern world of RDS/Aurora-style systems, which throw connection curveballs when failing over or scaling, this is by far the most important aspect of a pool, imho. That and allowing me to mash the database with multiple async queries when I feel the need.
The pool manager could exploit usage patterns if the application provided a label when requesting a connection. On returning a connection to the caller, the pool manager could indicate whether the connection already had the requested label or whether it was a fresh connection. Labelling could also be used as a basis for management of connection configurations.
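A hypothetical sketch of that labelling idea, wrapping a pool rather than changing it; nothing here is existing or proposed psycopg3 API:

```python
class LabellingPool:
    """Wraps a pool and tracks the last label applied to each connection."""

    def __init__(self, pool):
        self._pool = pool
        self._labels = {}                      # id(conn) -> last label applied

    def getconn(self, label=None):
        conn = self._pool.getconn()
        # Tell the caller whether the connection already carries the label,
        # so it can skip re-running its configuration.
        already_labelled = label is not None and self._labels.get(id(conn)) == label
        if label is not None:
            self._labels[id(conn)] = label     # record it; the caller configures fresh ones
        return conn, already_labelled
```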
I'm the author of deadpool. Connection losses are rare, so why do you even bother detecting them early? It is quite possible that a connection loss occurs while a connection is being pooled, so you gain basically nothing by having a more complex system in place. I would recommend you just put the connection back in the pool regardless of its current state. Upon the next retrieval you can check the health of the acquired object and, if it is unusable, simply throw it away and fetch the next object from the pool, or create a new connection if the pool is currently empty. The actual logic in …

When recycling a connection you have multiple ways to check the health and make sure the connection is usable. See https://docs.rs/deadpool-postgres/0.7.0/deadpool_postgres/config/enum.RecyclingMethod.html

It's a simple, elegant and performant solution to a problem that's often solved in convoluted and over-engineered ways.
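In Python terms, the retrieve-time check described above might look like the sketch below. The deque and the connect callable stand in for whatever the real pool would use; this is not deadpool's or psycopg3's actual code.

```python
import contextlib
from collections import deque


def acquire(pool: deque, connect):
    """Take a connection from `pool`, checking its health only at this point."""
    while True:
        try:
            conn = pool.popleft()              # reuse a pooled connection if there is one
        except IndexError:
            return connect()                   # pool empty: open a brand new connection
        try:
            with conn.cursor() as cur:         # cheap validity check on retrieval
                cur.execute("SELECT 1")
            return conn
        except Exception:
            with contextlib.suppress(Exception):
                conn.close()                   # unusable: discard it and try the next one
```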
Hi @bikeshedder, thank you for your input: I will definitely check out your implementation for ideas.

Checking that a connection is broken on return is cheap, because it's information already available on the client side. Checking for usability on request is expensive, because it requires at minimum a ping to the server, which slows down the client asking for it. Both are simple to implement, but for the way this pool is designed it prefers to spend more time when returning a connection than when handing one out, because returning a connection can do some work in a separate thread.

From your description of what deadpool does, I understand that it differs from what is implemented in the psycopg3 pool. In this pool, the need to create a new connection never blocks the client requesting it: if connection creation is slow, it can easily happen that a client returns an existing connection before the new one is created (see the HikariCP analysis). So the strategy implemented here is to prepare a new connection in a parallel thread and add it to the pool when ready, regardless of whether the requesting client is still waiting for it or has already been served.

A more complex strategy to implement would be to have a background worker periodically check the state of the connections in the pool. Thinking about it, I don't think it's really worth having, especially now that pool usage is round-robin, so connections are used uniformly. Unless the check is ridiculously aggressive, the chances that it detects a broken connection before a client does are slim, even in a moderately used pool. The client has to cope anyway with the possibility that a connection breaks at any moment during its life; a connection might break right after it's been served, so doing work to check its validity before giving it out seems something that would slow down the requester for not much gain in robustness.

What I'm thinking of doing instead is to trigger an integrity check of all the connections inside the pool whenever a connection is returned broken, because that's something that can be done in parallel.
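A rough illustration of that return-side strategy, purely for discussion: the closed/broken flags on the connection and the plain deque are assumptions standing in for the real pool internals, and this is not the psycopg3 pool implementation.

```python
import threading
from collections import deque


def putconn(pool: deque, conn, connect):
    """Return `conn` to `pool`; if it came back broken, replace it in the background."""
    if getattr(conn, "closed", False) or getattr(conn, "broken", False):
        # The state is already known client-side, so no ping is needed here, and
        # the requester never waits: a replacement is prepared in a separate
        # thread and appended to the pool whenever it is ready.
        threading.Thread(target=lambda: pool.append(connect()), daemon=True).start()
    else:
        pool.append(conn)                      # healthy: straight back into rotation
```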
I agree with previous commenters: lose the subclass. Having two ways to do the same thing in an API slows me down -- why are there two ways? What's the tradeoff between the two methods? Are they somehow different? Would it be possible to have values like minconn et al. be properties, so that they can be adjusted as an application goes through different stages of its lifecycle? I wouldn't do that if it would make the code too complicated or less robust.
@Gerardwx we will probably only document the callback way of doing things: it should be good enough to clearly define what the extension points offered by the system are. Probably they will be: …
As for scaling the pool size, it is already implemented: there is a …
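For reference, a hedged usage sketch of runtime resizing based on the psycopg_pool API as it later shipped; the exact method the (truncated) comment above refers to may differ.

```python
from psycopg_pool import ConnectionPool

pool = ConnectionPool("dbname=app", min_size=4, max_size=8)

# ... later, when the application enters a busier stage of its lifecycle:
pool.resize(min_size=8, max_size=16)
```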
Yes please! We have a large number of small databases that are user-specific and a small number of large databases that we use in a pooled fashion. Requiring that all connections are pooled means really ugly workarounds in user code. My applications generally have two or three pools connected to shared databases, but occasionally make one-off connections to customer-specific databases. The "one-off" connections are not pooled. If psycopg3 includes a sane API for creating a one-off connection in a context manager, then this is not a problem.

FWIW, I ended up implementing a context manager that creates a pool with a min/max size of 1 using the user-specific DSN, fetching the only connection and yielding it. When the context manager exited, I disposed of the pool. Luckily the pool had very little logic behind it, so creating and destroying a pool instance inside a context manager wasn't a performance headache. In this case, the new pool instance would spin up management tasks for a single connection before immediately closing them all. I would really rather avoid that.
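A sketch of the workaround described above, using the class and argument names from psycopg_pool as released (treat those as assumptions, the original code may have looked different):

```python
from contextlib import contextmanager
from psycopg_pool import ConnectionPool


@contextmanager
def one_off_connection(dsn):
    # Build a throwaway pool holding exactly one connection for this DSN...
    pool = ConnectionPool(dsn, min_size=1, max_size=1)
    try:
        # ...hand out its single connection for the duration of the block...
        with pool.connection() as conn:
            yield conn
    finally:
        pool.close()                           # ...and dispose of the pool on exit.
```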
@dave-shawley what is a one-off one-connection pool useful for? |
@dvarrazzo I think that being able to create a connection using …
I was initially concerned that the pool abstraction was going to be the only way to get a connection. Sorry for the noise.
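For what it's worth, the one-off case discussed in the last few comments is covered without any pool by using a connection as a context manager, as the psycopg 3 API released later allows; a minimal example (the DSN is made up):

```python
import psycopg

# A connection obtained directly, no pool involved; it is closed when the
# block exits, after committing or rolling back the open transaction.
with psycopg.connect("dbname=customer_db user=app") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        print(cur.fetchone())
```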
Designing a connection pool for psycopg3 — Psycopg
Python adapter for PostgreSQL
https://www.psycopg.org/articles/2021/01/17/pool-design/