Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that transactions are wrapped asynchronously in the parent database object instead of blocking on transactions' drop function. #1223

Merged
merged 5 commits into from
Sep 21, 2023

Conversation

TheQuantumPhysicist
Copy link
Collaborator

This merges to the postgres-async branch and fixes the issue that transactions that aren't ended explicitly would block the thread until they do that.

Now with this solution, transactions that aren't finished will simply use their Drop trait to send back themselves to the parent, owning database object, which will take care of decommissioning the transaction asynchronously

…abase object instead of blocking on transactions' drop function.
let tx_dropper_joiner = tokio::task::spawn(async move {
let mut conn_rx = conn_rx;
while let Some(conn) = conn_rx.recv().await {
conn.batch_execute("ROLLBACK").await.unwrap_or_else(|e| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the documentation, maybe batch() rather than batch_execute() is better when only doing a single query?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

batch_execute() is what was used in postgres_async::Transaction. It seems the right thing to do. Not sure.

>,
// Note: This exists to enforce that a transaction never outlives the database object,
// given that all connections have 'static lifetimes
_marker: std::marker::PhantomData<&'a ()>,
}

impl<'a> Drop for ApiServerPostgresTransactionalRw<'a> {
fn drop(&mut self) {
if !self.finished {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since getting a connection has been change from get() to get_owned(), should we always send() the connection back to the poll regardless of finished?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to send the connection back if finished, because it's already committed/rolled back. It should be destroyed so that the database object retrieves the connection again.

@alfiedotwtf
Copy link
Contributor

Nice. Once again channels saves the day.

I think this should be a rule of thumb now - whenever something seems impossible, consider using channels to solve the problem.

@alfiedotwtf
Copy link
Contributor

One last thought - too late now given you've implemented everything already and it works, but tokio_postgres::Transaction can possibly do a lot of the transactional heavy lifting.

@TheQuantumPhysicist
Copy link
Collaborator Author

TheQuantumPhysicist commented Sep 20, 2023

Nice. Once again channels saves the day.

I think this should be a rule of thumb now - whenever something seems impossible, consider using channels to solve the problem.

Channels aren't really the solution here, as they always were part of the attempts. What saved the day really was get_owned, because if the connections had a lifetime associated with anything less than static, none of this would've worked, since tokio tasks enforce that.

@TheQuantumPhysicist
Copy link
Collaborator Author

One last thought - too late now given you've implemented everything already and it works, but tokio_postgres::Transaction can possibly do a lot of the transactional heavy lifting.

Using Transaction is not really possible because it's a reference to connection, which will lead to a self-referential transaction object in our struct, which is a no-no in rust. In fact this is how it was done in the sync version of the code and why CI isn't passing there (because I wanted to get a comment from reviewers on whether Ouroboros lifetimes of connections are covariant), but then I realized that I can circumvent all that complexity by getting rid of Transaction and creating them myself, which is what I did here. Without that solution, there would've been issues because Ouroboros doesn't allow async borrows of self-referential variables.

Comment on lines +44 to +47
impl Drop for TransactionalApiServerPostgresStorage {
fn drop(&mut self) {
// Since the whole connection pool will be destroyed, we can safely abort all connections
self.tx_dropper_joiner.abort();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is abort really the clean thing to do here? If there is some connection waiting to be rolled back in the channel and the dropper task is aborted, the transaction is not going to be finalised. Or did I miss something?

Copy link
Collaborator Author

@TheQuantumPhysicist TheQuantumPhysicist Sep 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly. The lower level connection will be dropped since the connection pool will be destroyed, and hence the session will be dropped, and hence the transaction will be aborted anyway. At least this is my understanding of how postgres and mysql work from my research into this problem, but I'm open to being corrected here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, if it's so that sounds acceptable, albeit still not the cleanest. If I understand it correctly, some transactions may not be explicitly rolled back by issuing the ROLLBACK query, relying on the connection cleanup procedure to effectively abort it. Is that about right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I'm not sure whether there's a misunderstanding, so let me elaborate.

This is exclusively at the end of life of the database, which is probably obvious.

So, IF we're shutting down the system (hence the database object, not a transaction object) is about to be dropped, and there are connections that were transactions and are still not rolled back (even though their transaction objects are destroyed, by definition, because a transaction cannot outlive the database given the lifetime constraint with that phantom data), then aborting will, yes, cause the connection cleanup to effectively rollback the transaction, because the sessions of the connection pool will be ended, which cancels all low-level postgres transactions that were open.

Having said that, again, this depends on my understanding of how database connections/sessions work and based my research on this problem. I'm happy to be challenged on that.

Now the alternative is that we have to block when shutting down the database. I'm happy to do that if it turned out to be necessary... in fact it's like a line of code in tokio, but I consider it less clean because of my eternal fear of deadlocks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the alternative is that we have to block when shutting down the database. I'm happy to do that if it turned out to be necessary... in fact it's like a line of code in tokio, but I consider it less clean because of my eternal fear of deadlocks.

That could be mitigated by using spawn_blocking for the cleanup task but then you won't be able to use the single-threaded tokio runtime. It's already the case in mempool because of the blocking subsystem handle business but it's not great and I'd like to avoid that if possible.

Base automatically changed from fix/postgres-async to fix/postgres September 21, 2023 23:29
@TheQuantumPhysicist TheQuantumPhysicist merged commit 89b61f1 into fix/postgres Sep 21, 2023
12 checks passed
@TheQuantumPhysicist TheQuantumPhysicist deleted the fix/postgres-async-with-tx-dropper branch September 21, 2023 23:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants