sql: ROLLBACK TO SAVEPOINT does not release locks #46075
@nvanbenschoten @tbg @andreimatei please comment on this issue ASAP about whether we're overlooking anything. For now this issue is a release blocker.
There's no difference between intents and unreplicated locks (SFU) from the perspective of savepoints. Neither is rolled back by a savepoint; both continue blocking a reader or writer after the rollback to savepoint. I think that when we can, it'd be good to release locks early, but I don't think it's nearly as important as you do. I find it hard to believe that an application would rely on this behavior. As far as I can find, Postgres does not document this behavior (and in general locking is not something that's documented well). How would one code such an application? It'd need to roll back to a savepoint in one transaction, and afterwards kick off work on a different connection while blocking the original transaction on that work. My guess is that the Hibernate test is synthetic; I'd like to see it.
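For concreteness, here is a minimal sketch of the pattern such an application would have to use (the table `t` and its schema are hypothetical):

```sql
-- Hypothetical schema: t(k INT PRIMARY KEY, v INT).

-- Session 1:
BEGIN;
SAVEPOINT before_lock;
SELECT v FROM t WHERE k = 1 FOR UPDATE;  -- acquires an exclusive row lock on k = 1
ROLLBACK TO SAVEPOINT before_lock;       -- in CockroachDB, the lock is NOT released here

-- Session 2, concurrently:
UPDATE t SET v = 2 WHERE k = 1;          -- still blocks on session 1's lock, even
                                         -- though the locking read was rolled back
```

An application written against pg semantics would expect session 2 to proceed after the rollback; if session 1 then waits on session 2's progress, the two sessions deadlock at the application level.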
While we investigate, I think it's reasonable to add the error message "this is not supported" until we get clarity. It would be a mistake to let something behave in a way we're not confident about, let users get used to it and start depending on it, then later change it.
"This" being a rollback in any transaction that has written anything? |
That has locked anything with in-memory locks, I suppose?
Well, but as I was telling you, intents have just the same effect as the unreplicated locks.
My proposal would be to add a docs item to call out this difference in behavior. |
As a side note, @andreimatei, the locking rollback behavior is documented by PG.
I'm confused about the sudden urgency of this. We discussed the question of whether ROLLBACK TO SAVEPOINT should eagerly release locks back in October, and we all decided that for the initial implementation, we wouldn't eagerly release locks on ROLLBACK. Why the sudden change? As others here have mentioned, this is orthogonal to SFU and unreplicated locks. Unreplicated locks support partial rollback; Andrei and I actually discussed this a few days ago. Everything below the KV client supports partial rollbacks of locks, so eagerly releasing them on rollback is feasible.
That said, I'm now thinking that there might be a different approach that would avoid the problem.
Maybe this is where the confusion is coming from. This statement is incorrect.
Thanks everyone for the discussion. I think we can hand over the handling of this to @rmloveland and ensure that cockroachdb/docs#6675 gets augmented to properly cover this topic. I also would like to ensure that @awoods187 propagates this newfound knowledge/understanding to the SQL and AppDev teams. @awoods187 can you carry this over?
Did we confirm that the Hibernate test is not modeling a behavior that users may intend to take advantage of? I.e., is it testing this behavior on purpose or by accident? Regardless of the above, it seems reasonable to proceed with this approach for 20.1. We can then gather more information in the 20.2 cycle about prioritizing eagerly releasing these locks (including factoring in the answer to the above question).
Definitely on purpose.
The hibernate test is constructed with a particular purpose, but that's not directly relevant here: Andy K and others have stated that, regardless of user impact, we're going to keep the current CockroachDB behavior and document it as a known divergence. That's an "architectural divergence", if you recall the terminology. The next step here is to make it crystal clear to users that it's going to be different from pg and that they must adapt their client code if they observe it's an issue.
@nvanbenschoten @andreimatei I am adding the following text to the RFC, and I intend that change (as well as merging the RFC PR) to also close this issue as a result.

Relationship with row locks

CockroachDB supports row locks: both write locks (write intents) and read locks (SELECT FOR UPDATE).
This is an architectural difference which we don't plan to lift any time soon.

The motivation for preserving locks is to increase the chance that the new transaction attempt will succeed after a retry error.

Client code that relies on row locks must be reviewed with this difference in mind.
As per Nathan's wrath, we don't have "write locks" and "read locks". We have "replicated" and "unreplicated" exclusive locks.
Well, that's the motivation for not releasing locks when a retriable error is encountered, like you say. The motivation for not releasing locks generally, upon any ROLLBACK TO SAVEPOINT, is a different matter.
Thanks for writing that up, Raphael. I expect that some of that language will also end up in the docs.
This makes it sound like there's a fundamental reason why the CockroachDB architecture cannot remove this difference. I think that's too harsh. If we had another month in this release, we could absolutely address this using either of the approaches discussed earlier in this thread.
For reference, here are the lock strength and durability definitions. Like you mentioned, we currently support only replicated and unreplicated exclusive locks. In the future, we will likely change the locks acquired by implicit and explicit SFU to be "unreplicated upgrade" locks, which will improve performance on workloads like YCSB-B (95% reads, 5% updates, zipfian distribution).
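To illustrate the implicit vs. explicit SFU distinction, here is a small sketch (the table `t` and its columns are hypothetical, as above):

```sql
-- Explicit SFU: the client requests the row lock directly.
SELECT v FROM t WHERE k = 1 FOR UPDATE;  -- unreplicated exclusive lock today

-- Implicit SFU: the scan feeding an UPDATE acquires the same kind of lock,
-- which the subsequent write converts into a replicated write intent.
UPDATE t SET v = v + 1 WHERE k = 1;
```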
Changed the phrasing to:

CockroachDB supports exclusive row locks. (They come in two variants: replicated, i.e. write intents, and unreplicated.)
This is an architectural difference in v20.1 that may or may not be lifted in a later version.

The main motivation for preserving locks is to increase the chance that the new transaction attempt will succeed after a retry error.

Client code that relies on row locks must be reviewed with this difference in mind.

WDYT?
That sounds good to me. |
I still take objection to this phrasing :). This paragraph claims that a very specific situation in which we like holding locks is the general reason why we don't roll them back. That's simply not true, and worse, it's confusing because it just leads to obvious questions. We could easily separate the case where we want the locks from the general case where we don't. The problem with releasing locks in the general case, where we would like to release them, is that it's not so easy to do (at least I don't think it is). Nathan was suggesting above that perhaps, instead of having the kvclient be responsible for releasing locks on rollback, pushers would become able to push some locks out of their way. I think that'd work for intents, but I'm not sure how it'd work for the non-replicated locks - the reads that take these locks don't currently have sequence numbers, so we'd need to change that. Also, needing to do better tracking of locks seems like it'd limit our ability to merge locks together, etc.
Look guys, the situation is simple:
I don't care what the explanation is. If you think the best explanation is "we plan to do this, but it's not done yet", that's good. If you think the best explanation is "we can't do this, it's really hard, just deal with it", that's also good. I just need some words that y'all are happy with. Instead of bashing my draft, give me a counter-proposal. Thank you.
Well, you're insisting on writing something here :). Find a way to summarize the discussion. It takes some time; that's why I didn't just do it... How about, instead of:
Thanks! I have updated the RFC with this text, split in two parts.
tldr: The new implementation of savepoints and SELECT FOR UPDATE (row locks) currently behaves in a way that is incompatible with postgres. This is in direct violation of our commitment to pg compatibility, and it introduces the risk of application-level deadlocks.
This issue is to ensure that ROLLBACK TO SAVEPOINT either releases locks, like in pg, or fails with an "unimplemented" error.
This has been discussed with @tbg.
Details
There are two problems with this: it breaks pg compatibility, and it introduces the risk of application-level deadlocks.
Nuance: rollback after a serialization error ("retry error")
The current crdb behavior is motivated as follows: after a retry error, keeping the locks active during ROLLBACK TO SAVEPOINT increases the chances that the new txn attempt will succeed.
However, the current implementation keeps the locks unconditionally, causing the pg compatibility problem described above.
What we can do instead is the following: keep the locks when ROLLBACK TO SAVEPOINT is used in response to a retry error, and release them (or fail with an "unimplemented" error) otherwise.
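For illustration, this is the shape of the client-side retry pattern that the nuance above is meant to preserve, using the conventional cockroach_restart savepoint name (the table `t` is hypothetical):

```sql
BEGIN;
SAVEPOINT cockroach_restart;
UPDATE t SET v = v + 1 WHERE k = 1;
-- ...suppose the commit attempt fails with a 40001 serialization error...
ROLLBACK TO SAVEPOINT cockroach_restart;  -- keeping the locks here makes the
                                          -- retry more likely to succeed
UPDATE t SET v = v + 1 WHERE k = 1;       -- retried work
RELEASE SAVEPOINT cockroach_restart;
COMMIT;
```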
Nuance: implicit locks under UPDATE statements
The implementation places locks for UPDATE statements. Naively preventing ROLLBACK TO SAVEPOINT after locks have been placed would block the ability to roll back to a savepoint after an UPDATE statement. This would severely limit the savepoint feature.
However, looking at the details: the locks laid down by an UPDATE statement are subsequently converted to write intents. If our understanding (knz + @tbg) is correct, there is no lock left over after the UPDATE statement completes (assuming no txn conflict yielding a retry error). Intents can be rolled back without issue by ROLLBACK TO SAVEPOINT.
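A minimal sketch of why this matters (table `t` hypothetical): the following sequence must keep working, and does, because the UPDATE's locks have become intents by the time of the rollback:

```sql
BEGIN;
SAVEPOINT sp;
UPDATE t SET v = 2 WHERE k = 1;  -- locks from the scan are converted to write intents
ROLLBACK TO SAVEPOINT sp;        -- the write intents are rolled back cleanly
COMMIT;
```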