Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

concurrency: rollback locks sequence numbers on lock reacquisition for unreplicated locks #102269

Closed
arulajmani opened this issue Apr 25, 2023 · 1 comment · Fixed by #103493
Closed
Assignees
Labels
A-kv-transactions Relating to MVCC and the transactional model. A-read-committed Related to the introduction of Read Committed C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-kv KV Team

Comments

@arulajmani
Copy link
Collaborator

arulajmani commented Apr 25, 2023

Background

All operations performed by a transaction in CRDB have an associated sequence number, which are used to provide relative ordering for all operations. Savepoints are built on top of this concept -- whenever a savepoint is rolled back, associated sequence numbers are considered to be "ignored".

As of the time of writing, CRDB deviates from lock rollback compared to Postgres -- savepoint rollback does not result in lock rollbacks. This is because the client never issues an RPC to update any locks acquired under savepoints. However, the OnLockUpdate method on the LockManager, and the concurrency_control package in general, is designed to handle lock rollbacks.

Propopsal

Locks in the lock table store all sequence numbers at which they have been acquired. Calls to OnLockUpdate are passed a list of IgnoredSequenceNumbers. These are then used to prune the list of sequence numbers (holder.seqs) the lock is held at; if this list becomes empty, the lock is no longer held, and removed from the lock table.

In addition to the OnLockUpdate path, we should do this pruning on the lock (re-)acquisition path as well. This would entail teaching the arguments of OnLockAcquired (roachpb.LockAcquisition) to have access to a list of IgnoredSequenceNumbers. We could then make use of this list like we do in OnLockUpdate.

Jira issue: CRDB-27375

@arulajmani arulajmani added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv-transactions Relating to MVCC and the transactional model. T-kv KV Team A-read-committed Related to the introduction of Read Committed labels Apr 25, 2023
@arulajmani arulajmani self-assigned this May 1, 2023
arulajmani added a commit to arulajmani/cockroach that referenced this issue May 4, 2023
When the lock table hears about lock acquisition at a newer epoch, it
no longer needs to track lock history from prior epochs. Prior to this
patch, however, we would only throw away lock history from prior epochs
for the durability with which we were acquiring the lock. This meant
that we could be needlessly tracking lock history for the other
durability from a prior epoch.

This patch changes this behavior -- we now get rid of all prior lock
acquisition history from the previous epoch for both replicated and
unreplicated locks. This change enables us to stop storing lock
histories based on durability, which is what is proposed in the shared
locks RFC.

With this behavior change, I had to rework some of our existing tests
that asserted the previous behavior. Some of the changes were
non-trivial, but I tried to make sure we continued to test what the
original test was intending to.

While here, we also add an assertion to ensure the epoch of a lock
holder never regresses. This codifies the requirement around epoch
monotonicity from a comment on the `lockHolderInfo` struct.

Informs cockroachdb#102269
Informs cockroachdb#91545

Release note: None
arulajmani added a commit to arulajmani/cockroach that referenced this issue May 4, 2023
This patch extends the `roachpb.LockAcquisition` struct to include more
information -- namely the strength with which the lock is being acquired
and there are any ignored sequence numbers for the transaction acquiring
the lock.

For now, the strength is statically initialized to `lock.Intent`. We'll
generalize this as we go about supporting multiple lock strengths in the
lock table.

The `AcquireLock` codepath is then changed to accept a
`roachpb.LockAcquisition` struct. This doesn't change any functionality
yet, but will enable us to rollback savepoints on lock acquisition and
support multiple lock strengths in the lock table in upcoming patches.

Informs #cockroachdb#102269
Informs cockroachdb#102270

Release note: None
@arulajmani arulajmani changed the title concurrency: rollback locks sequence numbers on lock reacquisition concurrency: rollback locks sequence numbers on lock reacquisition for unreplicated locks May 5, 2023
@arulajmani
Copy link
Collaborator Author

@nvanbenschoten and I spoke about this at length today. Previously, the issue talked about pruning sequence numbers for both replicated and unreplicated locks on lock acquisition. However, this gets complex because of a few reason (that I'll describe below). So now we've changed our thinking to only rollback sequence numbers for unreplicated locks -- this is because the in-memory lock table is the source of truth for in-memory locks, not replicated locks.


Motivation

Locks can be acquired by a transaction multiple times, at multiple sequence numbers. Keeping track of the entire history of a lock's acquisition can thus grow unboundedly. Ideally, we would be able to compress this history while still preserving correctness.

The shared locks RFC (#101799) proposes that we only keep track of <strength, lowest sequence number (that hasn't been rolled back)> pairs. Put another way, the lock table can discard lock re-acquisitions at higher sequence numbers if the lock is held at the same strength at a lower sequence number, as long as that sequence number hasn't been rolled back.

Today, sequence numbers are only rolled back in tryUpdateLock. There is no guarantee we end up in this codepath after a savepoint rollback. In fact, it's quite common that we don't, given savepoint rollback doesn't try and eagerly resolve intents.

If we were not to consult ignored sequence numbers on the lock acquisition codepath, and instead blindly discard lock re-acquisitions at higher sequence numbers, we could (incorrectly) drop locks.

Challenges

While working on this, we discovered that lock re-acquisition for replicated locks (mvccPutInternal) does not filter ignored sequence numbers from its history; instead, it only decides not to move an older value into the IntentHistory if that value was written at a now rolled back sequence number. So rolling back is entirely "best effort" in that its incomplete. Relevant code:

cockroach/pkg/storage/mvcc.go

Lines 2051 to 2059 in 7c4867b

// Only add the current provisional value to the intent
// history if the current sequence number is not ignored. There's no
// reason to add past committed values or a value already in the intent
// history back into it.
if curProvValRaw != nil {
prevIntentValRaw := curProvValRaw
prevIntentSequence := meta.Txn.Sequence
buf.newMeta.AddToIntentHistory(prevIntentSequence, prevIntentValRaw)
}

This logic is impossible to replicate with what's proposed in the shared locks RFC. This means that the in-memory lock table's understanding of how a durable lock looks like can get out of sync. This can lead to all sorts of correctness bugs -- one can construct scenarios for bugs with infinite rediscovery loops or locks being incorrectly dropped (the original motivation for this issue).

Takeaways

All this points to the fact that keeping the in-memory state in sync with the replicated state (the source of truth) is fraught with subtle issues. We've seen this in other contexts as well. Instead of trying to work through and account for all these cases, we should be dumbed in the in-memory lock table. As a reminder to my future self, we should work under the principle that the in-memory lock table is just the source of truth for unreplicated locks and is dumb about replicated locks. That may mean that replicated locks might end up getting re-discovered by waiting requests -- not commonly, but that's fine.

With that in mind, the new proposal is to only filter the list of unreplicated locks by checking against ignored sequence numbers on the lock re-acquisition path. This then allows us to track just <strength, lowest sequence number (that hasn't been rolled back)> for unreplicated locks.

All this also means that we don't actually need to track sequence numbers for replicated locks in the in-memory lock table. Instead, we can just track whether a replicated lock exists or not. The way ignored sequence numbers are cleared for replicated locks is using a ResolveIntent request, which eventually cals into the in-memory lock table and into tryUpdateLock, where replicated locks are simply forgotten:

if lock.Durability(i) == lock.Replicated || txn.Epoch > holder.txn.Epoch {

So tracking sequence numbers for replicated locks in the in-memory lock table doesn't achieve anything.

arulajmani added a commit to arulajmani/cockroach that referenced this issue May 8, 2023
This patch extends the `roachpb.LockAcquisition` struct to include more
information -- namely the strength with which the lock is being acquired
and if there are any ignored sequence numbers for the transaction
acquiring the lock.

For now, the strength is statically initialized to `lock.Intent`. We'll
generalize this as we go about supporting multiple lock strengths in the
lock table.

The `AcquireLock` codepath is then changed to accept a
`roachpb.LockAcquisition` struct. This doesn't change any functionality
yet, but will enable us to rollback savepoints on lock acquisition and
support multiple lock strengths in the lock table in upcoming patches.

Informs cockroachdb#102269
Informs cockroachdb#102270

Release note: None
craig bot pushed a commit that referenced this issue May 9, 2023
101694: schemafeed: Add an optimization case where we don't wait on Peek or Pop r=Xiang-Gu a=Xiang-Gu

  This commit utilizes the recently introduced storage parameter
    "schema_locked" to achieve low-latency changefeed. Namely, we added logic to Peek/Pop to return "there is no table events" if the table schemas are locked without waiting for `highWater` to reach `atOrBefore`. 
    This change allows us to reduce event commit-to-emit latency drastically
    from `changefeed.experimental_poll_interval` (which defaults to 1s) to
    milliseconds.

Epic: CRDB-2258
Release note: None

102714: build: update nogo config to lint more generated code r=srosenberg,healthy-pod a=rickystewart

To date we have used a regex like `cockroach/pkg/.*/.*\\.go` to identify
first-party code, which does find our checked-in code, but does *not*
match generated code, which is not staged at that path in the Bazel
sandbox. Instead, these generated files are generally found in
`bazel-out/.../bin`. In places where we've set `only_files`, in addition
to the original regex, we add another regex to capture these generated
files.

Many packages are now (correctly) being flagged as missing package-level
comments. In these cases I have added a very short package-level
comment.

Closes #102191.

Epic: none
Release note: None

102770: kv: extend lock acquisition struct to include more information r=nvanbenschoten a=arulajmani

This patch extends the `roachpb.LockAcquisition` struct to include more information -- namely the strength with which the lock is being acquired and there are any ignored sequence numbers for the transaction acquiring the lock.

For now, the strength is statically initialized to `lock.Intent`. We'll generalize this as we go about supporting multiple lock strengths in the lock table.

The `AcquireLock` codepath is then changed to accept a `roachpb.LockAcquisition` struct. This doesn't change any functionality yet, but will enable us to rollback savepoints on lock acquisition and support multiple lock strengths in the lock table in upcoming patches.

Informs ##102269
Informs #102270

Release note: None

Co-authored-by: Xiang Gu <xiang@cockroachlabs.com>
Co-authored-by: Ricky Stewart <rickybstewart@gmail.com>
Co-authored-by: Arul Ajmani <arulajmani@gmail.com>
arulajmani added a commit to arulajmani/cockroach that referenced this issue May 17, 2023
Lock re-acquisitions from requests that belong to prior epochs is not
allowed. For replicated locks, mvccPutInternal rejects such requests
with an error. However, for unreplicated locks where the in-memory lock
table is the source of truth, no such check happens. This patch adds
one. The comment on `lockHolderInfo.txn` field suggests this was always
the intended behavior; it was just never enforced.

While here, we also add an assertion to ensure a replicated lock
acquisition never calls into the lock table with a transaction belonging
to a prior epoch. We expect this case to be handled in mvccPutInternal,
so unlike the unreplicated case, this deserves to be an assertion
failure.

This patch also updates a test that unintentionally re-acquired a lock
using a txn from an older epoch. We also add a new test.

Informs: cockroachdb#102269

Release note: None
arulajmani added a commit to arulajmani/cockroach that referenced this issue May 17, 2023
This patch adds logic to prune the list of sequence numbers tracked by
the lock table for unreplicated locks. This is done when some of the
tracked sequence numbers are considered ignored, by virtue of a
savepoint rollback.

Note that we only do so for unreplicated locks, and only if an
unreplicated lock is being reacquired. This is because the in-memory
lock table is only the source of truth for in-memory locks; the mvcc
keyspace is the source of truth for replicated ones. As such, trying
to mimic the logic is hard/error-prone -- so we don't attempt to do
so.

Fixes cockroachdb#102269

Release note: None
arulajmani added a commit to arulajmani/cockroach that referenced this issue May 19, 2023
Lock re-acquisitions from requests that belong to prior epochs is not
allowed. For replicated locks, mvccPutInternal rejects such requests
with an error. However, for unreplicated locks where the in-memory lock
table is the source of truth, no such check happens. This patch adds
one. The comment on `lockHolderInfo.txn` field suggests this was always
the intended behavior; it was just never enforced.

While here, we also add an assertion to ensure a replicated lock
acquisition never calls into the lock table with a transaction belonging
to a prior epoch. We expect this case to be handled in mvccPutInternal,
so unlike the unreplicated case, this deserves to be an assertion
failure.

This patch also updates a test that unintentionally re-acquired a lock
using a txn from an older epoch. We also add a new test.

Informs: cockroachdb#102269

Release note: None
craig bot pushed a commit that referenced this issue May 19, 2023
103387: grpcutil: improve gRPC logging r=knz a=tbg

Extracted from #99191, where I ran a bunch of experiments to make sure the logging around connection failures is clean and helpful.

I did manual testing on the changes in this PR. It's tricky to test vmodule-depth-thingies programmatically and I don't want to spend time building new infra here.

- cli: don't lower gRPC severity threshold
- grpcutil: adopt DepthLoggerV2
- sidetransport: suppress logging when peer down

Release note: None
Epic: CRDB-21710


103492: kv: add assertion to prevent lock re-acquisition at prior epochs r=nvanbenschoten a=arulajmani

Lock re-acquisitions from requests that belong to prior epochs is not allowed. For replicated locks, mvccPutInternal rejects such requests with an error. However, for unreplicated locks where the in-memory lock table is the source of truth, no such check happens. This patch adds one. The comment on `lockHolderInfo.txn` field suggests this was always the intended behavior; it was just never enforced.

While here, we also add an assertion to ensure a replicated lock acquisition never calls into the lock table with a transaction belonging to a prior epoch. We expect this case to be handled in mvccPutInternal, so unlike the unreplicated case, this deserves to be an assertion failure.

This patch also updates a test that unintentionally re-acquired a lock using a txn from an older epoch. We also add a new test.

Informs: #102269

Release note: None

103533: allocator2: fix normalization for the no constraint case r=kvoli a=sumeerbhola

Informs #103320

Epic: CRDB-25222

Release note: None

103579: server: add comment pointing out TestGraphite slowness r=aliher1911 a=tbg

Epic: none
Release note: None


103581: server,rpc,circuit: assortment of small cleanups/improvements r=knz a=tbg

Extracted from #99191.

- server: fix a (harmless) test buglet
- rpc: small comment tweak
- rpc: fix incorrect comment
- circuit: provide type for the anonymous Signal interface
- rpc: rename a package
- rpc: minor Goland lint appeasement
- rpc: avoid shadowing `backoff` package
- rpc: make some unused args anonymous
- rpc: make it clear that TestStoreMetrics restarts stores

Release note: None
Epic: None

Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Co-authored-by: Arul Ajmani <arulajmani@gmail.com>
Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com>
craig bot pushed a commit that referenced this issue May 19, 2023
103493: concurrency: rollback ignored sequence numbers on lock acquisition r=nvanbenschoten a=arulajmani

This patch adds logic to prune the list of sequence numbers tracked by
the lock table for unreplicated locks. This is done when some of the
tracked sequence numbers are considered ignored, by virtue of a
savepoint rollback.

Note that we only do so for unreplicated locks, and only if an
unreplicated lock is being reacquired. This is because the in-memory
lock table is only the source of truth for in-memory locks; the mvcc
keyspace is the source of truth for replicated ones. As such, trying
to mimic the logic is hard/error-prone -- so we don't attempt to do
so.

Fixes #102269

Release note: None

103617: util/parquet: refactor random testing types r=miretskiy a=jayshrivastava

This change refactors randomized testing to use `randgen.RandType`. `randgen.RandType` is better as it takes into account all allowable types which can appear in CRDB (ex. array of tuple). The previous code only generated random types which are supported by the writer which leaves a gap when new types are added. Now, the code defaults to all types and filters out unsupported ones.

Informs: #99028
Epic: https://cockroachlabs.atlassian.net/browse/CRDB-15071
Release note: None

103668: roachtest: get debug zip with --include-range-info r=renatolabs a=tbg

Noticed in #103552 that we are
missing the ranges.jsons, which is sad.

This flag is new as of #102289.

Epic: none
Release note: None


Co-authored-by: Arul Ajmani <arulajmani@gmail.com>
Co-authored-by: Jayant Shrivastava <jayants@cockroachlabs.com>
Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
@craig craig bot closed this as completed in 9e78b0a May 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-transactions Relating to MVCC and the transactional model. A-read-committed Related to the introduction of Read Committed C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-kv KV Team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant