
kvserver: allow certain read-only requests to drop latches before evaluation #85993

Merged

Conversation

@aayushshah15 (Contributor) commented Aug 11, 2022

This commit introduces a change to the way certain types of read-only requests
are evaluated. Traditionally, read-only requests have held their latches
throughout their execution. This commit allows certain qualifying reads to
release their latches earlier.

At a high level, reads may attempt to resolve all conflicts upfront by
performing a sort of "validation" phase before they perform their MVCC scan.
This validation phase performs a scan of the lock table keyspace in order to
find any conflicting intents that may need to be resolved before the actual
evaluation of the request over the MVCC keyspace. If no conflicting intents are
found, then (since #76312), the
request is guaranteed to be fully isolated against all other concurrent
requests and can be allowed to release its latches at this point. This allows
the actual evaluation of the read (over the MVCC part of the keyspace) to
proceed without latches being held, which is the main motivation of this work.
This validation phase could be thought of as an extension to the validation
that the concurrency manager already performs when requests are sequenced
through it, by trying to detect any conflicting intents that have already been
pulled into the in-memory lock table.
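
To make the two phases concrete, here is a minimal, self-contained Go sketch of the flow described above. Every name in it (the toy `span`/`intent` types, `scanLockTable`, `executeReadOnly`) is an illustrative stand-in, not the actual CockroachDB code:

```go
package main

import "fmt"

type span struct{ start, end string }

// intent is a toy stand-in for a replicated intent record in the lock table.
type intent struct {
	key   string
	txnID int
}

// scanLockTable models the validation phase: scan the lock-table keyspace
// for intents that conflict with the read, i.e. intents from other txns.
func scanLockTable(lockTable []intent, s span, readerTxnID int) []intent {
	var conflicting []intent
	for _, in := range lockTable {
		if in.key >= s.start && in.key < s.end && in.txnID != readerTxnID {
			conflicting = append(conflicting, in)
		}
	}
	return conflicting
}

func executeReadOnly(lockTable []intent, s span, readerTxnID int) {
	// Phase 1 (latches held): look for conflicting intents up front.
	if conflicts := scanLockTable(lockTable, s, readerTxnID); len(conflicts) > 0 {
		fmt.Println("conflicts found; resolve them before evaluating:", conflicts)
		return
	}
	// Phase 2: no conflicting intents were found, so the read is isolated
	// from concurrent writers and can drop latches before its MVCC scan.
	fmt.Println("no conflicts; dropping latches and evaluating latch-free")
}

func main() {
	lockTable := []intent{{key: "b", txnID: 7}}
	executeReadOnly(lockTable, span{start: "a", end: "c"}, 1)
}
```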

Additionally, for certain types of requests that can drop their latches early,
and do not need to access the `IntentHistory` for any of their parent txn's
intents, this commit attempts to make their MVCC scan cheaper by eliminating
the need for an `intentInterleavingIterator`. This is enabled by the
observation that once the validation phase is complete, the only remaining
intents in the read's declared span must be intents belonging to the reader's
transaction. So if the reader doesn't need to read an intent that isn't the
latest intent on a key, then it doesn't need access to the key's
`IntentHistory` (which lives in the lock-table keyspace), and doesn't need to
use an `intentInterleavingIterator`.
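
The iterator decision in the previous paragraph can be sketched as follows (a hypothetical helper; the real logic lives in the storage layer):

```go
// chooseIterator illustrates the decision above. Once validation finds no
// conflicting intents, any intents left in the span belong to the reader's
// own transaction, so interleaving is needed only when the read must
// consult a key's IntentHistory for an older provisional value.
func chooseIterator(needIntentHistory bool) string {
	if needIntentHistory {
		// The IntentHistory lives in the lock-table keyspace, so reading
		// it requires the more expensive intentInterleavingIterator.
		return "intentInterleavingIterator"
	}
	// Otherwise a plain MVCC iterator suffices: no lock-table interleaving.
	return "plain MVCC iterator"
}
```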

Release note (performance improvement): certain types of reads will now have a
far smaller contention footprint with conflicting concurrent writers.

Resolves #66485

Release justification: high benefit change to existing functionality, part of
22.2 roadmap

@aayushshah15 requested review from a team as code owners on August 11, 2022 20:26
@cockroach-teamcity (Member) commented:

This change is Reviewable

@aayushshah15 (Contributor, Author) commented:

cc @nvanbenschoten and @arulajmani

I'm working through the testing and the test failures, but this should still benefit from a high-level pass before that's done.

@aayushshah15 force-pushed the releaseLatchesEarlyForPessimistic branch from 7db18d8 to e46d282 on August 11, 2022 20:36
@nvanbenschoten (Member) left a comment

Reviewed 6 of 14 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @nvanbenschoten)


pkg/kv/kvserver/replica_evaluate.go line 625 at r1 (raw file):

	// conflicting requests during their execution. Thus, they cannot release
	// their latches before evaluation.
	if ba.IsLocking() || !ba.IsReadOnly() {

nit: do we need to check IsReadOnly? We're only calling this from the read-only path.


pkg/kv/kvserver/replica_evaluate.go line 625 at r1 (raw file):

	// conflicting requests during their execution. Thus, they cannot release
	// their latches before evaluation.
	if ba.IsLocking() || !ba.IsReadOnly() {

nit: move ba.IsLocking() down right above the WaitPolicy check.


pkg/kv/kvserver/replica_evaluate.go line 643 at r1 (raw file):

		return false
	default:
		return false

nit: here and below, panic on unexpected enum values instead of failing silently.
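
The suggested pattern looks like this sketch (illustrated with the ReadConsistencyType enum; the enum in the actual diff may differ):

```go
switch ba.Header.ReadConsistency {
case roachpb.CONSISTENT:
	// Eligible; continue with the remaining checks below.
case roachpb.READ_UNCOMMITTED, roachpb.INCONSISTENT:
	return false
default:
	// Panic on unexpected enum values rather than silently returning
	// a default, so that new enum variants cannot be missed.
	panic(fmt.Sprintf("unknown ReadConsistencyType: %v", ba.Header.ReadConsistency))
}
```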


pkg/kv/kvserver/replica_evaluate.go line 662 at r1 (raw file):

		case *roachpb.ExportRequest:
		case *roachpb.GetRequest:
			if inner.(*roachpb.GetRequest).KeyLocking != lock.None {

We shouldn't need this now that we're checking IsLocking().


pkg/kv/kvserver/replica_evaluate.go line 666 at r1 (raw file):

			}
		case *roachpb.ScanRequest:
			if inner.(*roachpb.ScanRequest).KeyLocking != lock.None {

Same comment, this shouldn't be needed now.


pkg/kv/kvserver/replica_read.go line 83 at r1 (raw file):

		return nil, g, nil, roachpb.NewError(err)
	}
	// TODO(nvanbenschoten): once all replicated intents are pulled into the

We can delete this now 🥳


pkg/kv/kvserver/replica_read.go line 167 at r1 (raw file):

		pErr = r.handleReadOnlyLocalEvalResult(ctx, ba, result.Local)
	}
	if !canDropLatches {

Can this be `if g != nil`? That would mean that latches are still held.


pkg/kv/kvserver/replica_read.go line 231 at r1 (raw file):

) {
	var ec endCmds
	ec, g = endCmds{repl: r, g: g, st: st}, nil

I don't think the assignment to g is doing the right thing anymore, now that this is in a helper function. You'll need to lift that back up into executeReadOnlyBatch after each call to this function.


pkg/kv/kvserver/replica_read.go line 266 at r1 (raw file):

needIntentHistory

This is a confusing condition, especially because we're returning true when canDropLatches is false. What is the intent history in the context of a read-only request? Why do we need it for non-txn requests?

Consider changing the condition to either needsIntentInterleaving or stillNeedsIntentInterleaving (depending on whether you want to return true or false when !canDropLatches).


pkg/kv/kvserver/replica_read.go line 266 at r1 (raw file):

	g *concurrency.Guard,
	st kvserverpb.LeaseStatus,
) (canDropLatches, needIntentHistory bool, pErr *roachpb.Error) {

nit: s/canDropLatches/ok/, because this is already the name of the method.


pkg/kv/kvserver/replica_read.go line 275 at r1 (raw file):

		ctx, 3, "can drop latches early for batch (%v); scanning lock table first to detect conflicts", ba,
	)
	maximalLatchSpan := g.LatchSpans().BoundarySpan(spanset.SpanGlobal)

Discussed in person: this isn't quite right. We don't want to scan across the entire boundary span, just the spans of the individual requests.


pkg/kv/kvserver/replica_read.go line 278 at r1 (raw file):

	start, end := maximalLatchSpan.Key, maximalLatchSpan.EndKey
	maxIntents := storage.MaxIntentsPerWriteIntentError.Get(&r.store.cfg.Settings.SV)
	intents, needIntentHistory, err := storage.ScanConflictingIntents(

Discussed in person: for point reads, we want this to use a prefix iterator so that it can hit bloom filters.
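
Taken together with the previous comment, the scan plausibly ends up shaped like this sketch (scanConflictingIntents is a hypothetical helper; the batch accessors are the usual roachpb ones): iterate over each request's own span rather than the batch's boundary span, and let point reads, which set no end key, use a prefix iterator.

```go
for _, ru := range ba.Requests {
	req := ru.GetInner()
	start, end := req.Header().Key, req.Header().EndKey
	// Point reads (e.g. Get) set no end key; a prefix iterator lets the
	// storage engine consult its bloom filters instead of scanning.
	usePrefixIter := len(end) == 0
	if err := scanConflictingIntents(start, end, usePrefixIter); err != nil {
		return err // hypothetical error handling
	}
}
```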


pkg/kv/kvserver/replica_read.go line 431 at r1 (raw file):

		// pass in a latch guard here that indicates as much. In other words, what
		// we want is to ensure that requests that have acquired-and-then-dropped
		// latches at ts X cannot retry above ts X.

This relates to the comment about setting g to nil if we drop latches early.


pkg/kv/kvserver/replica_tscache.go line 59 at r1 (raw file):

// expected to be called when a request is dropping its latches before it
// is evaluated.
func (r *Replica) updateTimestampCacheBeforeEval(

Why do we need this? Isn't it equivalent to calling updateTimestampCache with a nil br and pErr?


pkg/storage/mvcc.go line 910 at r1 (raw file):

	if rangeKeyMasking && opts.KeyTypes != IterKeyTypePointsOnly &&
		opts.RangeKeyMaskingBelow.IsEmpty() {
		opts.RangeKeyMaskingBelow = timestamp

Does this need to be lifted above the previous branch?


pkg/storage/mvcc.go line 1290 at r1 (raw file):

	blind := ms == nil && timestamp.IsEmpty()
	if !blind {
		iter = newMVCCIterator(rw, timestamp, false /* rangeKeyMasking */, IterOptions{

Why did some of these lines change? Was that a GoLand refactor thing?

@sumeerbhola (Collaborator) left a comment

Reviewed 6 of 14 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @nvanbenschoten, and @sumeerbhola)


pkg/kv/kvserver/replica_evaluate.go line 628 at r1 (raw file):

		return false
	}
	if ba.Header.ReadConsistency != roachpb.CONSISTENT {

It would be helpful to have a comment stating why !CONSISTENT is not eligible.


pkg/kv/kvserver/replica_evaluate.go line 631 at r1 (raw file):

		return false
	}
	if g == nil {

A code comment would help here, since it's not clear what the g == nil case represents.


pkg/kv/kvserver/replica_evaluate.go line 656 at r1 (raw file):

	}
	// TODO(aayush): Consider whether there are other requests that can be let
	// through.

A high-level comment would help here. IIUC, everything below is about ensuring the request does not acquire locks.
Otherwise it is hard to see why ExportRequest is always ok.


pkg/storage/engine.go line 429 at r1 (raw file):

	// scans the MVCC keyspace and does not interleave intents into the results.
	// This is only relevant for MVCCIterators.
	DontInterleaveIntents bool

nit:
why do we have this in IterOptions given that we have an explicit parameter in NewMVCCIterator(iterKind MVCCIterKind, opts IterOptions) MVCCIterator?
IterOptions shouldn't be used for mvcc.go plumbing, since that is a higher layer than the code in engine.go. It should be used only for what is needed by the interfaces defined in engine.go.


pkg/storage/engine.go line 1183 at r1 (raw file):

// be able to read the correct provisional value.
//
// Note that this method does not take interleaved intents into account at all.

I don't think we need this "Note that ..." comment since there are no interleaved intents.
Unfortunately, we still have some stale commentary and code in intentInterleavingIter that allows for it, but we shouldn't propagate that to new code.
We've also not always been clear in distinguishing physically interleaved intents from a storage perspective (which we no longer have) from logically interleaved intents (which we of course continue to have) -- I assume this comment was referring to the former.


pkg/storage/engine.go line 1191 at r1 (raw file):

	start, end roachpb.Key,
	maxIntents, targetBytes int64,
) (intents []roachpb.Intent, needIntentHistory bool, err error) {

So if this scan comes back with no intents but says needIntentHistory=true, we will need to scan the lock table twice (since we will need to interleave), yes? Worth benchmarking that case to see how much slower it is, especially for short scans or gets. I wonder whether we will need a heuristic to guess that needIntentHistory will be false.
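
To spell out the case being raised, the control flow looks roughly like this (all names hypothetical):

```go
// First pass over the lock table, performed while latches are held.
intents, needIntentHistory := scanLockTableForConflicts()
switch {
case len(intents) > 0:
	resolveIntentsAndRetry(intents)
case needIntentHistory:
	// Fall back to the intent-interleaving iterator, which walks the
	// lock table again alongside the MVCC keyspace: two scans total.
	evaluateWithIntentInterleaving()
default:
	// Latches dropped; plain MVCC iterator, single lock-table scan.
	evaluateLatchFree()
}
```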

@aayushshah15 force-pushed the releaseLatchesEarlyForPessimistic branch 9 times, most recently from 2a0b1dd to 103ea9f on August 20, 2022 01:15
@aayushshah15 (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)


pkg/kv/kvserver/replica_evaluate.go line 625 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: do we need to check IsReadOnly? We're only calling this from the read-only path.

Updated the method name to make that clear.


pkg/kv/kvserver/replica_evaluate.go line 625 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: move ba.IsLocking() this down right above the WaitPolicy check.

Done.


pkg/kv/kvserver/replica_evaluate.go line 628 at r1 (raw file):

Previously, sumeerbhola wrote…

It would be helpful to have a comment stating why !CONSISTENT is not eligible.

PTAL. I'd like to have these requests still drop their latches early while also having them skip the initial lock table scan. Let me know if anything seems incorrect in that TODO.


pkg/kv/kvserver/replica_evaluate.go line 631 at r1 (raw file):

Previously, sumeerbhola wrote…

A code comment would help here, since its not clear what the g==nil case represents.

Done.


pkg/kv/kvserver/replica_evaluate.go line 643 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: here and below, panic on unexpected enum values instead of failing silently.

Done.


pkg/kv/kvserver/replica_evaluate.go line 656 at r1 (raw file):

Previously, sumeerbhola wrote…

A high-level comment would help here. IIUC, everything below is about ensuring the request does not acquire locks.
Otherwise it is hard to see why ExportRequest is always ok.

How do you feel about the overall commentary here now?


pkg/kv/kvserver/replica_evaluate.go line 662 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We shouldn't need this now that we're checking IsLocking().

Done.


pkg/kv/kvserver/replica_evaluate.go line 666 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Same comment, this shouldn't be needed now.

Done.


pkg/kv/kvserver/replica_read.go line 83 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We can delete this now 🥳

Reworded.


pkg/kv/kvserver/replica_read.go line 167 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Can this be if g != nil. That would mean that latches are still held.

Done.


pkg/kv/kvserver/replica_read.go line 231 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I don't think the assignment to g is doing the right thing anymore, now that this is in a helper function. You'll need to lift that back up into executeReadOnlyBatch after each call to this function.

Lifted.


pkg/kv/kvserver/replica_read.go line 266 at r1 (raw file):

What is the intent history in the context of a read-only request?

You're right, done.


pkg/kv/kvserver/replica_read.go line 266 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: s/canDropLatches/ok/, because this is already the name of the method.

Done.


pkg/kv/kvserver/replica_read.go line 275 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Discussed in person: this isn't quite right. We don't want to scan across the entire boundary span, just the spans of the individual requests.

Done.


pkg/kv/kvserver/replica_read.go line 278 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Discussed in person: for point reads, we want this to use a prefix iterator so that it can hit bloom filters.

Done.


pkg/kv/kvserver/replica_read.go line 431 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This relates to the comment about setting g to nil if we drop latches early.

Done.


pkg/kv/kvserver/replica_tscache.go line 59 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Why do we need this? Isn't it equivalent to calling updateTimestampCache with a nil br and pErr?

Done.


pkg/storage/engine.go line 429 at r1 (raw file):

Previously, sumeerbhola wrote…

nit:
why do we have this in IterOptions given that we have an explicit parameter in NewMVCCIterator(iterKind MVCCIterKind, opts IterOptions) MVCCIterator?
IterOptions shouldn't be used for mvcc.go plumbing since that is a higher layer than the code in engine.go. It should be used for what is needed for the interfaces defined in engine.go?

You're right.


pkg/storage/engine.go line 1183 at r1 (raw file):

Previously, sumeerbhola wrote…

I don't think we need this "Note that ..." comment since there are no interleaved intents.
Unfortunately, we still have some stale commentary and code in intentInterleavingIter that allows for it, but we shouldn't propagate that to new code.
We've also not always been clear in distinguishing physically interleaved intents from a storage perspective (which we no longer have) and logically interleaving intents (which we of course continue to have) -- I assume this comment was referring to the former.

Removed.


pkg/storage/engine.go line 1191 at r1 (raw file):

I wonder whether we will need a heuristic to guess that needIntentHistory will be false.

I don't fully follow this part. Did you just mean that, in some cases, we should be able to have a heuristic to say "this batch will definitely need access to the intent history, so don't bother scanning the lock table upfront"?

I'll post some benchmark results here shortly.


pkg/storage/mvcc.go line 910 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Does this need to be lifted above the previous branch?

Discussed with Erik offline, let me know if anything seems off.


pkg/storage/mvcc.go line 1290 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Why did some of these lines change? Was that a goland refactor thing?

Yep, it was. I've changed the signature of newMVCCIterator again to address Sumeer's comment though.

@aayushshah15 force-pushed the releaseLatchesEarlyForPessimistic branch 5 times, most recently from f895281 to 67a6c25 on August 22, 2022 19:53
@sumeerbhola (Collaborator) left a comment

btw, I'm just doing drive-by reviewing of selective pieces -- don't block on me for merging

Reviewed 4 of 11 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @nvanbenschoten, and @sumeerbhola)


pkg/kv/kvserver/replica_evaluate.go line 656 at r1 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

How do you feel about the overall commentary here now?

looks good


pkg/kv/kvserver/replica_evaluate.go line 699 at r5 (raw file):

// using an intent interleaving iterator in order to be able to read the correct
// provisional value.
func scanConflictingIntents(

nit: we usually place such storage reading code in the storage package. It could go with the various helper functions defined at the bottom of engine.go (for example ScanIntents).


pkg/storage/engine.go line 1191 at r1 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

I wonder whether we will need a heuristic to guess that needIntentHistory will be false.

I don't fully follow this part, did you just mean to say that in some cases, we should be able to have a heuristic to say "this batch will definitely need access to the intent history, so don't bother scanning the lock table upfront"?

I'll post some benchmark results here shortly.

Ack.

@aayushshah15's comment here was marked as outdated.

@aayushshah15 force-pushed the releaseLatchesEarlyForPessimistic branch 2 times, most recently from 07d28db to a1ba3c6 on August 23, 2022 21:25
@aayushshah15 (Contributor, Author) commented:

I ran the ycsb/[ABCDEF]/nodes=3/cpu=32 roachtests with and without this patch. The results seem promising!

Here, "new" is with this patch and "old" is without.

YCSB-A

name            old ops/sec  new ops/sec  delta
nodes=3/cpu=32   30.0k ± 7%   30.5k ± 8%    ~     (p=0.280 n=10+10)

name            old p50      new p50      delta
nodes=3/cpu=32    2.34 ±10%    2.26 ± 6%    ~     (p=0.242 n=10+10)

name            old p95      new p95      delta
nodes=3/cpu=32    8.51 ±10%    7.85 ± 3%  -7.76%  (p=0.001 n=10+8)

name            old p99      new p99      delta
nodes=3/cpu=32    62.9 ±13%    62.7 ±14%    ~     (p=0.990 n=9+9)

YCSB-B

name            old ops/sec  new ops/sec  delta
nodes=3/cpu=32   91.4k ± 4%   94.7k ± 3%   +3.59%  (p=0.007 n=10+10)

name            old p50      new p50      delta
nodes=3/cpu=32    1.40 ± 0%    1.40 ± 0%     ~     (all equal)

name            old p95      new p95      delta
nodes=3/cpu=32    5.83 ± 6%    5.62 ± 3%   -3.60%  (p=0.019 n=10+10)

name            old p99      new p99      delta
nodes=3/cpu=32    15.7 ±20%    12.8 ± 7%  -18.63%  (p=0.000 n=10+9)

YCSB-C

name            old ops/sec  new ops/sec  delta
nodes=3/cpu=32    123k ± 6%    131k ± 6%   +6.48%  (p=0.001 n=10+10)

name            old p50      new p50      delta
nodes=3/cpu=32    1.40 ± 0%    1.26 ± 5%  -10.00%  (p=0.000 n=9+10)

name            old p95      new p95      delta
nodes=3/cpu=32    3.42 ± 4%    3.26 ± 7%   -4.82%  (p=0.019 n=8+10)

name            old p99      new p99      delta
nodes=3/cpu=32    6.04 ±13%    5.52 ± 9%   -8.61%  (p=0.005 n=10+10)

YCSB-D

name            old ops/sec  new ops/sec  delta
nodes=3/cpu=32    101k ± 5%    106k ± 3%  +5.69%  (p=0.000 n=10+10)

name            old p50      new p50      delta
nodes=3/cpu=32    1.20 ± 0%    1.10 ± 0%  -8.33%  (p=0.000 n=10+8)

name            old p95      new p95      delta
nodes=3/cpu=32    3.51 ± 5%    3.36 ± 4%  -4.27%  (p=0.005 n=10+10)

name            old p99      new p99      delta
nodes=3/cpu=32    6.01 ± 5%    5.69 ± 5%  -5.32%  (p=0.013 n=10+10)

YCSB-E

name            old ops/sec  new ops/sec  delta
nodes=3/cpu=32   3.39k ± 3%   3.20k ±13%   ~     (p=0.089 n=10+10)

name            old p50      new p50      delta
nodes=3/cpu=32    36.5 ± 3%    39.0 ±13%   ~     (p=0.124 n=10+10)

name            old p95      new p95      delta
nodes=3/cpu=32     110 ± 3%     121 ±18%   ~     (p=0.067 n=10+10)

name            old p99      new p99      delta
nodes=3/cpu=32     151 ± 0%     162 ±19%   ~     (p=0.387 n=9+10)

YCSB-F

name            old ops/sec  new ops/sec  delta
nodes=3/cpu=32   15.4k ± 5%   15.9k ± 7%    ~     (p=0.095 n=9+10)

name            old p50      new p50      delta
nodes=3/cpu=32    5.59 ± 4%    5.53 ± 6%    ~     (p=0.720 n=10+10)

name            old p95      new p95      delta
nodes=3/cpu=32    17.4 ±10%    16.1 ± 6%  -7.00%  (p=0.008 n=10+9)

name            old p99      new p99      delta
nodes=3/cpu=32     158 ±22%     151 ±14%    ~     (p=0.577 n=10+10)

@aayushshah15 force-pushed the releaseLatchesEarlyForPessimistic branch from a1ba3c6 to 6dce8f5 on August 24, 2022 03:41
@nvanbenschoten (Member) left a comment

Those results look promising! Thanks for running them. It would also be worthwhile to run the kv0 and kv95 roachtests on 8 and 32 vCPU machines.

Reviewed 4 of 14 files at r1, 8 of 11 files at r2, 2 of 3 files at r6, 2 of 2 files at r7, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @sumeerbhola)


pkg/kv/kvserver/replica_evaluate.go line 699 at r5 (raw file):

Previously, sumeerbhola wrote…

nit: we usually place such storage reading code in the storage package. It could go with the various helper functions defined at the bottom of engine.go (for example ScanIntents).

+1


pkg/kv/kvserver/replica_evaluate.go line 646 at r6 (raw file):

	if len(g.Req.LockSpans.GetSpans(spanset.SpanReadOnly, spanset.SpanLocal)) != 0 {
		// If the request declared any local read spans, it must hold latches
		// throughout its execution.

Why is this?


pkg/kv/kvserver/replica_evaluate.go line 708 at r6 (raw file):

	maxIntents int64,
) (needIntentHistory bool, err error) {
	upperBoundUnset := bytes.Equal(end, roachpb.KeyMin) // NB: Get requests do not set the end key.

nit: this is usually expressed as len(end) == 0.


pkg/kv/kvserver/replica_evaluate.go line 714 at r6 (raw file):

	ltStart, _ := keys.LockTableSingleKey(start, nil)
	ltEnd, _ := keys.LockTableSingleKey(end, nil)

Can we avoid allocating this key if upperBoundUnset?


pkg/kv/kvserver/replica_evaluate.go line 727 at r6 (raw file):

	var ok bool
	for ok, err = iter.SeekEngineKeyGE(storage.EngineKey{Key: ltStart}); ok; ok, err = iter.NextEngineKey() {
		if err := ctx.Err(); err != nil {

Do we need to check the context on each key? This often incurs a mutex acquisition (depending on the ctx), so it could be expensive if there are a lot of keys in the lock table.
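
One conventional way to amortize that cost (an illustrative pattern, not necessarily what landed) is to poll the context only every N keys:

```go
const ctxCheckEvery = 1024 // assumption: any modest interval works

keysSeen := 0
for ok := seekFirst(); ok; ok = nextKey() { // hypothetical iteration helpers
	keysSeen++
	if keysSeen%ctxCheckEvery == 0 {
		// ctx.Err() may acquire a mutex depending on the context
		// implementation, so avoid paying that cost per key.
		if err := ctx.Err(); err != nil {
			return err
		}
	}
	// ... examine the lock-table key ...
}
```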


pkg/kv/kvserver/replica_evaluate.go line 733 at r6 (raw file):

			break
		}
		key, err := iter.EngineKey()

This allocates and copies the key. In the next step, we decode the key. Should we save both until after we're sure it's a conflicting intent that we care about?


pkg/kv/kvserver/replica_read.go line 108 at r7 (raw file):

			evalPath = readOnlyWithoutInterleavedIntents
		}
		g = r.updateTimestampCacheAndDropLatches(ctx, g, ba, nil /* br */, nil /* pErr */, st)

If updateTimestampCacheAndDropLatches always returns nil then it's probably not worth the confusion. Just have the next line be g = nil. We should do that down below with the other call to updateTimestampCacheAndDropLatches as well.


pkg/kv/kvserver/replica_read.go line 233 at r7 (raw file):

	ec := endCmds{repl: r, g: g, st: st}
	ec.done(ctx, ba, br, pErr)
	return nil

See my comment above.


pkg/kv/kvserver/replica_read.go line 236 at r7 (raw file):

}

var allowDroppingLatchesBeforeEval = settings.RegisterBoolSetting(

Can we use this instead of the new testing knob in tests? But actually, is anyone using that testing knob?


pkg/kv/kvserver/replica_read.go line 288 at r7 (raw file):

	maxIntents := storage.MaxIntentsPerWriteIntentError.Get(&r.store.cfg.Settings.SV)
	var start, end roachpb.Key

nit: do these need such a large scope? Can we just do start, end := ... and needsIntentInterleavingForThisRequest, err := ... in the loop?


pkg/kv/kvserver/replica_send.go line 795 at r1 (raw file):

	// latchSpans, because we have already released our latches and plan to
	// re-acquire them if the retry is allowed.
	if !canDoServersideRetry(ctx, pErr, ba, nil /* br */, nil /* g */, hlc.Timestamp{} /* deadline */) {

nit: I don't think we want to lose this.


pkg/kv/kvserver/replica_send.go line 487 at r7 (raw file):

		dropLatchesAndLockWaitQueues := func() {
			if g != nil {
				latchSpans, lockSpans = g.TakeSpanSets()

We don't want to do this in the ReadWithinUncertaintyIntervalError case, so we may want this function to accept a reuseLatchAndLockSpans arg.


pkg/kv/kvserver/store_test.go line 1910 at r6 (raw file):

	// This test expects different updates to the timestamp cache based on whether
	// a given request was evaluated under `PessimisticEval` or `OptimisticEval`.
	// If these requests hit any concurrency retry errors, they are typically

If we stay on the optimistic eval path after hitting a conflict, don't we risk seeing the same conflicting lock over and over again? Are there other ways that we could avoid the flakiness without introducing this knob? For instance, using DisableDroppingLatchesBeforeEval.


pkg/storage/mvcc.go line 871 at r6 (raw file):

	LockTable LockTableView
	// DontInterleavedIntents, when set, makes it such that intent metadata is not
	// interleaved with the results of the scan.

Could you add a little more about when it's appropriate to use this and when it is not?

@aayushshah15 force-pushed the releaseLatchesEarlyForPessimistic branch from 6dce8f5 to 8f3d795 on August 24, 2022 16:13
@sumeerbhola (Collaborator) left a comment

Isn't ycsb-c read-only? I am confused by the improvement there, since I was expecting to see an improvement only for workloads with read-write contention. What am I missing?

Do we think ycsb-b and ycsb-d, which have only 5% writes, are better because:

- the zipfian/latest distributions result in contention
- the read latch release reduces contention for waiting writes, which helps reads that arrive after the write (since we have fair FIFO latch behavior, intervening writes can cause read-read contention)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @sumeerbhola)

@aayushshah15 force-pushed the releaseLatchesEarlyForPessimistic branch from 9e8560f to 64c1375 on October 4, 2022 19:14
@aayushshah15 (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @nvanbenschoten, and @sumeerbhola)


pkg/kv/kvserver/replica_send.go line 487 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

That works, but it fails to Release the spansets back into their sync.Pool. You get that for free if you don't TakeSpanSets() before FinishReq.

oh, that makes sense. Done.


pkg/kv/kvserver/replica_send.go line 525 at r9 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is this the only case where latches might have already been dropped?

Yes, I think so.


pkg/kv/kvserver/batcheval/cmd_export.go line 94 at r9 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Why did we need this? Does the spanset assertion work when evaluating without latches?

It's because we specifically assert on accesses to the GC threshold key here:

rec.ss.AssertAllowed(spanset.SpanReadOnly,
	roachpb.Span{Key: keys.RangeGCThresholdKey(rec.GetRangeID())},
)

Do you feel like we should address this differently?

@aayushshah15 (Contributor, Author) commented:

I re-ran YCSB-E (for 50 iterations) and YCSB-C (for 20 iterations) to try to clarify the results from #85993 (comment).

benchstat before after
name                   old ops/sec  new ops/sec  delta
ycsb/E/nodes=3/cpu=32   3.48k ± 5%   3.41k ± 4%  -2.19%  (p=0.000 n=45+50)

name                   old p50      new p50      delta
ycsb/E/nodes=3/cpu=32    35.7 ± 0%    36.3 ± 4%  +1.57%  (p=0.000 n=44+50)

name                   old p95      new p95      delta
ycsb/E/nodes=3/cpu=32     108 ± 5%     110 ± 5%  +1.78%  (p=0.000 n=45+50)

name                   old p99      new p99      delta
ycsb/E/nodes=3/cpu=32     149 ± 7%     151 ± 0%  +1.03%  (p=0.048 n=49+36)

benchstat before after
name                   old ops/sec  new ops/sec  delta
ycsb/C/nodes=3/cpu=32    132k ± 2%    134k ± 4%    ~     (p=0.141 n=17+20)

name                   old p50      new p50      delta
ycsb/C/nodes=3/cpu=32    1.30 ± 0%    1.20 ± 0%  -7.69%  (p=0.000 n=16+18)

name                   old p95      new p95      delta
ycsb/C/nodes=3/cpu=32    3.30 ± 0%    3.23 ± 5%    ~     (p=0.302 n=12+20)

name                   old p99      new p99      delta
ycsb/C/nodes=3/cpu=32    5.50 ± 0%    5.49 ± 9%    ~     (p=0.594 n=15+20)

Do these numbers make more sense @sumeerbhola? The reasons you've outlined for improvements in YCSB B and D match my understanding.

@sumeerbhola (Collaborator) commented:

No change in ycsb/C does make sense. Thanks for collecting the numbers.

@nvanbenschoten (Member) left a comment

Reviewed 4 of 4 files at r10, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @sumeerbhola)

@nvanbenschoten (Member) commented:

But let's still wait until v22.2 is out the door (in a week or so) before merging onto master.

@nvanbenschoten (Member) commented:

Test failures on master will no longer be considered release blockers, so we can go ahead and merge this. Nice job @aayushshah15!

@aayushshah15 force-pushed the releaseLatchesEarlyForPessimistic branch 8 times, most recently from e276557 to 8d08be5 on November 8, 2022 20:51
@aayushshah15 force-pushed the releaseLatchesEarlyForPessimistic branch from 8d08be5 to 52a0e00 on November 17, 2022 16:52
@aayushshah15 (Contributor, Author) commented:

TFTRs

bors r+

@craig (bot) commented Nov 17, 2022

👎 Rejected by code reviews

@aayushshah15 (Contributor, Author) commented:

bors r+

@craig (bot) commented Nov 17, 2022

Build failed:

@aayushshah15 (Contributor, Author) commented:

bors r+

@craig (bot) commented Nov 17, 2022

Build succeeded:
