
kvserver: allow certain read-only requests to drop latches before evaluation #85993

Merged

Conversation

@aayushshah15 (Contributor) commented Aug 11, 2022

This commit introduces a change to the way certain types of read-only requests
are evaluated. Traditionally, read-only requests have held their latches
throughout their execution. This commit allows certain qualifying reads to
release their latches earlier.

At a high level, reads may attempt to resolve all conflicts upfront by
performing a sort of "validation" phase before they perform their MVCC scan.
This validation phase performs a scan of the lock table keyspace in order to
find any conflicting intents that may need to be resolved before the actual
evaluation of the request over the MVCC keyspace. If no conflicting intents are
found, then (since #76312), the
request is guaranteed to be fully isolated against all other concurrent
requests and can be allowed to release its latches at this point. This allows
the actual evaluation of the read (over the MVCC part of the keyspace) to
proceed without latches being held, which is the main motivation of this work.
This validation phase could be thought of as an extension to the validation
that the concurrency manager already performs when requests are sequenced
through it, by trying to detect any conflicting intents that have already been
pulled into the in-memory lock table.
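
To make the two phases concrete, here is a minimal, self-contained Go sketch of the flow described above. Every name in it (the toy `span`/`intent` types, `scanLockTable`, `executeReadOnly`) is an illustrative stand-in, not the actual CockroachDB code:

```go
package main

import "fmt"

type span struct{ start, end string }

// intent is a toy stand-in for a replicated intent record in the lock table.
type intent struct {
	key   string
	txnID int
}

// scanLockTable models the validation phase: scan the lock-table keyspace
// for intents that conflict with the read, i.e. intents from other txns.
func scanLockTable(lockTable []intent, s span, readerTxnID int) []intent {
	var conflicting []intent
	for _, in := range lockTable {
		if in.key >= s.start && in.key < s.end && in.txnID != readerTxnID {
			conflicting = append(conflicting, in)
		}
	}
	return conflicting
}

func executeReadOnly(lockTable []intent, s span, readerTxnID int) {
	// Phase 1 (latches held): look for conflicting intents up front.
	if conflicts := scanLockTable(lockTable, s, readerTxnID); len(conflicts) > 0 {
		fmt.Println("conflicts found; resolve them before evaluating:", conflicts)
		return
	}
	// Phase 2: no conflicting intents were found, so the read is isolated
	// from concurrent writers and can drop latches before its MVCC scan.
	fmt.Println("no conflicts; dropping latches and evaluating latch-free")
}

func main() {
	lockTable := []intent{{key: "b", txnID: 7}}
	executeReadOnly(lockTable, span{start: "a", end: "c"}, 1)
}
```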

Additionally, for certain types of requests that can drop their latches early,
and do not need to access the `IntentHistory` for any of their parent txn's
intents, this commit attempts to make their MVCC scan cheaper by eliminating
the need for an `intentInterleavingIterator`. This is enabled by the
observation that once the validation phase is complete, the only remaining
intents in the read's declared span must be intents belonging to the reader's
transaction. So if the reader doesn't need to read an intent that isn't the
latest intent on a key, then it doesn't need access to the key's
`IntentHistory` (which lives in the lock-table keyspace), and doesn't need to
use an `intentInterleavingIterator`.
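
The iterator decision in the previous paragraph can be sketched as follows (a hypothetical helper; the real logic lives in the storage layer):

```go
// chooseIterator illustrates the decision above. Once validation finds no
// conflicting intents, any intents left in the span belong to the reader's
// own transaction, so interleaving is needed only when the read must
// consult a key's IntentHistory for an older provisional value.
func chooseIterator(needIntentHistory bool) string {
	if needIntentHistory {
		// The IntentHistory lives in the lock-table keyspace, so reading
		// it requires the more expensive intentInterleavingIterator.
		return "intentInterleavingIterator"
	}
	// Otherwise a plain MVCC iterator suffices: no lock-table interleaving.
	return "plain MVCC iterator"
}
```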

Release note (performance improvement): certain types of reads will now have a
far smaller contention footprint with conflicting concurrent writers.

Resolves #66485

Release justification: high benefit change to existing functionality, part of
22.2 roadmap

@aayushshah15 requested review from a team as code owners on August 11, 2022 20:26
@cockroach-teamcity (Member) commented:

This change is Reviewable

@aayushshah15 (Contributor, Author) commented:

cc @nvanbenschoten and @arulajmani

I'm working through the testing and the test failures, but this should still benefit from a high-level pass before that's done.

@aayushshah15 force-pushed the releaseLatchesEarlyForPessimistic branch from 7db18d8 to e46d282 on August 11, 2022 20:36
@nvanbenschoten (Member) left a comment

Reviewed 6 of 14 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @nvanbenschoten)


pkg/kv/kvserver/replica_evaluate.go line 625 at r1 (raw file):

	// conflicting requests during their execution. Thus, they cannot release
	// their latches before evaluation.
	if ba.IsLocking() || !ba.IsReadOnly() {

nit: do we need to check IsReadOnly? We're only calling this from the read-only path.


pkg/kv/kvserver/replica_evaluate.go line 625 at r1 (raw file):

	// conflicting requests during their execution. Thus, they cannot release
	// their latches before evaluation.
	if ba.IsLocking() || !ba.IsReadOnly() {

nit: move ba.IsLocking() down right above the WaitPolicy check.


pkg/kv/kvserver/replica_evaluate.go line 643 at r1 (raw file):

		return false
	default:
		return false

nit: here and below, panic on unexpected enum values instead of failing silently.
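
The suggested pattern looks like this sketch (illustrated with the ReadConsistencyType enum; the enum in the actual diff may differ):

```go
switch ba.Header.ReadConsistency {
case roachpb.CONSISTENT:
	// Eligible; continue with the remaining checks below.
case roachpb.READ_UNCOMMITTED, roachpb.INCONSISTENT:
	return false
default:
	// Panic on unexpected enum values rather than silently returning
	// a default, so that new enum variants cannot be missed.
	panic(fmt.Sprintf("unknown ReadConsistencyType: %v", ba.Header.ReadConsistency))
}
```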


pkg/kv/kvserver/replica_evaluate.go line 662 at r1 (raw file):

		case *roachpb.ExportRequest:
		case *roachpb.GetRequest:
			if inner.(*roachpb.GetRequest).KeyLocking != lock.None {

We shouldn't need this now that we're checking IsLocking().


pkg/kv/kvserver/replica_evaluate.go line 666 at r1 (raw file):

			}
		case *roachpb.ScanRequest:
			if inner.(*roachpb.ScanRequest).KeyLocking != lock.None {

Same comment, this shouldn't be needed now.


pkg/kv/kvserver/replica_read.go line 83 at r1 (raw file):

		return nil, g, nil, roachpb.NewError(err)
	}
	// TODO(nvanbenschoten): once all replicated intents are pulled into the

We can delete this now 🥳


pkg/kv/kvserver/replica_read.go line 167 at r1 (raw file):

		pErr = r.handleReadOnlyLocalEvalResult(ctx, ba, result.Local)
	}
	if !canDropLatches {

Can this be `if g != nil`? That would mean that latches are still held.


pkg/kv/kvserver/replica_read.go line 231 at r1 (raw file):

) {
	var ec endCmds
	ec, g = endCmds{repl: r, g: g, st: st}, nil

I don't think the assignment to g is doing the right thing anymore, now that this is in a helper function. You'll need to lift that back up into executeReadOnlyBatch after each call to this function.


pkg/kv/kvserver/replica_read.go line 266 at r1 (raw file):

needIntentHistory

This is a confusing condition, especially because we're returning true when canDropLatches is false. What is the intent history in the context of a read-only request? Why do we need it for non-txn requests?

Consider changing the condition to either needsIntentInterleaving or stillNeedsIntentInterleaving (depending on whether you want to return true or false when !canDropLatches).


pkg/kv/kvserver/replica_read.go line 266 at r1 (raw file):

	g *concurrency.Guard,
	st kvserverpb.LeaseStatus,
) (canDropLatches, needIntentHistory bool, pErr *roachpb.Error) {

nit: s/canDropLatches/ok/, because this is already the name of the method.


pkg/kv/kvserver/replica_read.go line 275 at r1 (raw file):

		ctx, 3, "can drop latches early for batch (%v); scanning lock table first to detect conflicts", ba,
	)
	maximalLatchSpan := g.LatchSpans().BoundarySpan(spanset.SpanGlobal)

Discussed in person: this isn't quite right. We don't want to scan across the entire boundary span, just the spans of the individual requests.


pkg/kv/kvserver/replica_read.go line 278 at r1 (raw file):

	start, end := maximalLatchSpan.Key, maximalLatchSpan.EndKey
	maxIntents := storage.MaxIntentsPerWriteIntentError.Get(&r.store.cfg.Settings.SV)
	intents, needIntentHistory, err := storage.ScanConflictingIntents(

Discussed in person: for point reads, we want this to use a prefix iterator so that it can hit bloom filters.
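
Taken together with the previous comment, the scan plausibly ends up shaped like this sketch (scanConflictingIntents is a hypothetical helper; the batch accessors are the usual roachpb ones): iterate over each request's own span rather than the batch's boundary span, and let point reads, which set no end key, use a prefix iterator.

```go
for _, ru := range ba.Requests {
	req := ru.GetInner()
	start, end := req.Header().Key, req.Header().EndKey
	// Point reads (e.g. Get) set no end key; a prefix iterator lets the
	// storage engine consult its bloom filters instead of scanning.
	usePrefixIter := len(end) == 0
	if err := scanConflictingIntents(start, end, usePrefixIter); err != nil {
		return err // hypothetical error handling
	}
}
```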


pkg/kv/kvserver/replica_read.go line 431 at r1 (raw file):

		// pass in a latch guard here that indicates as much. In other words, what
		// we want is to ensure that requests that have acquired-and-then-dropped
		// latches at ts X cannot retry above ts X.

This relates to the comment about setting g to nil if we drop latches early.


pkg/kv/kvserver/replica_tscache.go line 59 at r1 (raw file):

// expected to be called when a request is dropping its latches before it
// is evaluated.
func (r *Replica) updateTimestampCacheBeforeEval(

Why do we need this? Isn't it equivalent to calling updateTimestampCache with a nil br and pErr?


pkg/storage/mvcc.go line 910 at r1 (raw file):

	if rangeKeyMasking && opts.KeyTypes != IterKeyTypePointsOnly &&
		opts.RangeKeyMaskingBelow.IsEmpty() {
		opts.RangeKeyMaskingBelow = timestamp

Does this need to be lifted above the previous branch?


pkg/storage/mvcc.go line 1290 at r1 (raw file):

	blind := ms == nil && timestamp.IsEmpty()
	if !blind {
		iter = newMVCCIterator(rw, timestamp, false /* rangeKeyMasking */, IterOptions{

Why did some of these lines change? Was that a GoLand refactor thing?

@sumeerbhola (Collaborator) left a comment

Reviewed 6 of 14 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @nvanbenschoten, and @sumeerbhola)


pkg/kv/kvserver/replica_evaluate.go line 628 at r1 (raw file):

		return false
	}
	if ba.Header.ReadConsistency != roachpb.CONSISTENT {

It would be helpful to have a comment stating why !CONSISTENT is not eligible.


pkg/kv/kvserver/replica_evaluate.go line 631 at r1 (raw file):

		return false
	}
	if g == nil {

A code comment would help here, since it's not clear what the g == nil case represents.


pkg/kv/kvserver/replica_evaluate.go line 656 at r1 (raw file):

	}
	// TODO(aayush): Consider whether there are other requests that can be let
	// through.

A high-level comment would help here. IIUC, everything below is about ensuring the request does not acquire locks.
Otherwise it is hard to see why ExportRequest is always ok.


pkg/storage/engine.go line 429 at r1 (raw file):

	// scans the MVCC keyspace and does not interleave intents into the results.
	// This is only relevant for MVCCIterators.
	DontInterleaveIntents bool

nit:
why do we have this in IterOptions given that we have an explicit parameter in NewMVCCIterator(iterKind MVCCIterKind, opts IterOptions) MVCCIterator?
IterOptions shouldn't be used for mvcc.go plumbing, since that is a higher layer than the code in engine.go. It should be used only for what is needed by the interfaces defined in engine.go.


pkg/storage/engine.go line 1183 at r1 (raw file):

// be able to read the correct provisional value.
//
// Note that this method does not take interleaved intents into account at all.

I don't think we need this "Note that ..." comment since there are no interleaved intents.
Unfortunately, we still have some stale commentary and code in intentInterleavingIter that allows for it, but we shouldn't propagate that to new code.
We've also not always been clear in distinguishing physically interleaved intents from a storage perspective (which we no longer have) from logically interleaved intents (which we of course continue to have) -- I assume this comment was referring to the former.


pkg/storage/engine.go line 1191 at r1 (raw file):

	start, end roachpb.Key,
	maxIntents, targetBytes int64,
) (intents []roachpb.Intent, needIntentHistory bool, err error) {

So if this scan comes back with no intents but says needIntentHistory=true, we will need to scan the lock table twice (since we will need to interleave), yes? Worth benchmarking that case to see how much slower it is, especially for short scans or gets. I wonder whether we will need a heuristic to guess that needIntentHistory will be false.
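
To spell out the case being raised, the control flow looks roughly like this (all names hypothetical):

```go
// First pass over the lock table, performed while latches are held.
intents, needIntentHistory := scanLockTableForConflicts()
switch {
case len(intents) > 0:
	resolveIntentsAndRetry(intents)
case needIntentHistory:
	// Fall back to the intent-interleaving iterator, which walks the
	// lock table again alongside the MVCC keyspace: two scans total.
	evaluateWithIntentInterleaving()
default:
	// Latches dropped; plain MVCC iterator, single lock-table scan.
	evaluateLatchFree()
}
```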

@aayushshah15 force-pushed the releaseLatchesEarlyForPessimistic branch 9 times, most recently from 2a0b1dd to 103ea9f on August 20, 2022 01:15
@aayushshah15 (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)


pkg/kv/kvserver/replica_evaluate.go line 625 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: do we need to check IsReadOnly? We're only calling this from the read-only path.

Updated the method name to make that clear.


pkg/kv/kvserver/replica_evaluate.go line 625 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: move ba.IsLocking() this down right above the WaitPolicy check.

Done.


pkg/kv/kvserver/replica_evaluate.go line 628 at r1 (raw file):

Previously, sumeerbhola wrote…

It would be helpful to have a comment stating why !CONSISTENT is not eligible.

PTAL. I'd like to have these requests still drop their latches early while also having them skip the initial lock table scan. Let me know if anything seems incorrect in that TODO.


pkg/kv/kvserver/replica_evaluate.go line 631 at r1 (raw file):

Previously, sumeerbhola wrote…

A code comment would help here, since its not clear what the g==nil case represents.

Done.


pkg/kv/kvserver/replica_evaluate.go line 643 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: here and below, panic on unexpected enum values instead of failing silently.

Done.


pkg/kv/kvserver/replica_evaluate.go line 656 at r1 (raw file):

Previously, sumeerbhola wrote…

A high-level comment would help here. IIUC, everything below is about ensuring the request does not acquire locks.
Otherwise it is hard to see why ExportRequest is always ok.

How do you feel about the overall commentary here now?


pkg/kv/kvserver/replica_evaluate.go line 662 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We shouldn't need this now that we're checking IsLocking().

Done.


pkg/kv/kvserver/replica_evaluate.go line 666 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Same comment, this shouldn't be needed now.

Done.


pkg/kv/kvserver/replica_read.go line 83 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We can delete this now 🥳

Reworded.


pkg/kv/kvserver/replica_read.go line 167 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Can this be if g != nil. That would mean that latches are still held.

Done.


pkg/kv/kvserver/replica_read.go line 231 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I don't think the assignment to g is doing the right thing anymore, now that this is in a helper function. You'll need to lift that back up into executeReadOnlyBatch after each call to this function.

Lifted.


pkg/kv/kvserver/replica_read.go line 266 at r1 (raw file):

What is the intent history in the context of a read-only request?

You're right, done.


pkg/kv/kvserver/replica_read.go line 266 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: s/canDropLatches/ok/, because this is already the name of the method.

Done.


pkg/kv/kvserver/replica_read.go line 275 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Discussed in person: this isn't quite right. We don't want to scan across the entire boundary span, just the spans of the individual requests.

Done.


pkg/kv/kvserver/replica_read.go line 278 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Discussed in person: for point reads, we want this to use a prefix iterator so that it can hit bloom filters.

Done.


pkg/kv/kvserver/replica_read.go line 431 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This relates to the comment about setting g to nil if we drop latches early.

Done.


pkg/kv/kvserver/replica_tscache.go line 59 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Why do we need this? Isn't it equivalent to calling updateTimestampCache with a nil br and pErr?

Done.


pkg/storage/engine.go line 429 at r1 (raw file):

Previously, sumeerbhola wrote…

nit:
why do we have this in IterOptions given that we have an explicit parameter in NewMVCCIterator(iterKind MVCCIterKind, opts IterOptions) MVCCIterator?
IterOptions shouldn't be used for mvcc.go plumbing since that is a higher layer than the code in engine.go. It should be used for what is needed for the interfaces defined in engine.go?

You're right.


pkg/storage/engine.go line 1183 at r1 (raw file):

Previously, sumeerbhola wrote…

I don't think we need this "Note that ..." comment since there are no interleaved intents.
Unfortunately, we still have some stale commentary and code in intentInterleavingIter that allows for it, but we shouldn't propagate that to new code.
We've also not always been clear in distinguishing physically interleaved intents from a storage perspective (which we no longer have) and logically interleaving intents (which we of course continue to have) -- I assume this comment was referring to the former.

Removed.


pkg/storage/engine.go line 1191 at r1 (raw file):

I wonder whether we will need a heuristic to guess that needIntentHistory will be false.

I don't fully follow this part. Did you just mean that, in some cases, we should be able to have a heuristic to say "this batch will definitely need access to the intent history, so don't bother scanning the lock table upfront"?

I'll post some benchmark results here shortly.


pkg/storage/mvcc.go line 910 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Does this need to be lifted above the previous branch?

Discussed with Erik offline, let me know if anything seems off.


pkg/storage/mvcc.go line 1290 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Why did some of these lines change? Was that a goland refactor thing?

Yep, it was. I've changed the signature of newMVCCIterator again to address Sumeer's comment though.

@aayushshah15 force-pushed the releaseLatchesEarlyForPessimistic branch 5 times, most recently from f895281 to 67a6c25 on August 22, 2022 19:53
@sumeerbhola (Collaborator) left a comment

btw, I'm just doing drive-by reviewing of selective pieces -- don't block on me for merging

Reviewed 4 of 11 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @nvanbenschoten, and @sumeerbhola)


pkg/kv/kvserver/replica_evaluate.go line 656 at r1 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

How do you feel about the overall commentary here now?

looks good


pkg/kv/kvserver/replica_evaluate.go line 699 at r5 (raw file):

// using an intent interleaving iterator in order to be able to read the correct
// provisional value.
func scanConflictingIntents(

nit: we usually place such storage reading code in the storage package. It could go with the various helper functions defined at the bottom of engine.go (for example ScanIntents).


pkg/storage/engine.go line 1191 at r1 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

I wonder whether we will need a heuristic to guess that needIntentHistory will be false.

I don't fully follow this part, did you just mean to say that in some cases, we should be able to have a heuristic to say "this batch will definitely need access to the intent history, so don't bother scanning the lock table upfront"?

I'll post some benchmark results here shortly.

Ack.

@aayushshah15's comment here was marked as outdated.

@aayushshah15 force-pushed the releaseLatchesEarlyForPessimistic branch 2 times, most recently from 07d28db to a1ba3c6 on August 23, 2022 21:25
@aayushshah15 (Contributor, Author) commented:

I ran the ycsb/[ABCDEF]/nodes=3/cpu=32 roachtests with and without this patch. The results seem promising!

Here, "new" is with this patch and "old" is without.

YCSB-A

name            old ops/sec  new ops/sec  delta
nodes=3/cpu=32   30.0k ± 7%   30.5k ± 8%    ~     (p=0.280 n=10+10)

name            old p50      new p50      delta
nodes=3/cpu=32    2.34 ±10%    2.26 ± 6%    ~     (p=0.242 n=10+10)

name            old p95      new p95      delta
nodes=3/cpu=32    8.51 ±10%    7.85 ± 3%  -7.76%  (p=0.001 n=10+8)

name            old p99      new p99      delta
nodes=3/cpu=32    62.9 ±13%    62.7 ±14%    ~     (p=0.990 n=9+9)

YCSB-B

name            old ops/sec  new ops/sec  delta
nodes=3/cpu=32   91.4k ± 4%   94.7k ± 3%   +3.59%  (p=0.007 n=10+10)

name            old p50      new p50      delta
nodes=3/cpu=32    1.40 ± 0%    1.40 ± 0%     ~     (all equal)

name            old p95      new p95      delta
nodes=3/cpu=32    5.83 ± 6%    5.62 ± 3%   -3.60%  (p=0.019 n=10+10)

name            old p99      new p99      delta
nodes=3/cpu=32    15.7 ±20%    12.8 ± 7%  -18.63%  (p=0.000 n=10+9)

YCSB-C

name            old ops/sec  new ops/sec  delta
nodes=3/cpu=32    123k ± 6%    131k ± 6%   +6.48%  (p=0.001 n=10+10)

name            old p50      new p50      delta
nodes=3/cpu=32    1.40 ± 0%    1.26 ± 5%  -10.00%  (p=0.000 n=9+10)

name            old p95      new p95      delta
nodes=3/cpu=32    3.42 ± 4%    3.26 ± 7%   -4.82%  (p=0.019 n=8+10)

name            old p99      new p99      delta
nodes=3/cpu=32    6.04 ±13%    5.52 ± 9%   -8.61%  (p=0.005 n=10+10)

YCSB-D

name            old ops/sec  new ops/sec  delta
nodes=3/cpu=32    101k ± 5%    106k ± 3%  +5.69%  (p=0.000 n=10+10)

name            old p50      new p50      delta
nodes=3/cpu=32    1.20 ± 0%    1.10 ± 0%  -8.33%  (p=0.000 n=10+8)

name            old p95      new p95      delta
nodes=3/cpu=32    3.51 ± 5%    3.36 ± 4%  -4.27%  (p=0.005 n=10+10)

name            old p99      new p99      delta
nodes=3/cpu=32    6.01 ± 5%    5.69 ± 5%  -5.32%  (p=0.013 n=10+10)

YCSB-E

name            old ops/sec  new ops/sec  delta
nodes=3/cpu=32   3.39k ± 3%   3.20k ±13%   ~     (p=0.089 n=10+10)

name            old p50      new p50      delta
nodes=3/cpu=32    36.5 ± 3%    39.0 ±13%   ~     (p=0.124 n=10+10)

name            old p95      new p95      delta
nodes=3/cpu=32     110 ± 3%     121 ±18%   ~     (p=0.067 n=10+10)

name            old p99      new p99      delta
nodes=3/cpu=32     151 ± 0%     162 ±19%   ~     (p=0.387 n=9+10)

YCSB-F

name            old ops/sec  new ops/sec  delta
nodes=3/cpu=32   15.4k ± 5%   15.9k ± 7%    ~     (p=0.095 n=9+10)

name            old p50      new p50      delta
nodes=3/cpu=32    5.59 ± 4%    5.53 ± 6%    ~     (p=0.720 n=10+10)

name            old p95      new p95      delta
nodes=3/cpu=32    17.4 ±10%    16.1 ± 6%  -7.00%  (p=0.008 n=10+9)

name            old p99      new p99      delta
nodes=3/cpu=32     158 ±22%     151 ±14%    ~     (p=0.577 n=10+10)

@aayushshah15 force-pushed the releaseLatchesEarlyForPessimistic branch from a1ba3c6 to 6dce8f5 on August 24, 2022 03:41
@nvanbenschoten (Member) left a comment

Those results look promising! Thanks for running them. It would also be worthwhile to run the kv0 and kv95 roachtests on 8 and 32 vCPU machines.

Reviewed 4 of 14 files at r1, 8 of 11 files at r2, 2 of 3 files at r6, 2 of 2 files at r7, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @sumeerbhola)


pkg/kv/kvserver/replica_evaluate.go line 699 at r5 (raw file):

Previously, sumeerbhola wrote…

nit: we usually place such storage reading code in the storage package. It could go with the various helper functions defined at the bottom of engine.go (for example ScanIntents).

+1


pkg/kv/kvserver/replica_evaluate.go line 646 at r6 (raw file):

	if len(g.Req.LockSpans.GetSpans(spanset.SpanReadOnly, spanset.SpanLocal)) != 0 {
		// If the request declared any local read spans, it must hold latches
		// throughout its execution.

Why is this?


pkg/kv/kvserver/replica_evaluate.go line 708 at r6 (raw file):

	maxIntents int64,
) (needIntentHistory bool, err error) {
	upperBoundUnset := bytes.Equal(end, roachpb.KeyMin) // NB: Get requests do not set the end key.

nit: this is usually expressed as len(end) == 0.


pkg/kv/kvserver/replica_evaluate.go line 714 at r6 (raw file):

	ltStart, _ := keys.LockTableSingleKey(start, nil)
	ltEnd, _ := keys.LockTableSingleKey(end, nil)

Can we avoid allocating this key if upperBoundUnset?


pkg/kv/kvserver/replica_evaluate.go line 727 at r6 (raw file):

	var ok bool
	for ok, err = iter.SeekEngineKeyGE(storage.EngineKey{Key: ltStart}); ok; ok, err = iter.NextEngineKey() {
		if err := ctx.Err(); err != nil {

Do we need to check the context on each key? This often incurs a mutex acquisition (depending on the ctx), so it could be expensive if there are a lot of keys in the lock table.
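
One conventional way to amortize that cost (an illustrative pattern, not necessarily what landed) is to poll the context only every N keys:

```go
const ctxCheckEvery = 1024 // assumption: any modest interval works

keysSeen := 0
for ok := seekFirst(); ok; ok = nextKey() { // hypothetical iteration helpers
	keysSeen++
	if keysSeen%ctxCheckEvery == 0 {
		// ctx.Err() may acquire a mutex depending on the context
		// implementation, so avoid paying that cost per key.
		if err := ctx.Err(); err != nil {
			return err
		}
	}
	// ... examine the lock-table key ...
}
```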


pkg/kv/kvserver/replica_evaluate.go line 733 at r6 (raw file):

			break
		}
		key, err := iter.EngineKey()

This allocates and copies the key. In the next step, we decode the key. Should we save both until after we're sure it's a conflicting intent that we care about?


pkg/kv/kvserver/replica_read.go line 108 at r7 (raw file):

			evalPath = readOnlyWithoutInterleavedIntents
		}
		g = r.updateTimestampCacheAndDropLatches(ctx, g, ba, nil /* br */, nil /* pErr */, st)

If updateTimestampCacheAndDropLatches always returns nil then it's probably not worth the confusion. Just have the next line be g = nil. We should do that down below with the other call to updateTimestampCacheAndDropLatches as well.


pkg/kv/kvserver/replica_read.go line 233 at r7 (raw file):

	ec := endCmds{repl: r, g: g, st: st}
	ec.done(ctx, ba, br, pErr)
	return nil

See my comment above.


pkg/kv/kvserver/replica_read.go line 236 at r7 (raw file):

}

var allowDroppingLatchesBeforeEval = settings.RegisterBoolSetting(

Can we use this instead of the new testing knob in tests? But actually, is anyone using that testing knob?


pkg/kv/kvserver/replica_read.go line 288 at r7 (raw file):

	maxIntents := storage.MaxIntentsPerWriteIntentError.Get(&r.store.cfg.Settings.SV)
	var start, end roachpb.Key

nit: do these need such a large scope? Can we just do start, end := ... and needsIntentInterleavingForThisRequest, err := ... in the loop?


pkg/kv/kvserver/replica_send.go line 795 at r1 (raw file):

	// latchSpans, because we have already released our latches and plan to
	// re-acquire them if the retry is allowed.
	if !canDoServersideRetry(ctx, pErr, ba, nil /* br */, nil /* g */, hlc.Timestamp{} /* deadline */) {

nit: I don't think we want to lose this.


pkg/kv/kvserver/replica_send.go line 487 at r7 (raw file):

		dropLatchesAndLockWaitQueues := func() {
			if g != nil {
				latchSpans, lockSpans = g.TakeSpanSets()

We don't want to do this in the ReadWithinUncertaintyIntervalError case, so we may want this function to accept a reuseLatchAndLockSpans arg.


pkg/kv/kvserver/store_test.go line 1910 at r6 (raw file):

	// This test expects different updates to the timestamp cache based on whether
	// a given request was evaluated under `PessimisticEval` or `OptimisticEval`.
	// If these requests hit any concurrency retry errors, they are typically

If we stay on the optimistic eval path after hitting a conflict, don't we risk seeing the same conflicting lock over and over again? Are there other ways that we could avoid the flakiness without introducing this knob? For instance, using DisableDroppingLatchesBeforeEval.


pkg/storage/mvcc.go line 871 at r6 (raw file):

	LockTable LockTableView
	// DontInterleavedIntents, when set, makes it such that intent metadata is not
	// interleaved with the results of the scan.

Could you add a little more about when it's appropriate to use this and when it is not?

@aayushshah15 force-pushed the releaseLatchesEarlyForPessimistic branch from 6dce8f5 to 8f3d795 on August 24, 2022 16:13
@sumeerbhola (Collaborator) left a comment

Isn't ycsb-c read-only? I am confused by the improvement there, since I was expecting to see an improvement only for workloads with read-write contention. What am I missing?

Do we think ycsb-b and ycsb-d, which have only 5% writes, are better because:

- the zipfian/latest distributions result in contention
- the read latch release reduces contention for waiting writes, which helps reads that arrive after the write (since we have fair FIFO latch behavior, intervening writes can cause read-read contention)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @sumeerbhola)

@aayushshah15 force-pushed the releaseLatchesEarlyForPessimistic branch from 9e8560f to 64c1375 on October 4, 2022 19:14
@aayushshah15 (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @nvanbenschoten, and @sumeerbhola)


pkg/kv/kvserver/replica_send.go line 487 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

That works, but it fails to Release the spansets back into their sync.Pool. You get that for free if you don't TakeSpanSets() before FinishReq.

oh, that makes sense. Done.


pkg/kv/kvserver/replica_send.go line 525 at r9 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is this the only case where latches might have already been dropped?

Yes, I think so.


pkg/kv/kvserver/batcheval/cmd_export.go line 94 at r9 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Why did we need this? Does the spanset assertion work when evaluating without latches?

It's because we specifically assert on accesses to the GC threshold key here:

rec.ss.AssertAllowed(spanset.SpanReadOnly,
	roachpb.Span{Key: keys.RangeGCThresholdKey(rec.GetRangeID())},
)

Do you feel like we should address this differently?

@aayushshah15 (Contributor, Author) commented:

I re-ran YCSB-E (for 50 iterations) and YCSB-C (for 20 iterations) to try to clarify the results from #85993 (comment).

benchstat before after
name                   old ops/sec  new ops/sec  delta
ycsb/E/nodes=3/cpu=32   3.48k ± 5%   3.41k ± 4%  -2.19%  (p=0.000 n=45+50)

name                   old p50      new p50      delta
ycsb/E/nodes=3/cpu=32    35.7 ± 0%    36.3 ± 4%  +1.57%  (p=0.000 n=44+50)

name                   old p95      new p95      delta
ycsb/E/nodes=3/cpu=32     108 ± 5%     110 ± 5%  +1.78%  (p=0.000 n=45+50)

name                   old p99      new p99      delta
ycsb/E/nodes=3/cpu=32     149 ± 7%     151 ± 0%  +1.03%  (p=0.048 n=49+36)

benchstat before after
name                   old ops/sec  new ops/sec  delta
ycsb/C/nodes=3/cpu=32    132k ± 2%    134k ± 4%    ~     (p=0.141 n=17+20)

name                   old p50      new p50      delta
ycsb/C/nodes=3/cpu=32    1.30 ± 0%    1.20 ± 0%  -7.69%  (p=0.000 n=16+18)

name                   old p95      new p95      delta
ycsb/C/nodes=3/cpu=32    3.30 ± 0%    3.23 ± 5%    ~     (p=0.302 n=12+20)

name                   old p99      new p99      delta
ycsb/C/nodes=3/cpu=32    5.50 ± 0%    5.49 ± 9%    ~     (p=0.594 n=15+20)

Do these numbers make more sense @sumeerbhola? The reasons you've outlined for improvements in YCSB B and D match my understanding.

@sumeerbhola (Collaborator) commented:

No change in ycsb/C does make sense. Thanks for collecting the numbers.

@nvanbenschoten (Member) left a comment

Reviewed 4 of 4 files at r10, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @sumeerbhola)

@nvanbenschoten (Member) commented:

But let's still wait until v22.2 is out the door (in a week or so) before merging onto master.

@nvanbenschoten (Member) commented:

Test failures on master will no longer be considered release blockers, so we can go ahead and merge this. Nice job @aayushshah15!

@aayushshah15 force-pushed the releaseLatchesEarlyForPessimistic branch 8 times, most recently from e276557 to 8d08be5 on November 8, 2022 20:51
@aayushshah15 force-pushed the releaseLatchesEarlyForPessimistic branch from 8d08be5 to 52a0e00 on November 17, 2022 16:52
@aayushshah15 (Contributor, Author) commented:

TFTRs

bors r+

@craig (bot) commented Nov 17, 2022

👎 Rejected by code reviews

@aayushshah15 (Contributor, Author) commented:

bors r+

@craig (bot) commented Nov 17, 2022

Build failed:

@aayushshah15 (Contributor, Author) commented:

bors r+

@craig (bot) commented Nov 17, 2022

Build succeeded:
