storage,kv: faster intent resolution over a key range #66268

sumeerbhola · 2021-06-09T15:27:24Z

When all the intents on a range are known to be separated,
as indicated by MCCStats, and a ResolveIntentRangeRequest
needs to be processed (for a transaction that did enough
writes that we did not track individual intent keys), we
avoid iterating over MVCC key-value pairs to find those
intents, and instead iterate over the separated lock table
key space.

A new iterForKeyVersions interface is introduced which
provides a narrow iterator interface to be used by
mvccResolveWriteIntent that resolves individual intents.
This interface is only for iterating over a single key,
and is implemented by the usual MVCCIterator and by the
new separatedIntentAndVersionIter. The latter serves
both as a way to find intents over a range and for
individual intent resolution, in an optimized manner.

There are new benchmarks which vary the sparseness
(sparseness of N means that 1 in every N keys has an
intent that is for the txn for which intent resolution
is being performed) and whether the remaining N-1 keys
have an intent from a different transaction
(other-txn-intents=true) or not (the former results
in the cost of at least parsing the MVCCMetadata to
compare txn IDs).

Full results are below. Two things to note are:

The percent-flushed={0,50} sometimes show a regression.
These are atypical cases where the SingleDelete to
remove the intent has not been flushed so the intent
Set and SingleDelete are both alive. This would be
a pathalogical case when the LSM is unhealthy. Even
then the regression is < 170%.
The percent-flushed=100 cases are the typical one.
Once sparseness increases to 100 or 1000 we see speed
gains close to 100x, as indicated by the following:

IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=100-16        235µs ± 2%      58µs ± 3%   -75.52%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=100-16      154µs ± 2%       2µs ± 4%   -98.72%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=100-16     137µs ± 1%       1µs ± 2%   -99.07%  (p=0.008 n=5+5)

Full results:

name                                                                                                             old time/op    new time/op    delta
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=0-16          450µs ± 5%     234µs ± 2%   -48.07%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=50-16         294µs ± 3%     101µs ± 4%   -65.49%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=100-16        235µs ± 2%      58µs ± 3%   -75.52%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=0-16        494µs ± 7%     218µs ± 6%   -55.84%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=50-16       251µs ± 2%      57µs ± 3%   -77.29%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=100-16      154µs ± 2%       2µs ± 4%   -98.72%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=0-16         343µs ± 5%     236µs ± 2%   -31.28%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=50-16        240µs ± 1%      74µs ± 2%   -69.02%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=100-16       203µs ± 1%      30µs ± 1%   -84.97%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=0-16       516µs ± 4%     274µs ±13%   -46.91%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=50-16      256µs ± 2%      57µs ± 1%   -77.67%  (p=0.016 n=4+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=100-16     137µs ± 1%       1µs ± 2%   -99.07%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=0-16        325µs ± 2%     237µs ± 1%   -26.98%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=50-16       232µs ± 4%      72µs ± 1%   -68.89%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=100-16      199µs ± 1%      30µs ± 0%   -84.87%  (p=0.016 n=5+4)
IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=0-16        2.08ms ± 1%    2.68ms ± 8%   +28.57%  (p=0.016 n=4+5)
IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=50-16        592µs ± 4%     596µs ± 7%      ~     (p=1.000 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=100-16       242µs ± 2%      59µs ± 3%   -75.73%  (p=0.016 n=4+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=0-16      3.04ms ± 5%    2.60ms ± 3%   -14.59%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=50-16      840µs ± 4%     539µs ± 3%   -35.87%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=100-16     163µs ± 6%       2µs ± 6%   -98.75%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=0-16        936µs ±38%    2497µs ±12%  +166.83%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=50-16       281µs ±12%     551µs ± 2%   +95.64%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=100-16      209µs ±13%      31µs ± 9%   -84.95%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=1000/other-txn-intents=false/percent-flushed=0-16     3.04ms ± 6%    2.55ms ± 0%      ~     (p=0.333 n=5+1)

name                                                                                                             old alloc/op   new alloc/op   delta
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=0-16         12.8kB ± 0%    13.0kB ± 0%    +0.92%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=50-16        12.8kB ± 0%    13.0kB ± 0%    +0.94%  (p=0.016 n=5+4)
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=100-16       12.9kB ± 0%    13.0kB ± 0%    +0.59%  (p=0.016 n=5+4)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=0-16       1.75kB ± 0%    0.29kB ± 0%      ~     (p=0.079 n=4+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=50-16      1.75kB ± 0%    0.29kB ± 0%   -83.18%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=100-16     1.75kB ± 0%    0.29kB ± 0%   -83.22%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=0-16        12.8kB ± 0%    11.4kB ± 0%   -11.37%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=50-16       12.8kB ± 0%    11.4kB ± 0%      ~     (p=0.079 n=4+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=100-16      12.8kB ± 0%    11.4kB ± 0%   -11.41%  (p=0.000 n=5+4)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=0-16      1.66kB ± 0%    0.17kB ± 0%   -89.61%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=50-16     1.66kB ± 0%    0.17kB ± 0%   -89.61%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=100-16    1.66kB ± 0%    0.17kB ± 0%   -89.63%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=0-16       12.8kB ± 0%    11.4kB ± 0%   -11.50%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=50-16      12.8kB ± 0%    11.4kB ± 0%      ~     (p=0.079 n=4+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=100-16     12.8kB ± 0%    11.4kB ± 0%   -11.53%  (p=0.029 n=4+4)
IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=0-16        12.8kB ± 0%    13.0kB ± 0%    +0.95%  (p=0.016 n=5+4)
IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=50-16       12.8kB ± 0%    13.0kB ± 0%    +0.95%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=100-16      12.9kB ± 0%    13.0kB ± 0%    +0.54%  (p=0.016 n=5+4)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=0-16      1.75kB ± 0%    0.29kB ± 0%   -83.14%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=50-16     1.75kB ± 0%    0.29kB ± 0%   -83.19%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=100-16    1.75kB ± 0%    0.29kB ± 0%   -83.22%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=0-16       12.8kB ± 0%    11.4kB ± 0%   -11.37%  (p=0.000 n=5+4)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=50-16      12.8kB ± 0%    11.4kB ± 0%   -11.37%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=100-16     12.8kB ± 0%    11.4kB ± 0%   -11.41%  (p=0.000 n=5+4)
IntentRangeResolution/separated=true/versions=100/sparseness=1000/other-txn-intents=false/percent-flushed=0-16     1.66kB ± 0%    0.17kB ± 0%   -89.56%  (p=0.000 n=5+1)

name                                                                                                             old allocs/op  new allocs/op  delta
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=0-16            401 ± 0%       404 ± 0%    +0.75%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=50-16           401 ± 0%       404 ± 0%    +0.75%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=100-16          401 ± 0%       404 ± 0%    +0.75%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=0-16          104 ± 0%         8 ± 0%   -92.31%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=50-16         104 ± 0%         8 ± 0%   -92.31%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=100-16        104 ± 0%         8 ± 0%   -92.31%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=0-16           401 ± 0%       305 ± 0%   -23.94%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=50-16          401 ± 0%       305 ± 0%   -23.94%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=100-16         401 ± 0%       305 ± 0%   -23.94%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=0-16         102 ± 0%         3 ± 0%   -97.06%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=50-16        102 ± 0%         3 ± 0%   -97.06%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=100-16       102 ± 0%         3 ± 0%   -97.06%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=0-16          401 ± 0%       303 ± 0%   -24.44%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=50-16         401 ± 0%       303 ± 0%   -24.44%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=100-16        401 ± 0%       303 ± 0%   -24.44%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=0-16           401 ± 0%       404 ± 0%    +0.75%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=50-16          401 ± 0%       404 ± 0%    +0.75%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=100-16         401 ± 0%       404 ± 0%    +0.75%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=0-16         104 ± 0%         8 ± 0%   -92.31%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=50-16        104 ± 0%         8 ± 0%   -92.31%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=100-16       104 ± 0%         8 ± 0%   -92.31%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=0-16          401 ± 0%       305 ± 0%   -23.94%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=50-16         401 ± 0%       305 ± 0%   -23.94%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=100-16        401 ± 0%       305 ± 0%   -23.94%  (p=0.008 n=5+5)
IntentRangeResolution/separated=true/versions=100/sparseness=1000/other-txn-intents=false/percent-flushed=0-16        102 ± 0%         3 ± 0%   -97.06%  (p=0.000 n=5+1)

Release note (performance improvement): Intent resolution for transactions
that write many intents such that we track intent ranges, for the purpose
of intent resolution, is much faster (potentially 100x) when using the
separated lock table.

Release justification: Fix for high-severity poor performance in existing
functionality, and is a low risk, high benefit change, that potentially
improves ranged intent resolution speed by 100x.

cockroach-teamcity · 2021-06-09T15:27:30Z

This change is

sumeerbhola

There are no tests yet -- I'll add after initial opinions.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @tbg)

pkg/kv/kvserver/batcheval/cmd_resolve_intent_range.go, line 49 at r1 (raw file):

	onlySeparatedIntents := false
	stats := cArgs.EvalCtx.GetMVCCStats()
	if stats.ContainsEstimates == 0 && stats.IntentCount == stats.SeparatedIntentCount {

This question is whether we should trust this or not
@tbg @nvanbenschoten

One way to strengthen this would be to additionally gate this on the long -running migration being written by @itsbilal having finished running across the whole cluster. The plan is for that to run soon after the cluster version is finalized at 21.2. It is a correctness requirement for that to not leave any interleaved intents, since we will rip out the code that looks for interleaved intents in 22.1.

tbg

Glad to see this. Clearly we want this, the only question in my mind is whether we can afford to have it wait until later (i.e. one release after the migration), but I don't think so since there isn't much (any, really?) complexity here that would be avoided later. I heavily lean towards "just" using the cluster version to gate this functionality, unless there are arguments for being more aggressive. It's just the boring, safe, option.

I haven't given this a detailed review yet, have focused on the specific question of gating the mechanism, and want to understand the comment below:

The percent-flushed={0,50} sometimes show a regression.
These are atypical cases where the SingleDelete to
remove the intent has not been flushed so the intent
Set and SingleDelete are both alive. This would be
a pathalogical case when the LSM is unhealthy. Even
then the regression is < 100%.

For my education, can you explain what scenario this is? In the benchmark you set up intents, so there aren't any single deletes in the dataset on top of which you repeatedly open the batches. MVCCResolveWriteIntentRange is the thing writing the SingleDeletes and once it's deleted something it has handled that intent and won't return in the same batch, right? I'm missing how you even care about where the SingleDeletes go. Also, and this is unsurprising since I'm already lost at this point, I don't understand why this would be the case in practice in pathological LSMs.
Also, there is a +166% regression in the data you posted.

Reviewed 8 of 8 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @nvanbenschoten)

pkg/kv/kvserver/batcheval/cmd_resolve_intent_range.go, line 49 at r1 (raw file):
There's reason to be hesitant about using the stats for something that could affect correctness or availability. For a while we had our fair share of stats bugs where the numbers just weren't right. This is a few years back, and our testing has improved, but I don't know that future changes to the stats couldn't have introduced a silent, so far benign, problem. That said, I don't know of anything concrete here. This should just work. We could gather more evidence from CockroachCloud - if the stats were "wrong" in the sense that they don't agree with recomputation (but all of the replica have the same stats), the consistency checker issues a delta that patches them up. I think SREs could look for this log line

cockroach/pkg/kv/kvserver/replica_consistency.go

Lines 250 to 259 in 83fbe72

// We've found that there's something to correct; send an RecomputeStatsRequest. Note that this

// code runs only on the lease holder (at the time of initiating the computation), so this work

// isn't duplicated except in rare leaseholder change scenarios (and concurrent invocation of

// RecomputeStats is allowed because these requests block on one another). Also, we're

// essentially paced by the consistency checker so we won't call this too often.

log.Infof(ctx, "triggering stats recomputation to resolve delta of %+v", results[0].Response.Delta)

req := roachpb.RecomputeStatsRequest{

RequestHeader: roachpb.RequestHeader{Key: startKey},

}

which they shouldn't generally see outside of a few expected cases (we introduced AbortSpanBytes, etc); none of which touch the IntentCount field.

Still, there hasn't been much attention paid to the correctness of stats and so I wouldn't want to start relying on it if we don't have to. The cluster version seems like the obvious choice. Using stats would require additional motivation - such as avoiding a performance drop prior to the migration completing. I'd hope that we've gotten the perf overhead "close enough" to vanilla 21.1 to avoid issues here.

As an aside, if the version is active, but IntentCount is not equal to SeparatedIntentCount, something is amiss, so we should catch that at least in our internal testing.

sumeerbhola

The percent-flushed={0,50} sometimes show a regression.
These are atypical cases where the SingleDelete to
remove the intent has not been flushed so the intent
Set and SingleDelete are both alive. This would be
a pathalogical case when the LSM is unhealthy. Even
then the regression is < 100%.

For my education, can you explain what scenario this is? In the benchmark you set up intents, so there aren't any single deletes in the dataset on top of which you repeatedly open the batches. MVCCResolveWriteIntentRange is the thing writing the SingleDeletes and once it's deleted something it has handled that intent and won't return in the same batch, right?

The SingleDeletes are indeed happening during intent resolution, but not for the part that the benchmark is timing, where we just discard the batch and don't measure the time to commit it (since we want to be able to repeatedly resolve the same intents). The setupKeysWithIntent is writing each version with an intent and resolving all the intents except some of the latest versions that the benchmark iterations will work with. That is where the SingleDeletes are getting written to Pebble.

I'm missing how you even care about where the SingleDeletes go. Also, and this is unsurprising since I'm already lost at this point, I don't understand why this would be the case in practice in pathological LSMs.

The bad case is when all the SingleDeletes, Sets still exist in the LSM, since with the separated locks, which have the txnID in the key suffix, the latest live intent (i.e., without the SingleDelete) is probabilistically somewhere in the middle of all these keys which share the same roachpb.Key. The iteration code (neither the one on master, or the optimized path here) can avoid iterating over them inside pebble.Iterator, which is wasteful. The unoptimized path only needs to iterate over 1/2 of them, the ones preceding the still live intent, after which it seeks using nextKey. The optimized path does a NextEngineKey (since seeks generally are more expensive) and also encounters the deleted ones that are immediately after the live intent (hence the 166% regression). I have never seen this in practice in our tpcc or kv roachtests, but it occurs often in our unit tests since we have so little data that everything is still sitting in the memtable. setupKeysWithIntent explicitly compensates for this by flushing (based on the numFlushedVersions parameter). Another case where it can occur is if the LSM became unhealthy -- say lots of bytes accumulated in L0 (and the Set and SingleDelete were in different flushes to L0, hence in different L0 sstables).

Also, there is a +166% regression in the data you posted.

oops, I misread. Fixed.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @tbg)

tbg

Thank you for the explanation! I understand this now, though trying to corroborate it in setupKeysWithIntent was challenging. That method has a ton of params and logic and perhaps some of your prose above can be turned into documentation for it. I have a feeling that this is the kind of method that might both be touched by someone else in the future and be challenging for them to understand the original motivations for.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @tbg)

nvanbenschoten

Reviewed 7 of 8 files at r1, 1 of 1 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @sumeerbhola)

pkg/kv/kvserver/batcheval/cmd_resolve_intent_range.go, line 49 at r1 (raw file):

Clearly we want this, the only question in my mind is whether we can afford to have it wait until later (i.e. one release after the migration), but I don't think so since there isn't much (any, really?) complexity here that would be avoided later. I heavily lean towards "just" using the cluster version to gate this functionality, unless there are arguments for being more aggressive.

I think it's clearer to you two than it is to me what the alternatives for a migration we have are. When we say "use the cluster version" here, do you mean wait until 22.1 to ensure that the long-running migration added in 21.2 will already be complete? Or do you mean wait until that long-running migration has completed in 22.1 and then switch over? The second option is what I would have expected, and I think it allows us to avoid any dependency on the correctness of MVCCStats.

Is there a downside to waiting on this long-running migration to complete? The only one I can think of is that we can't start using this fast-path on some ranges until all ranges have been migrated over. But is that much of a concern?

pkg/storage/mvcc.go, line 2788 at r2 (raw file):

// mvccResolveWriteIntent. The MVCCIterator is positioned lazily, only if
// needed -- the fast path for intent resolution when a transaction is
// committing does not need to position the MVCCIterator.

Is the iterator positioned if the transaction is committing and needs to change its commit timestamp from the provisional values existing timestamp? What about when a transaction is aborted? I believe the answer is "yes" and "yes", but it would be worth confirming and adding to this comment.

pkg/storage/mvcc.go, line 2884 at r2 (raw file):

}

func mvccGetIntent(

Could we give this a comment?

pkg/storage/mvcc.go, line 3154 at r2 (raw file):

	var unsafeNextKey MVCCKey
	var unsafeNextValue []byte
	if nextKey.IsValue() {

Is this just checking that lastKey's timestamp wasn't 0,1? If so, it may be worth a comment that this is the common case.

pkg/storage/mvcc.go, line 3309 at r2 (raw file):

	var putBuf *putBuffer
	// Exactly one of sepIter and mvccIter is non-nil.

Some commentary on which cases each is used would be nice.

pkg/storage/mvcc.go, line 3394 at r2 (raw file):

		var ok bool
		if !key.IsValue() {
			// When we start allowing existence of multiple intents on the same key,

This was optimistic.

sumeerbhola

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @sumeerbhola)

pkg/kv/kvserver/batcheval/cmd_resolve_intent_range.go, line 49 at r1 (raw file):

Or do you mean wait until that long-running migration has completed in 22.1 and then switch over? The second option is what I would have expected, and I think it allows us to avoid any dependency on the correctness of MVCCStats.

I assume you meant 21.2. Yes, that was the option I was thinking of, and I believe that was what Tobi was saying too. Waiting for 22.1 will not really save us much code -- the changes here are mostly for the optimized path, and there are only small tweaks to the unoptimized legacy path, and the latter is what we would delete in 22.1.

tbg

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @sumeerbhola)

pkg/kv/kvserver/batcheval/cmd_resolve_intent_range.go, line 49 at r1 (raw file):

Previously, sumeerbhola wrote…

Or do you mean wait until that long-running migration has completed in 22.1 and then switch over? The second option is what I would have expected, and I think it allows us to avoid any dependency on the correctness of MVCCStats.

I assume you meant 21.2. Yes, that was the option I was thinking of, and I believe that was what Tobi was saying too. Waiting for 22.1 will not really save us much code -- the changes here are mostly for the optimized path, and there are only small tweaks to the unoptimized legacy path, and the latter is what we would delete in 22.1.

Yes, if 21.1 is active, there shouldn't (mustn't) be any more inline intents, so we can always take this path. I think that's the canonical choice here and looks like we all agree.

sumeerbhola

TFTRs!
I'll hold off on this PR until the separated intents migration is merged since I need to gate this fast path on that migration being complete.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @tbg)

pkg/storage/mvcc.go, line 2788 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is the iterator positioned if the transaction is committing and needs to change its commit timestamp from the provisional values existing timestamp? What about when a transaction is aborted? I believe the answer is "yes" and "yes", but it would be worth confirming and adding to this comment.

Correct. Done.

pkg/storage/mvcc.go, line 2884 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Could we give this a comment?

Done

pkg/storage/mvcc.go, line 3154 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is this just checking that lastKey's timestamp wasn't 0,1? If so, it may be worth a comment that this is the common case.

Done

pkg/storage/mvcc.go, line 3309 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Some commentary on which cases each is used would be nice.

Done

pkg/storage/mvcc.go, line 3394 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This was optimistic.

:)
Removed since it isn't relevant. The separated intents case already looks at each intent.

sumeerbhola · 2021-08-18T19:59:09Z

I've updated this with tests, and to use the post migration cluster version. PTAL

sumeerbhola · 2021-08-26T13:29:05Z

@nvanbenschoten @itsbilal gentle ping on this -- I'd like to merge this early in the stability period.

itsbilal

code looks great to me! minor comments mostly just about comments

Reviewed 13 of 14 files at r4, 3 of 3 files at r5.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)

pkg/kv/kvserver/replica_write.go, line 757 at r5 (raw file):

		// The lock spans are insufficient for ranged intent resolution, which
		// does not declare lock spans and directly calls
		// spanSetBatch.NewEngineIterator. TODO(sumeer): we can't keep adding

TODO should be on a new line.

pkg/kv/kvserver/store.go, line 237 at r5 (raw file):

		// This is before PostSeparatedIntentsMigration, so we may have
		// interleaved intents.
		version = clusterversion.ByKey(clusterversion.SeparatedIntentsMigration)

Why not do version = clusterversion.ByKey(clusterversion.SeparatedIntentsMigration - 1) so it's clear the migration hasn't run? Makes little difference though, but just makes the purpose clearer. I can see a cluster version at clusterversion.SeparatedIntentsMigration being seen as "all live replicas have been migrated" while in reality we could still be writing interleaved intents with the knob in tests (a criticism that applies more with the other instance of this line below).

pkg/kv/kvserver/batcheval/cmd_resolve_intent_test.go, line 277 at r5 (raw file):

		// Before PostSeparatedIntentsMigration, so intent resolution will
		// not assume only separated intents.
		version = clusterversion.ByKey(clusterversion.SeparatedIntentsMigration)

Same

pkg/storage/mvcc.go, line 3324 at r5 (raw file):

	var putBuf *putBuffer
	// Exactly one of sepIter and mvccIter is non-nil. sepIter is used when
	// onlySeparatedIntents=true, and mvccIter when false. The former allows for

sepIter is used when onlySeparatedIntents=true and rw provides consistent iterators right? maybe the latter is not important if only pebble snapshots/batches will be passed in, and never an engine.

pkg/storage/mvcc.go, line 3336 at r5 (raw file):

	// ConsistentIterators() since we want the two iterators to be mutually
	// consistent (and production code will have consistent iterators).
	// TODO(sumeer): when removing the slow path, use

Nit: empty line before TODO

nvanbenschoten

Reviewed 13 of 14 files at r4, 3 of 3 files at r5.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @sumeerbhola)

pkg/kv/kvserver/batcheval/cmd_end_transaction.go, line 463 at r5 (raw file):

	// Some tests have st == nil.
	if st != nil {
		onlySeparatedIntents = !st.Version.ActiveVersionOrEmpty(ctx).Less(

Are you intentionally avoiding IsActive? I have the same question elsewhere.

pkg/kv/kvserver/batcheval/cmd_resolve_intent_test.go, line 228 at r5 (raw file):

			rir.EndKey = endKey

			declareKeysResolveIntentRange(&desc, h, &rir, &spans, nil)

I know this isn't all new, but why are we declaring keys in this test? Is that needed?

pkg/storage/mvcc.go, line 3403 at r5 (raw file):

			}
		} else {
			mvccIter.SeekGE(nextKey)

It's a little strange to me as a reader where the two iterators are seeked. sepIter is seeked before the loop and then at the end of each iteration. mvccIter is seeked at the beginning of each iteration. Would it be worth reconciling this difference?

sumeerbhola

Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @itsbilal and @nvanbenschoten)

pkg/kv/kvserver/replica_write.go, line 757 at r5 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

TODO should be on a new line.

Done

pkg/kv/kvserver/store.go, line 237 at r5 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Why not do version = clusterversion.ByKey(clusterversion.SeparatedIntentsMigration - 1) so it's clear the migration hasn't run? Makes little difference though, but just makes the purpose clearer. I can see a cluster version at clusterversion.SeparatedIntentsMigration being seen as "all live replicas have been migrated" while in reality we could still be writing interleaved intents with the knob in tests (a criticism that applies more with the other instance of this line below).

Done

pkg/kv/kvserver/batcheval/cmd_end_transaction.go, line 463 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Are you intentionally avoiding IsActive? I have the same question elsewhere.

Silliness. Fixed here and elsewhere.

pkg/kv/kvserver/batcheval/cmd_resolve_intent_test.go, line 228 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I know this isn't all new, but why are we declaring keys in this test? Is that needed?

They are being used by spanSetBatch that we created earlier. We are just initializing the spans later, but before doing reads and writes. I've added a comment.

pkg/kv/kvserver/batcheval/cmd_resolve_intent_test.go, line 277 at r5 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Same

Done

pkg/storage/mvcc.go, line 3324 at r5 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

sepIter is used when onlySeparatedIntents=true and rw provides consistent iterators right? maybe the latter is not important if only pebble snapshots/batches will be passed in, and never an engine.

Updated the comment. We almost never use an engine for production paths. But tests often do.

pkg/storage/mvcc.go, line 3336 at r5 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Nit: empty line before TODO

Done

pkg/storage/mvcc.go, line 3403 at r5 (raw file):

sepIter is seeked before the loop and then at the end of each iteration. mvccIter is seeked at the beginning of each iteration. Would it be worth reconciling this difference?

I've added some code comments to explain the difference. The main reason is that we step the sepIter and only seek it once, so we do the seek before the loop starts. The calls to sepIter.nextEngineKey need to happen at the end of an iteration to prepare for the next iteration. This is important for performance. While mvccIter is always being seeked, so we can consistently do it at the beginning of the iteration.

sumeerbhola · 2021-08-31T18:46:24Z

I can't reproduce the TestCCLLogic failure locally, so possibly flaky test.
The Example_demo failure is reproducible and looks to be a test setup inconsistency in that we randomize the separated intents setting for the sticky engine, which can be inconsistent with that the KV layer thinks. I'm going to change the sticky engine to always run with separated intents.

sumeerbhola · 2021-09-01T14:34:52Z

logictestccl is timing out, which happens on my machine even when I disable this optimization.

I've added a randomized ranged intent resolution test to compare the results using the fast and slow paths.
It turns out there is a bug where if the txn is aborting, but there are partial rollbacks in the ignored seq nums, we do the latter and don't fully resolve the intent. This in the single intent resolution code, and unrelated to this PR. This is harmless since the intent resolution will eventually get retried. I am not going to fix it in this PR.

When all the intents on a range are known to be separated, as indicated by MCCStats, and a ResolveIntentRangeRequest needs to be processed (for a transaction that did enough writes that we did not track individual intent keys), we avoid iterating over MVCC key-value pairs to find those intents, and instead iterate over the separated lock table key space. A new iterForKeyVersions interface is introduced which provides a narrow iterator interface to be used by mvccResolveWriteIntent that resolves individual intents. This interface is only for iterating over a single key, and is implemented by the usual MVCCIterator and by the new separatedIntentAndVersionIter. The latter serves both as a way to find intents over a range and for individual intent resolution, in an optimized manner. There are new benchmarks which vary the sparseness (sparseness of N means that 1 of every N key has an intent that is for the txn for which intent resolution is being performed) and whether the remaining N-1 keys have an intent from a different transaction (other-txn-intents=true) or not (the former results in the cost of at least parsing the MVCCMetadata to compare txn IDs). Full results are below. Two things to note are: - The percent-flushed={0,50} sometimes show a regression. These are atypical cases where the SingleDelete to remove the intent has not been flushed so the intent Set and SingleDelete are both alive. This would be a pathalogical case when the LSM is unhealthy. Even then the regression is < 170%. - The percent-flushed=100 cases are the typical one. Once sparseness increases to 100 or 1000 we see speed gains close to 100x, as indicated by the following: ``` IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=100-16 235µs ± 2% 58µs ± 3% -75.52% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=100-16 154µs ± 2% 2µs ± 4% -98.72% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=100-16 137µs ± 1% 1µs ± 2% -99.07% (p=0.008 n=5+5) ``` Full results: ``` name old time/op new time/op delta IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=0-16 450µs ± 5% 234µs ± 2% -48.07% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=50-16 294µs ± 3% 101µs ± 4% -65.49% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=100-16 235µs ± 2% 58µs ± 3% -75.52% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=0-16 494µs ± 7% 218µs ± 6% -55.84% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=50-16 251µs ± 2% 57µs ± 3% -77.29% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=100-16 154µs ± 2% 2µs ± 4% -98.72% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=0-16 343µs ± 5% 236µs ± 2% -31.28% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=50-16 240µs ± 1% 74µs ± 2% -69.02% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=100-16 203µs ± 1% 30µs ± 1% -84.97% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=0-16 516µs ± 4% 274µs ±13% -46.91% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=50-16 256µs ± 2% 57µs ± 1% -77.67% (p=0.016 n=4+5) IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=100-16 137µs ± 1% 1µs ± 2% -99.07% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=0-16 325µs ± 2% 237µs ± 1% -26.98% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=50-16 232µs ± 4% 72µs ± 1% -68.89% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=100-16 199µs ± 1% 30µs ± 0% -84.87% (p=0.016 n=5+4) IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=0-16 2.08ms ± 1% 2.68ms ± 8% +28.57% (p=0.016 n=4+5) IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=50-16 592µs ± 4% 596µs ± 7% ~ (p=1.000 n=5+5) IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=100-16 242µs ± 2% 59µs ± 3% -75.73% (p=0.016 n=4+5) IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=0-16 3.04ms ± 5% 2.60ms ± 3% -14.59% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=50-16 840µs ± 4% 539µs ± 3% -35.87% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=100-16 163µs ± 6% 2µs ± 6% -98.75% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=0-16 936µs ±38% 2497µs ±12% +166.83% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=50-16 281µs ±12% 551µs ± 2% +95.64% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=100-16 209µs ±13% 31µs ± 9% -84.95% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=100/sparseness=1000/other-txn-intents=false/percent-flushed=0-16 3.04ms ± 6% 2.55ms ± 0% ~ (p=0.333 n=5+1) name old alloc/op new alloc/op delta IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=0-16 12.8kB ± 0% 13.0kB ± 0% +0.92% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=50-16 12.8kB ± 0% 13.0kB ± 0% +0.94% (p=0.016 n=5+4) IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=100-16 12.9kB ± 0% 13.0kB ± 0% +0.59% (p=0.016 n=5+4) IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=0-16 1.75kB ± 0% 0.29kB ± 0% ~ (p=0.079 n=4+5) IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=50-16 1.75kB ± 0% 0.29kB ± 0% -83.18% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=100-16 1.75kB ± 0% 0.29kB ± 0% -83.22% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=0-16 12.8kB ± 0% 11.4kB ± 0% -11.37% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=50-16 12.8kB ± 0% 11.4kB ± 0% ~ (p=0.079 n=4+5) IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=100-16 12.8kB ± 0% 11.4kB ± 0% -11.41% (p=0.000 n=5+4) IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=0-16 1.66kB ± 0% 0.17kB ± 0% -89.61% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=50-16 1.66kB ± 0% 0.17kB ± 0% -89.61% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=100-16 1.66kB ± 0% 0.17kB ± 0% -89.63% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=0-16 12.8kB ± 0% 11.4kB ± 0% -11.50% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=50-16 12.8kB ± 0% 11.4kB ± 0% ~ (p=0.079 n=4+5) IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=100-16 12.8kB ± 0% 11.4kB ± 0% -11.53% (p=0.029 n=4+4) IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=0-16 12.8kB ± 0% 13.0kB ± 0% +0.95% (p=0.016 n=5+4) IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=50-16 12.8kB ± 0% 13.0kB ± 0% +0.95% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=100-16 12.9kB ± 0% 13.0kB ± 0% +0.54% (p=0.016 n=5+4) IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=0-16 1.75kB ± 0% 0.29kB ± 0% -83.14% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=50-16 1.75kB ± 0% 0.29kB ± 0% -83.19% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=100-16 1.75kB ± 0% 0.29kB ± 0% -83.22% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=0-16 12.8kB ± 0% 11.4kB ± 0% -11.37% (p=0.000 n=5+4) IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=50-16 12.8kB ± 0% 11.4kB ± 0% -11.37% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=100-16 12.8kB ± 0% 11.4kB ± 0% -11.41% (p=0.000 n=5+4) IntentRangeResolution/separated=true/versions=100/sparseness=1000/other-txn-intents=false/percent-flushed=0-16 1.66kB ± 0% 0.17kB ± 0% -89.56% (p=0.000 n=5+1) name old allocs/op new allocs/op delta IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=0-16 401 ± 0% 404 ± 0% +0.75% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=50-16 401 ± 0% 404 ± 0% +0.75% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=1/other-txn-intents=false/percent-flushed=100-16 401 ± 0% 404 ± 0% +0.75% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=0-16 104 ± 0% 8 ± 0% -92.31% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=50-16 104 ± 0% 8 ± 0% -92.31% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=false/percent-flushed=100-16 104 ± 0% 8 ± 0% -92.31% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=0-16 401 ± 0% 305 ± 0% -23.94% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=50-16 401 ± 0% 305 ± 0% -23.94% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=100/other-txn-intents=true/percent-flushed=100-16 401 ± 0% 305 ± 0% -23.94% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=0-16 102 ± 0% 3 ± 0% -97.06% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=50-16 102 ± 0% 3 ± 0% -97.06% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=false/percent-flushed=100-16 102 ± 0% 3 ± 0% -97.06% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=0-16 401 ± 0% 303 ± 0% -24.44% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=50-16 401 ± 0% 303 ± 0% -24.44% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=10/sparseness=1000/other-txn-intents=true/percent-flushed=100-16 401 ± 0% 303 ± 0% -24.44% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=0-16 401 ± 0% 404 ± 0% +0.75% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=50-16 401 ± 0% 404 ± 0% +0.75% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=100/sparseness=1/other-txn-intents=false/percent-flushed=100-16 401 ± 0% 404 ± 0% +0.75% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=0-16 104 ± 0% 8 ± 0% -92.31% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=50-16 104 ± 0% 8 ± 0% -92.31% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=false/percent-flushed=100-16 104 ± 0% 8 ± 0% -92.31% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=0-16 401 ± 0% 305 ± 0% -23.94% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=50-16 401 ± 0% 305 ± 0% -23.94% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=100/sparseness=100/other-txn-intents=true/percent-flushed=100-16 401 ± 0% 305 ± 0% -23.94% (p=0.008 n=5+5) IntentRangeResolution/separated=true/versions=100/sparseness=1000/other-txn-intents=false/percent-flushed=0-16 102 ± 0% 3 ± 0% -97.06% (p=0.000 n=5+1) ``` Release note (performance improvement): Intent resolution for transactions that write many intents such that we track intent ranges, for the purpose of intent resolution, is much faster (potentially 100x) when using the separated lock table. Release justification: Fix for high-severity poor performance in existing functionality, and is a low risk, high benefit change, that potentially improves ranged intent resolution speed by 100x.

sumeerbhola · 2021-09-01T17:58:28Z

bors r+

craig · 2021-09-01T21:11:21Z

Build succeeded:

GitHub CI (Cockroach)

sumeerbhola requested review from itsbilal, tbg and nvanbenschoten June 9, 2021 15:27

sumeerbhola commented Jun 9, 2021

View reviewed changes

tbg reviewed Jun 10, 2021

View reviewed changes

sumeerbhola force-pushed the opt_rir branch from aecd013 to 4d09f18 Compare June 10, 2021 13:02

sumeerbhola commented Jun 10, 2021

View reviewed changes

tbg self-requested a review June 10, 2021 15:15

tbg reviewed Jun 10, 2021

View reviewed changes

nvanbenschoten reviewed Jun 14, 2021

View reviewed changes

sumeerbhola commented Jun 14, 2021

View reviewed changes

tbg reviewed Jun 21, 2021

View reviewed changes

sumeerbhola mentioned this pull request Jun 22, 2021

kvcoord: Consider increasing max_intents_bytes default #54029

Closed

sumeerbhola force-pushed the opt_rir branch from 4d09f18 to 9772c15 Compare July 12, 2021 21:45

sumeerbhola commented Jul 12, 2021

View reviewed changes

sumeerbhola force-pushed the opt_rir branch from 9772c15 to 8a59964 Compare August 17, 2021 12:50

sumeerbhola requested review from a team as code owners August 17, 2021 12:50

sumeerbhola force-pushed the opt_rir branch 2 times, most recently from f465069 to 7a0f169 Compare August 17, 2021 16:22

sumeerbhola requested a review from a team as a code owner August 17, 2021 16:22

sumeerbhola force-pushed the opt_rir branch 3 times, most recently from 0b3ee28 to 14cef4d Compare August 18, 2021 19:57

sumeerbhola force-pushed the opt_rir branch 3 times, most recently from 4e9a819 to 97159cc Compare August 19, 2021 22:58

otan removed the request for review from a team August 24, 2021 05:51

itsbilal approved these changes Aug 26, 2021

View reviewed changes

nvanbenschoten approved these changes Aug 26, 2021

View reviewed changes

sumeerbhola force-pushed the opt_rir branch from 97159cc to 44fb36c Compare August 27, 2021 21:25

sumeerbhola commented Aug 27, 2021

View reviewed changes

sumeerbhola force-pushed the opt_rir branch 4 times, most recently from 0dfdc68 to f2ee987 Compare August 30, 2021 16:25

sumeerbhola force-pushed the opt_rir branch 2 times, most recently from 04763d1 to 7ff0b9e Compare September 1, 2021 14:29

sumeerbhola force-pushed the opt_rir branch from 7ff0b9e to 2316884 Compare September 1, 2021 14:42

sumeerbhola force-pushed the opt_rir branch from 2316884 to b5213fd Compare September 1, 2021 16:13

craig bot merged commit 01b9e69 into cockroachdb:master Sep 1, 2021

cockroach-teamcity mentioned this pull request Sep 1, 2021

storage,kv: faster intent resolution over a key range cockroachdb/docs#11208

Open

nvanbenschoten mentioned this pull request Sep 13, 2021

kv: eliminate range write latch during ranged intent resolution #70137

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

storage,kv: faster intent resolution over a key range #66268

storage,kv: faster intent resolution over a key range #66268

sumeerbhola commented Jun 9, 2021 •

edited

Loading

cockroach-teamcity commented Jun 9, 2021

sumeerbhola left a comment

tbg left a comment

sumeerbhola left a comment

tbg left a comment

nvanbenschoten left a comment

sumeerbhola left a comment

tbg left a comment

sumeerbhola left a comment

sumeerbhola commented Aug 18, 2021

sumeerbhola commented Aug 26, 2021

itsbilal left a comment

nvanbenschoten left a comment

sumeerbhola left a comment

sumeerbhola commented Aug 31, 2021

sumeerbhola commented Sep 1, 2021

sumeerbhola commented Sep 1, 2021

craig bot commented Sep 1, 2021

	// We've found that there's something to correct; send an RecomputeStatsRequest. Note that this
	// code runs only on the lease holder (at the time of initiating the computation), so this work
	// isn't duplicated except in rare leaseholder change scenarios (and concurrent invocation of
	// RecomputeStats is allowed because these requests block on one another). Also, we're
	// essentially paced by the consistency checker so we won't call this too often.
	log.Infof(ctx, "triggering stats recomputation to resolve delta of %+v", results[0].Response.Delta)

	req := roachpb.RecomputeStatsRequest{
	RequestHeader: roachpb.RequestHeader{Key: startKey},
	}

storage,kv: faster intent resolution over a key range #66268

storage,kv: faster intent resolution over a key range #66268

Conversation

sumeerbhola commented Jun 9, 2021 • edited Loading

cockroach-teamcity commented Jun 9, 2021

sumeerbhola left a comment

Choose a reason for hiding this comment

tbg left a comment

Choose a reason for hiding this comment

sumeerbhola left a comment

Choose a reason for hiding this comment

tbg left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

sumeerbhola left a comment

Choose a reason for hiding this comment

tbg left a comment

Choose a reason for hiding this comment

sumeerbhola left a comment

Choose a reason for hiding this comment

sumeerbhola commented Aug 18, 2021

sumeerbhola commented Aug 26, 2021

itsbilal left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

sumeerbhola left a comment

Choose a reason for hiding this comment

sumeerbhola commented Aug 31, 2021

sumeerbhola commented Sep 1, 2021

sumeerbhola commented Sep 1, 2021

craig bot commented Sep 1, 2021

sumeerbhola commented Jun 9, 2021 •

edited

Loading