kv/concurrency: misleading duration of contention events with txn pushes #98104
Hi @nvanbenschoten, please add branch-* labels to identify which branch(es) this release-blocker affects. 🦉 Hoot! I am Blathers, a bot for CockroachDB. My owner is dev-inf.
I think this was broken by 70ef641. Before that change, we would reset the clock on each contention event (70ef641#diff-b2445fe50a89171ab341cf76c645ef668d8dcccd14143b7c96d07d8b52047941L902). These events were emitted on every new (key, txn) pair. Now, we reset it whenever we wait on a new key:
This means that we may emit multiple contention events on the same key but for different lock holder txns without resetting the timer.
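To make the failure mode concrete, here is a minimal sketch of the broken bookkeeping. The `contentionTimer` type and its method names are hypothetical stand-ins for the waiter's actual state, not the real `lockTableWaiter` fields:

```go
package main

import (
	"fmt"
	"time"
)

// contentionTimer is a hypothetical stand-in for the waiter's
// contention bookkeeping; the type and field names are illustrative.
type contentionTimer struct {
	curKey    string
	curTxnID  string
	waitStart time.Time
}

// onNewLockHolder is called when the waiter starts waiting on a
// (key, txnID) pair. Buggy behavior post-70ef641: waitStart is reset
// only when the key changes, so a new lock holder txn on the same
// key inherits the previous txn's start time.
func (t *contentionTimer) onNewLockHolder(key, txnID string, now time.Time) {
	if key != t.curKey {
		t.curKey = key
		t.waitStart = now // reset on new key only -- the bug
	}
	t.curTxnID = txnID
}

// eventDuration is the duration recorded in the contention event
// emitted when the waiter is done waiting on the current lock holder.
func (t *contentionTimer) eventDuration(now time.Time) time.Duration {
	return now.Sub(t.waitStart)
}

func main() {
	var t contentionTimer
	now := time.Now()
	// txn1 holds the lock on key k for 1s, then txn2 for another 1s.
	t.onNewLockHolder("k", "txn1", now)
	d1 := t.eventDuration(now.Add(1 * time.Second)) // 1s
	t.onNewLockHolder("k", "txn2", now.Add(1*time.Second))
	d2 := t.eventDuration(now.Add(2 * time.Second)) // 2s, overlapping d1
	fmt.Println(d1, d2, d1+d2) // 1s + 2s = 3s for only 2s of real waiting
}
```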
99166: kv/concurrency: compute contention event duration from (key,txn) wait start time r=andreimatei a=nvanbenschoten

Fixes #98104.

This commit resolves the bug identified in #98104 where multiple contention events could be emitted with overlapping durations, such that when these durations were added together (e.g. by `GetCumulativeContentionTime`), their sum was larger than the runtime of the entire query. This was possible because as of 70ef641, we were failing to reset the waiting start time on each new lock holder transaction for the same key.

This commit fixes this by computing the contention event duration using `contentionTag.waitStart` instead of `waitingState.lockWaitStart`. It also cleans up some of this code and makes it harder to make such a mistake in the future.

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
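Sketched against the hypothetical `contentionTimer` from the snippet above (the actual change computes the duration from `contentionTag.waitStart`), the fix amounts to resetting the start time on every new (key, txn) pair:

```go
// onNewLockHolder, fixed: waitStart is reset whenever either the key
// or the lock holder txn changes, so consecutive events for different
// lock holders on the same key report disjoint durations. In the
// example above, d2 becomes 1s and d1+d2 matches the 2s of waiting.
func (t *contentionTimer) onNewLockHolder(key, txnID string, now time.Time) {
	if key != t.curKey || txnID != t.curTxnID {
		t.curKey = key
		t.curTxnID = txnID
		t.waitStart = now // reset on new (key, txn) -- the fix
	}
}
```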
Here (internal Slack) is a stmt bundle in which the total execution time of subquery 1 was 1.9s, yet we reported 28s of contention. This is because the read in that subquery was repeatedly pushed, so we produced multiple contention events with constantly increasing durations. Here is a subset (edited to remove possible PII):

Notably, all of these events occurred while performing the same read, but the duration keeps increasing (because the read was pushed many times, AFAIU), until the last event records the actual contention of 1.88s.
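To see how the overcount compounds, take some hypothetical round numbers (not the actual values from the bundle): if the read is pushed every 200ms and each push emits an event reporting the cumulative wait so far, then ten pushes produce events of 0.2s, 0.4s, …, 2.0s, which sum to 11s of reported contention for only 2s of actual waiting.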
SQL treats all of these events as separate, so it aggregates them here: cockroach/pkg/sql/execstats/stats.go, line 38 (at 2853470).
I believe this is a bug in the KV layer: we should produce contention events such that their durations don't include the contention already covered by previously reported events.
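For reference, the SQL-side aggregation is a plain sum over per-event durations, so any overlap in the durations emitted by KV is double-counted. A simplified sketch of that aggregation (the real `GetCumulativeContentionTime` walks the structured events in a trace recording; the event type here is a stand-in):

```go
package main

import (
	"fmt"
	"time"
)

// contentionEvent stands in for the structured contention event
// recorded in the trace.
type contentionEvent struct {
	Duration time.Duration
}

// cumulativeContentionTime mirrors the SQL-side aggregation: a plain
// sum of per-event durations. If the KV layer emits events with
// overlapping durations, this sum exceeds the wall-clock time that
// was actually spent waiting (28s reported for a 1.9s subquery).
func cumulativeContentionTime(events []contentionEvent) time.Duration {
	var total time.Duration
	for _, ev := range events {
		total += ev.Duration
	}
	return total
}

func main() {
	// Overlapping durations, as emitted by the buggy code path.
	events := []contentionEvent{
		{Duration: 200 * time.Millisecond},
		{Duration: 400 * time.Millisecond},
		{Duration: 600 * time.Millisecond},
	}
	fmt.Println(cumulativeContentionTime(events)) // 1.2s reported for 600ms of waiting
}
```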
Jira issue: CRDB-25078