
kv,sql: expensive select queries cause OOM #123000

Open
andrewbaptist opened this issue Apr 24, 2024 · 2 comments
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs P-3 Issues/test failures with no fix SLA T-sql-queries SQL Queries Team

Comments

@andrewbaptist
Collaborator

andrewbaptist commented Apr 24, 2024

Describe the problem

Running multiple queries in parallel can cause nodes to OOM.

To Reproduce

  1. Create a 3-node cluster initialized with a KV DB (a minimal setup sketch is included after the crash log below).
  2. Begin a background workload to write many KV keys
    roachprod ssh $CLUSTER:3 "./cockroach workload run kv $(roachprod pgurl $CLUSTER:3) --timeout 5s --tolerate-errors"
  3. Wait a little while to allow there to be enough keys written to the system.
  4. Start a number of select queries in the background:
    for n in {1..3}; do for x in {1..100}; do echo "select * from kv.kv where v='abc';" | roachprod sql $CLUSTER:$n & done; done
  5. Notice that all the cockroach nodes crash:
[17561.221476] cockroach invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
[17629.744740] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=init.scope,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-431.scope,task=cockroach,pid=32634,uid=1000
[17629.744773] Out of memory: Killed process 32634 (cockroach) total-vm:1735716kB, anon-rss:206688kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:964kB oom_score_adj:0
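
For reference, step 1 might look roughly like the following (assuming roachprod is available; exact subcommands and flags can differ between roachprod versions, and $CLUSTER is a placeholder):

    roachprod create $CLUSTER -n 3
    roachprod stage $CLUSTER cockroach
    roachprod start $CLUSTER
    roachprod ssh $CLUSTER:1 "./cockroach workload init kv $(roachprod pgurl $CLUSTER:1)"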

Expected behavior
The nodes should not crash. On a multi-tenant system, a single tenant could bring down the entire cluster.

Additional data / screenshots
Depending on the number of concurrent select * queries per node, we get different behaviors:
10 queries/node - p50 latency on writes jumps from 2ms -> 900ms, QPS goes from 4000 -> 10
20 queries/node - p50 latency goes to 2s, QPS goes to 1-2
40 queries/node - p50 latency goes to 10s, QPS goes to 0 (timeouts). Causes liveness failures.
100 queries/node - all nodes crash with OOM

CPU Profile: profile.pb.gz
Heap profile (at 30 concurrency): profile.pb.gz
Note that the heap profile doesn't account for all the memory. Here is a line from the cockroach-health log at about the same time as the heap profile:
I240424 19:20:08.898205 324 2@server/status/runtime_log.go:47 ⋮ [T1,Vsystem,n2] 734 runtime stats: 6.2 GiB RSS, 982 goroutines (stacks: 17 MiB), 2.7 GiB/4.1 GiB Go alloc/total (heap fragmentation: 27 MiB, heap reserved: 1.3 GiB, heap released: 1.5 GiB), 2.1 GiB/2.4 GiB CGO alloc/total (9.0 CGO/sec), 394.5/3.1 %(u/s)time, 0.0 %gc (188x), 569 KiB/622 KiB (r/w)net

Environment:
This likely occurs on all releases. It was tested on 24.1/master.

Additional context
We have seen customer cases where heavy queries can cause either liveness failures or OOMs.

Jira issue: CRDB-38160

@andrewbaptist andrewbaptist added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs labels Apr 24, 2024
@sumeerbhola sumeerbhola added the T-sql-queries SQL Queries Team label Apr 25, 2024
@github-project-automation github-project-automation bot moved this to Triage in SQL Queries Apr 25, 2024
@sumeerbhola sumeerbhola changed the title admission: failure to protect nodes against expensive select queries sql: expensive select queries cause OOM Apr 25, 2024
@sumeerbhola sumeerbhola changed the title sql: expensive select queries cause OOM kv,sql: expensive select queries cause OOM Apr 25, 2024
@sumeerbhola sumeerbhola added the T-kv KV Team label Apr 25, 2024
@sumeerbhola
Collaborator

Thanks for the reproduction!
I have tagged this with KV and SQL for initial investigation, and removed admission from the title, since AC does no memory tracking or memory-based admission.

Glancing at the heap profile, which only accounts for 658 MB, all the memory usage is in MVCCScanToBytes in the pebbleResults, which should be covered by the KV memory accounting. It would be good to have a profile that captures the high memory usage.

@yuzefovich yuzefovich moved this from Triage to 24.2 Release in SQL Queries May 7, 2024
@yuzefovich
Member

This query performs a full scan of the kv table, and each KV BatchRequest is issued with a TargetBytes limit of 10MiB. When the first BatchRequest is sent, we only pre-reserve 1KiB in the memory accounting system, but when the response comes back, we reconcile the accounting with the actual memory footprint, and for the 2nd and all subsequent batches we keep a reasonable reservation. If many queries start around the same time on the same gateway, we can issue a huge number of such BatchRequests while only pre-reserving a fraction of the needed memory, thus overloading the KV server. Ideally, the KV server would protect itself.
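
To make that failure mode concrete, here is a self-contained Go sketch (not CockroachDB's actual memory-accounting code; the budget type, limit, and sizes are invented for illustration) of how reconciling only after the response arrives lets concurrent scans overshoot the budget:

    package main

    import (
        "errors"
        "fmt"
    )

    // budget is a toy memory account with a hard limit (a stand-in for a real
    // memory monitor; not CockroachDB's API).
    type budget struct {
        limit, used int64
    }

    func (b *budget) reserve(n int64) error {
        if b.used+n > b.limit {
            return errors.New("memory budget exceeded")
        }
        b.used += n
        return nil
    }

    // resize replaces an earlier reservation with the actual footprint.
    func (b *budget) resize(old, actual int64) error {
        b.used -= old
        return b.reserve(actual)
    }

    func main() {
        const (
            preReservation  = 1 << 10  // 1 KiB reserved before the first batch
            responseBytes   = 10 << 20 // ~TargetBytes (10 MiB) actually returned
            concurrentScans = 100
        )
        acct := &budget{limit: 256 << 20} // pretend 256 MiB budget on the gateway

        // Every scan pre-reserves only 1 KiB, so all of them are admitted.
        for i := 0; i < concurrentScans; i++ {
            if err := acct.reserve(preReservation); err != nil {
                fmt.Println("scan", i, "rejected up front:", err)
                return
            }
        }
        fmt.Printf("after pre-reservation: %d B used of %d B\n", acct.used, acct.limit)

        // Reconciling to the real response size exceeds the limit only after the
        // responses (and their memory) have already arrived at the gateway.
        for i := 0; i < concurrentScans; i++ {
            if err := acct.resize(preReservation, responseBytes); err != nil {
                fmt.Println("scan", i, "over budget only after the fact:", err)
                return
            }
        }
    }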

In the kvstreamer client we address this limitation by estimating the expected response size and reserving that amount up front, then reconciling the accounting after the response has been received. We could do something similar in the txnKVFetcher, but I'd rather address #82164, which tracks using the Streamer to power scans (in addition to lookup and index joins). We have a prototype for that in #103659 - it might be an interesting experiment to see how the system behaves on a binary with that prototype.
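
For contrast, a hypothetical sketch of the estimate-then-reconcile pattern described above (again, not the real kvstreamer API; the account type, estimate, and limit are made up): by reserving the estimated response size before sending, an over-budget request is rejected before it can load the server:

    package main

    import (
        "errors"
        "fmt"
    )

    // account is a toy memory budget (placeholder, not CockroachDB's API).
    type account struct{ limit, used int64 }

    func (a *account) grow(n int64) error {
        if a.used+n > a.limit {
            return errors.New("memory budget exceeded")
        }
        a.used += n
        return nil
    }

    func (a *account) shrink(n int64) { a.used -= n }

    // issueScan reserves the estimated footprint up front; if the budget cannot
    // cover it, the request is never sent, so the server is not overloaded.
    func issueScan(acct *account, estimate int64, send func() int64) error {
        if err := acct.grow(estimate); err != nil {
            return err // back off before sending the BatchRequest
        }
        actual := send()
        // Reconcile the up-front estimate with the actual response size.
        acct.shrink(estimate)
        return acct.grow(actual)
    }

    func main() {
        acct := &account{limit: 64 << 20} // pretend 64 MiB budget
        estimate := int64(10 << 20)       // assume ~TargetBytes per response
        for i := 0; i < 10; i++ {
            err := issueScan(acct, estimate, func() int64 { return 10 << 20 })
            fmt.Printf("scan %d: used=%d err=%v\n", i, acct.used, err)
        }
    }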

@yuzefovich yuzefovich moved this from 24.2 Release to Backlog in SQL Queries May 7, 2024
@andrewbaptist andrewbaptist added the P-3 Issues/test failures with no fix SLA label May 13, 2024
@arulajmani arulajmani removed the T-kv KV Team label May 13, 2024
@github-project-automation github-project-automation bot moved this to Incoming in KV Aug 28, 2024