libroach: bump up rocksdb backpressure limits #41719
Conversation
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajkr and @petermattis)
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @dt and @petermattis)
c-deps/libroach/options.cc, line 256 at r1 (raw file):
    // TODO(dt): if/when we dynamically tune for bulk-ingestion, we
    // could leave this at 20 and only raise it during ingest jobs.
    options.level0_slowdown_writes_trigger = 500;
I wonder if we should set this to 999 or 1000. @ajkr?
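(For context, a rough hypothetical sketch of how the two L0 triggers relate in a rocksdb::Options block; this is not the PR's actual diff, and the stop-trigger value below is illustrative only.)

    #include <rocksdb/options.h>

    // Hypothetical helper, not from the PR: the slowdown trigger is the L0 file
    // count at which RocksDB starts delaying writes, and the stop trigger is the
    // count at which it halts them; the slowdown value is meant to sit at or
    // below the stop value.
    rocksdb::Options WithRelaxedL0Triggers(rocksdb::Options options) {
      options.level0_slowdown_writes_trigger = 500;  // raised from 20 in r1
      options.level0_stop_writes_trigger = 1000;     // illustrative value
      return options;
    }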
c-deps/libroach/options.cc, line 275 at r1 (raw file):
    // these as-is and only raise / disable them during ingest.
    options.soft_pending_compaction_bytes_limit = 2048 * 1073741824ull;
    options.hard_pending_compaction_bytes_limit = 4098 * 1073741824ull;
Nit: can we write these constants as 2048ull << 30 and 4097ull << 30? My eyes are much more used to translating << 30 to GB than remembering the exact numeric value.
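(Side note: 1073741824 is 2^30, i.e. one GiB, so the two spellings denote the same quantity. A quick check, as a hypothetical snippet rather than anything in the PR:)

    // One GiB, written both ways.
    static_assert(1073741824ull == (1ull << 30), "1 GiB");
    // Hence the soft limit quoted above is the same value in either spelling.
    static_assert(2048ull * 1073741824ull == (2048ull << 30), "2048 GiB");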
Force-pushed from a1b68ff to e8e8a3f.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @ajkr and @petermattis)
c-deps/libroach/options.cc, line 256 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I wonder if we should set this to 999 or 1000. @ajkr?
makes sense -- if we climb meaningfully above the slowdown number, we're likely to remain slowed a while, and being in a slowdown of any nontrivial duration means we're as good as stopped anyway. bumped it to 950.
c-deps/libroach/options.cc, line 275 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Nit: can we write these constants as 2048ull << 30 and 4097ull << 30? My eyes are much more used to translating << 30 to GB than remembering the exact numeric value.
👍
Done.
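(Putting both review comments together, the options block presumably ends up roughly like the sketch below. This is a reconstruction from the thread, not the merged diff; the hard-limit value is taken from the nit above rather than confirmed.)

    #include <rocksdb/options.h>

    // Hypothetical reconstruction of the post-review settings discussed above.
    void ApplyBackpressureLimits(rocksdb::Options* options) {
      options->level0_slowdown_writes_trigger = 950;                 // was 500 in r1
      options->soft_pending_compaction_bytes_limit = 2048ull << 30;  // 2048 GiB
      options->hard_pending_compaction_bytes_limit = 4097ull << 30;  // per the nit; merged value may differ
    }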
are we okay with backporting this to release-19.2?
are we okay with backporting this to release-19.2?
I am. @ajkr any concerns?
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @dt)
c-deps/libroach/options.cc, line 269 at r2 (raw file):
    // adding data directly. Additionally some system-critical writes in
    // cockroach (node-liveness), just can not be slow or they will fail and cause
    // unavilability, so back-pressuring may *cause* unavailability, instead of
s/unavilability/unavailability/g
Force-pushed from e8e8a3f to 3b8aa8a.
I am. @ajkr any concerns?
No concerns. Very curious whether this high L0 file count is pointing us to a problem with the intra-L0 picking heuristic, though.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @dt)
c-deps/libroach/options.cc, line 256 at r1 (raw file):
Previously, dt (David Taylor) wrote…
makes sense -- if we climb meaningfully above the slowdown number, we're likely to remain slowed a while, and being in a slowdown of any nontrivial duration means we're as good as stopped anyway. bumped it to 950.
Makes sense.
TFTRs! bors r+
I think bors got lost again? bors r+
it is just not your day, is it bors? bors r+
41719: libroach: bump up rocksdb backpressure limits r=dt a=dt Co-authored-by: David Taylor <tinystatemachine@gmail.com>
Build succeeded
System-critical writes in Cockroach, like node-liveness, just can not be
slow or they will fail, meaning that if these rocksdb back-pressure
slowdowns ever kick in, they usually do not gradually slow traffic
until the system reaches some stable throughput equilibrium as intended,
but rather cause liveness to fail and result in sudden unavailability
-- the opposite of what they were intended to do.
Thus we are probably better off just letting the metrics they were
intended to protect -- like read-amplification or compaction debt --
stray further into unhealthy territory, than we are back-pressuring and
hastening our demise: slower reads due to elevated read-amp are still
better than no reads due to node-liveness failures (and indeed slower
reads may serve as their own backpressure as we usually need to read to
write).
Release note: None