From a1b68ff89f9b6cbc0823cb60914a6d4dd56165e6 Mon Sep 17 00:00:00 2001
From: David Taylor
Date: Fri, 18 Oct 2019 12:57:01 +0000
Subject: [PATCH] libroach: bump up rocksdb backpressure limits

System-critical writes in Cockroach, like node-liveness, just cannot be
slow or they will fail, meaning that if these rocksdb back-pressure
slowdowns ever kick in, they usually do not gradually slow traffic until
the system reaches some stable throughput equilibrium as intended, but
rather cause liveness to fail and result in sudden unavailability -- the
opposite of what they were intended to do.

Thus we are probably better off letting the metrics they were intended
to protect -- like read-amplification or compaction debt -- stray
further into unhealthy territory than we are back-pressuring and
hastening our demise: slower reads due to elevated read-amp are still
better than no reads due to node-liveness failures (and indeed slower
reads may serve as their own backpressure, since we usually need to read
in order to write).

Release note: None
---
 c-deps/libroach/options.cc | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/c-deps/libroach/options.cc b/c-deps/libroach/options.cc
index 084cdfea14ca..e4a8a717490b 100644
--- a/c-deps/libroach/options.cc
+++ b/c-deps/libroach/options.cc
@@ -253,22 +253,26 @@ rocksdb::Options DBMakeOptions(DBOptions db_opts) {
   // slowdowns to writes.
   // TODO(dt): if/when we dynamically tune for bulk-ingestion, we
   // could leave this at 20 and only raise it during ingest jobs.
-  options.level0_slowdown_writes_trigger = 200;
+  options.level0_slowdown_writes_trigger = 500;
   // Maximum number of L0 files. Writes are stopped at this
   // point. This is set significantly higher than
   // level0_slowdown_writes_trigger to avoid completely blocking
   // writes.
   // TODO(dt): if/when we dynamically tune for bulk-ingestion, we
   // could leave this at 30 and only raise it during ingest.
-  options.level0_stop_writes_trigger = 400;
+  options.level0_stop_writes_trigger = 1000;
   // Maximum estimated pending compaction bytes before slowing writes.
-  // Default is 64gb but that can be hit during bulk-ingestion since it
-  // is based on assumptions about relative level sizes that do not hold
-  // during bulk-ingestion.
-  // TODO(dt): if/when we dynamically tune for bulk-ingestion, we
-  // could leave these as-is and only raise / disable them during ingest.
-  options.soft_pending_compaction_bytes_limit = 256 * 1073741824ull;
-  options.hard_pending_compaction_bytes_limit = 512 * 1073741824ull;
+  // Default is 64GB but that can be hit easily during bulk-ingestion since it
+  // is based on assumptions about relative level sizes that do not hold when
+  // adding data directly. Additionally, some system-critical writes in
+  // cockroach (node-liveness) just cannot be slow or they will fail and cause
+  // unavailability, so back-pressuring may *cause* unavailability instead of
+  // gracefully slowing to some stable equilibrium to avoid it. As such, we
+  // want these set very high so we are very unlikely to hit them.
+  // TODO(dt): if/when we dynamically tune for bulk-ingestion, we could leave
+  // these as-is and only raise / disable them during ingest.
+  options.soft_pending_compaction_bytes_limit = 2048 * 1073741824ull;
+  options.hard_pending_compaction_bytes_limit = 4098 * 1073741824ull;
   // Flush write buffers to L0 as soon as they are full. A higher
   // value could be beneficial if there are duplicate records in each
   // of the individual write buffers, but perf testing hasn't shown