*: avoid long non-preemptable function calls #115192
cc @cockroachdb/disaster-recovery
@dt do you know how long (roughly) the non-preemptable intervals are? < 10ms?
Typically around or under 10ms, but some traces indicated as much as 40ms.
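As a rough way to ballpark these intervals on a given machine, here is a minimal, machine-dependent sketch (not from the thread; the 8MB size mirrors the S3 part size discussed below). With the unpatched runtime, the entire buffer reaches the assembly routine in one call, so timing the Write approximates the non-preemptible stretch:

package main

import (
	"crypto/md5"
	"fmt"
	"time"
)

func main() {
	// An 8MB buffer, mirroring the upload chunk size the S3 SDK hashes at once.
	buf := make([]byte, 8<<20)
	h := md5.New()
	start := time.Now()
	h.Write(buf) // one pass over the entire buffer in a single assembly call
	fmt.Printf("md5 of 8MB took %s\n", time.Since(start))
}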
@dt @sumeerbhola what is our current plan with this? Who owns its resolution? We have at least one customer actively waiting on improvements here, now that we've identified what those improvements should be.
I don't think this should have the AC label, since this is about work that is not preemptible by the goroutine scheduler. There is a secondary issue that preemptible work should request admission, via the elastic CPU tokens mechanism -- that is already tracked in #107770 |
On the DR side, we've determined that we don't have the option to simply avoid hashing -- turning off md5'ing in s3 uploads breaks when bucket versioning locking is enabled -- and the minimum buffer size we can pass to s3 to hash and upload is 5MB (the default is 8 in 23.1, and will be 5MB in 23.2).
I think this issue is tracking making some classes of currently non-preemptable work preemptable, i.e. patching our fork of the stdlib so that the two mentioned hashing functions become work the scheduler can preempt. I believe AC is already maintaining our patched fork of Go (for the grunning patches?).
This is on DR's backlog: integrate AC's existing libraries for requesting tokens/pacing into the backup process as it constructs SSTs and writes them into the various cloud SDK uploaders. But that won't help us here: we can write as slowly as we want, yet the uploaders will buffer what we write into a chunk before hashing it and uploading it, and the minimum chunk size is 5MB, so that's the unit that'll get passed to hashing at once.
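To make the buffering point concrete, here is a minimal sketch of a multipart-style uploader (hypothetical; not the AWS SDK's actual code): no matter how slowly bytes arrive, each 5MB part is hashed in a single call.

package main

import (
	"crypto/md5"
	"fmt"
)

// partSize is the minimum multipart chunk size mentioned above.
const partSize = 5 << 20

type uploader struct {
	part []byte
}

// Write buffers incoming bytes and hashes a full 5MB part at once, which is
// why pacing the writer does not shrink the unit of work handed to hashing.
func (u *uploader) Write(p []byte) (int, error) {
	u.part = append(u.part, p...)
	for len(u.part) >= partSize {
		sum := md5.Sum(u.part[:partSize]) // one 5MB hashing call
		fmt.Printf("uploading part, md5=%x\n", sum)
		u.part = u.part[partSize:]
	}
	return len(p), nil
}

func main() {
	u := &uploader{}
	paced := make([]byte, 64<<10) // write in small, paced 64KiB chunks...
	for i := 0; i < 100; i++ {
		u.Write(paced) // ...the part is still hashed as a single 5MB unit
	}
}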
Here are some results comparing the two runtimes: https://gist.github.com/nicktrav/e1544fb1dc2d1bc6d04a1dd9db52e670

The RHS updates our runtime patch to include the following:

diff --git a/src/crypto/md5/md5.go b/src/crypto/md5/md5.go
index ccee4ea3a9..e15bc6d6d6 100644
--- a/src/crypto/md5/md5.go
+++ b/src/crypto/md5/md5.go
@@ -27,6 +27,10 @@ const Size = 16
 // The blocksize of MD5 in bytes.
 const BlockSize = 64
 
+// The maximum number of bytes that can be passed to block.
+const maxAsmIters = 1024
+const maxAsmSize = BlockSize * maxAsmIters // 64KiB
+
 const (
 	init0 = 0x67452301
 	init1 = 0xEFCDAB89
@@ -130,6 +134,11 @@ func (d *digest) Write(p []byte) (nn int, err error) {
 	if len(p) >= BlockSize {
 		n := len(p) &^ (BlockSize - 1)
 		if haveAsm {
+			for n > maxAsmSize {
+				block(d, p[:maxAsmSize])
+				p = p[maxAsmSize:]
+				n -= maxAsmSize
+			}
 			block(d, p[:n])
 		} else {
 			blockGeneric(d, p[:n])
diff --git a/src/crypto/md5/md5_test.go b/src/crypto/md5/md5_test.go
index 851e7fb10d..e120be3718 100644
--- a/src/crypto/md5/md5_test.go
+++ b/src/crypto/md5/md5_test.go
@@ -120,10 +120,11 @@ func TestGoldenMarshal(t *testing.T) {
 
 func TestLarge(t *testing.T) {
 	const N = 10000
+	const offsets = 4
 	ok := "2bb571599a4180e1d542f76904adc3df" // md5sum of "0123456789" * 1000
-	block := make([]byte, 10004)
+	block := make([]byte, N+offsets)
 	c := New()
-	for offset := 0; offset < 4; offset++ {
+	for offset := 0; offset < offsets; offset++ {
 		for i := 0; i < N; i++ {
 			block[offset+i] = '0' + byte(i%10)
 		}
@@ -142,6 +143,31 @@ func TestLarge(t *testing.T) {
 	}
 }
 
+func TestExtraLarge(t *testing.T) {
+	const N = 100000
+	const offsets = 4
+	ok := "13572e9e296cff52b79c52148313c3a5" // md5sum of "0123456789" * 10000
+	block := make([]byte, N+offsets)
+	c := New()
+	for offset := 0; offset < offsets; offset++ {
+		for i := 0; i < N; i++ {
+			block[offset+i] = '0' + byte(i%10)
+		}
+		for blockSize := 10; blockSize <= N; blockSize *= 10 {
+			blocks := N / blockSize
+			b := block[offset : offset+blockSize]
+			c.Reset()
+			for i := 0; i < blocks; i++ {
+				c.Write(b)
+			}
+			s := fmt.Sprintf("%x", c.Sum(nil))
+			if s != ok {
+				t.Fatalf("md5 TestExtraLarge offset=%d, blockSize=%d = %s want %s", offset, blockSize, s, ok)
+			}
+		}
+	}
+}
+
 // Tests that blockGeneric (pure Go) and block (in assembly for amd64, 386, arm) match.
 func TestBlockGeneric(t *testing.T) {
 	gen, asm := New().(*digest), New().(*digest)
diff --git a/src/crypto/sha256/sha256.go b/src/crypto/sha256/sha256.go
index 2deafbc9fc..567a3c81f9 100644
--- a/src/crypto/sha256/sha256.go
+++ b/src/crypto/sha256/sha256.go
@@ -28,6 +28,10 @@ const Size224 = 28
 // The blocksize of SHA256 and SHA224 in bytes.
 const BlockSize = 64
 
+// The maximum number of bytes that can be passed to block.
+const maxAsmIters = 1024
+const maxAsmSize = BlockSize * maxAsmIters // 64KiB
+
 const (
 	chunk = 64
 	init0 = 0x6A09E667
@@ -191,6 +195,11 @@ func (d *digest) Write(p []byte) (nn int, err error) {
 	}
 	if len(p) >= chunk {
 		n := len(p) &^ (chunk - 1)
+		for n > maxAsmSize {
+			block(d, p[:maxAsmSize])
+			p = p[maxAsmSize:]
+			n -= maxAsmSize
+		}
 		block(d, p[:n])
 		p = p[n:]
 	}
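For scale, a back-of-envelope sketch (assuming a hypothetical ~1GB/s single-core hashing rate; real numbers vary by hardware and algorithm) of what the 64KiB bound buys:

package main

import "fmt"

func main() {
	const throughput = 1 << 30 // assumed bytes/sec, purely illustrative
	const chunk = 8 << 20      // one 8MB upload part
	const bound = 64 << 10     // maxAsmSize from the patch above
	// Unbounded: one non-preemptible call over the whole part.
	fmt.Printf("unbounded: one ~%.2fms call\n", float64(chunk)/throughput*1000)
	// Bounded: many short calls, each preemptible at its boundary.
	fmt.Printf("bounded: %d calls of ~%.3fms\n", chunk/bound, float64(bound)/throughput*1000)
}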
Assembly code is non-preemptible, and goroutines making such non-preemptible calls can delay stop-the-world GC pauses, impacting tail latency if they are timed poorly. Certain backup codepaths in Cockroach spend a material amount of time hashing large blocks of data via these non-preemptible calls. Mitigate the risk for these crypto libraries by bounding the work that any single call into an assembly routine performs. The impact of this change on the `BenchmarkLatencyWhileHashing` benchmark can be found here. In particular, for the SHA256 algorithm, latency was observed to improve by close to 2x when hashing larger block sizes. Inspiration taken from golang/go#64417. Touches cockroachdb#115192. Release note: None.
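A rough approximation of such a benchmark (the actual BenchmarkLatencyWhileHashing lives in the gist linked above, so the structure here is an assumption): measure timer overshoot on one goroutine while the others hash large blocks.

package main

import (
	"crypto/sha256"
	"fmt"
	"runtime"
	"time"
)

func main() {
	stop := make(chan struct{})
	buf := make([]byte, 8<<20)
	// Saturate the Ps with goroutines making long non-preemptible hashing calls.
	for i := 0; i < runtime.GOMAXPROCS(0); i++ {
		go func() {
			h := sha256.New()
			for {
				select {
				case <-stop:
					return
				default:
					h.Write(buf)
				}
			}
		}()
	}
	// Any delay beyond the requested 1ms sleep approximates scheduling latency.
	var worst time.Duration
	for i := 0; i < 1000; i++ {
		start := time.Now()
		time.Sleep(time.Millisecond)
		if d := time.Since(start) - time.Millisecond; d > worst {
			worst = d
		}
	}
	close(stop)
	fmt.Printf("worst 1ms-sleep overshoot: %s\n", worst)
}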
118605: build: add go runtime patch to bound non-preemptible work r=nvanbenschoten,rickystewart,rsevinsky-cr a=nicktrav There are two commits in this patch set, and they are best reviewed individually. --- The first is a purely mechanical change that reapplies our existing Go runtime patches to the upstream and then ports the diff back into Cockroach. This shuffles around the ordering of the various patches, making subsequent patching simpler. The second patch is cribbed from golang/go#64417 and is the material change, touching #115192. Epic: None. Co-authored-by: Nick Travers <travers@cockroachlabs.com> Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Nick is OOO, so assigning to @sumeerbhola so it doesn't get lost.
It appears that, due to golang/go#64417, we sometimes see long GC pause times, and traces show STW pauses overlapping with block hashing. This has been observed, for example, during backups to S3, wherein the S3 SDK computes both MD5 and SHA256 hashes of the 8MB chunks it buffers before uploading, but it could affect other users of the hashing libraries as well.
We may want to consider patching the hashing functions in our Go fork until the issue is resolved upstream.
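Where we construct the hash ourselves, a caller-side wrapper could bound the work per call without a fork patch (a sketch, not what this issue proposes; it cannot reach hashes that third-party SDKs build internally, which is why the fork patch is attractive):

package main

import (
	"crypto/sha256"
	"fmt"
	"hash"
)

// maxChunk matches the 64KiB bound used in the runtime patch above.
const maxChunk = 64 << 10

// boundedHash splits each Write so that no single call into the underlying
// (assembly-backed) hash sees more than maxChunk bytes.
type boundedHash struct {
	hash.Hash
}

func (b boundedHash) Write(p []byte) (int, error) {
	n := len(p)
	for len(p) > maxChunk {
		b.Hash.Write(p[:maxChunk])
		p = p[maxChunk:]
	}
	b.Hash.Write(p)
	return n, nil
}

func main() {
	h := boundedHash{sha256.New()}
	h.Write(make([]byte, 8<<20)) // internally split into 64KiB pieces
	fmt.Printf("%x\n", h.Sum(nil))
}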
Jira issue: CRDB-33925