Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: avoid long non-preemptable function calls #115192

Closed
Tracked by #115190
dt opened this issue Nov 28, 2023 · 11 comments
Closed
Tracked by #115190

*: avoid long non-preemptable function calls #115192

dt opened this issue Nov 28, 2023 · 11 comments
Assignees
Labels
C-performance Perf of queries or internals. Solution not expected to change functional behavior. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs P-1 Issues/test failures with a fix SLA of 1 month T-server-and-security DB Server & Security

Comments

@dt
Copy link
Member

dt commented Nov 28, 2023

It appears that due to golang/go#64417, we sometimes see long gc pause times and traces show STW pauses overlapping with block hashing. This has been observed for example during backups to s3, wherein the s3 sdk does both md5 and sha256 hashes of the 8mb chunks it buffered before uploading, but could affect other users of the hashing libraries as well.

We may want to consider patching the hashing functions on our go fork until the issue is resolved upstream.

Jira issue: CRDB-33925

@dt dt added C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-disaster-recovery T-disaster-recovery T-admission-control Admission Control labels Nov 28, 2023
Copy link

blathers-crl bot commented Nov 28, 2023

cc @cockroachdb/disaster-recovery

@dt dt added the O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs label Nov 28, 2023
@benbardin benbardin added the P-2 Issues/test failures with a fix SLA of 3 months label Nov 29, 2023
@dt dt added P-1 Issues/test failures with a fix SLA of 1 month and removed P-2 Issues/test failures with a fix SLA of 3 months labels Nov 29, 2023
@exalate-issue-sync exalate-issue-sync bot added T-admission-control Admission Control and removed T-disaster-recovery labels Dec 4, 2023
@sumeerbhola
Copy link
Collaborator

wherein the s3 sdk does both md5 and sha256 hashes of the 8mb chunks it buffered before uploading

@dt do you know how long (roughly) the non-preemptable intervals are? < 10ms?

@dt
Copy link
Member Author

dt commented Dec 6, 2023

how long (roughly) the non-preemptable intervals are? < 10ms?

typically around or under 10ms but some traces indicated sometimes as much as 40ms

@nvanbenschoten
Copy link
Member

@dt @sumeerbhola what is our current plan with this? Who is owning its resolution? We have at least one customer who is actively waiting on improvements here, now that we've identified what those improvements should be.

@sumeerbhola
Copy link
Collaborator

I don't think this should have the AC label, since this is about work that is not preemptible by the goroutine scheduler.

There is a secondary issue that preemptible work should request admission, via the elastic CPU tokens mechanism -- that is already tracked in #107770

@sumeerbhola sumeerbhola removed the T-admission-control Admission Control label Dec 20, 2023
@dt
Copy link
Member Author

dt commented Dec 20, 2023

what is our current plan with this

On the DR side, we've determined that we don't have the option to simply avoid hashing -- turning off md5'ing in s3 uploads breaks when bucket versioning locking is enabled -- and the minimum buffer size we can pass to s3 to hash and upload is 5MB (the default is 8 in 23.1, and will be 5MB in 23.2).

work that is not preemptible by the goroutine scheduler.

I think this issue is tracking making some classes of work that is not preemptable actually be preemptable, i.e. patching our fork of the stdlib so that the two mentioned hashing functions become preemptible (by the scheduler) work. I believe AC is already maintaining our patched fork of go (for the grunning patches?).

preemptible work should request admission, via the elastic CPU tokens mechanism

This is on DR's backlog, to integrate AC's existing libraries for requesting tokens/pacing into the the backup process as it constructs SSTs and writes them into the various cloud SDK uploaders, but that won't help us here: we can write as slowly as we want but the uploaders will buffer what we write into a chunk before hashing it and uploading it, and the minimum chunk size is 5MB, so that's the unit that'll get passed to hashing at once.

@nicktrav

This comment was marked as off-topic.

Copy link

blathers-crl bot commented Jan 10, 2024

cc @cockroachdb/disaster-recovery

@nicktrav
Copy link
Collaborator

nicktrav commented Feb 1, 2024

Here are some results comparing BenchmarkLatencyWhileHashing between builds on master @ cc4fdff. The results look promising.

https://gist.github.com/nicktrav/e1544fb1dc2d1bc6d04a1dd9db52e670

The RHS updates our runtime patch to include the following:

diff --git a/src/crypto/md5/md5.go b/src/crypto/md5/md5.go
index ccee4ea3a9..e15bc6d6d6 100644
--- a/src/crypto/md5/md5.go
+++ b/src/crypto/md5/md5.go
@@ -27,6 +27,10 @@ const Size = 16
 // The blocksize of MD5 in bytes.
 const BlockSize = 64

+// The maximum number of bytes that can be passed to block.
+const maxAsmIters = 1024
+const maxAsmSize = BlockSize * maxAsmIters // 64KiB
+
 const (
        init0 = 0x67452301
        init1 = 0xEFCDAB89
@@ -130,6 +134,11 @@ func (d *digest) Write(p []byte) (nn int, err error) {
        if len(p) >= BlockSize {
                n := len(p) &^ (BlockSize - 1)
                if haveAsm {
+                       for n > maxAsmSize {
+                               block(d, p[:maxAsmSize])
+                               p = p[maxAsmSize:]
+                               n -= maxAsmSize
+                       }
                        block(d, p[:n])
                } else {
                        blockGeneric(d, p[:n])
diff --git a/src/crypto/md5/md5_test.go b/src/crypto/md5/md5_test.go
index 851e7fb10d..e120be3718 100644
--- a/src/crypto/md5/md5_test.go
+++ b/src/crypto/md5/md5_test.go
@@ -120,10 +120,11 @@ func TestGoldenMarshal(t *testing.T) {

 func TestLarge(t *testing.T) {
        const N = 10000
+       const offsets = 4
        ok := "2bb571599a4180e1d542f76904adc3df" // md5sum of "0123456789" * 1000
-       block := make([]byte, 10004)
+       block := make([]byte, N+offsets)
        c := New()
-       for offset := 0; offset < 4; offset++ {
+       for offset := 0; offset < offsets; offset++ {
                for i := 0; i < N; i++ {
                        block[offset+i] = '0' + byte(i%10)
                }
@@ -142,6 +143,31 @@ func TestLarge(t *testing.T) {
        }
 }

+func TestExtraLarge(t *testing.T) {
+       const N = 100000
+       const offsets = 4
+       ok := "13572e9e296cff52b79c52148313c3a5" // md5sum of "0123456789" * 10000
+       block := make([]byte, N+offsets)
+       c := New()
+       for offset := 0; offset < offsets; offset++ {
+               for i := 0; i < N; i++ {
+                       block[offset+i] = '0' + byte(i%10)
...skipping...
-       for offset := 0; offset < 4; offset++ {
+       for offset := 0; offset < offsets; offset++ {
                for i := 0; i < N; i++ {
                        block[offset+i] = '0' + byte(i%10)
                }
@@ -142,6 +143,31 @@ func TestLarge(t *testing.T) {
        }
 }

+func TestExtraLarge(t *testing.T) {
+       const N = 100000
+       const offsets = 4
+       ok := "13572e9e296cff52b79c52148313c3a5" // md5sum of "0123456789" * 10000
+       block := make([]byte, N+offsets)
+       c := New()
+       for offset := 0; offset < offsets; offset++ {
+               for i := 0; i < N; i++ {
+                       block[offset+i] = '0' + byte(i%10)
+               }
+               for blockSize := 10; blockSize <= N; blockSize *= 10 {
+                       blocks := N / blockSize
+                       b := block[offset : offset+blockSize]
+                       c.Reset()
+                       for i := 0; i < blocks; i++ {
+                               c.Write(b)
+                       }
+                       s := fmt.Sprintf("%x", c.Sum(nil))
+                       if s != ok {
+                               t.Fatalf("md5 TestExtraLarge offset=%d, blockSize=%d = %s want %s", offset, blockSize, s, ok)
+                       }
+               }
+       }
+}
+
 // Tests that blockGeneric (pure Go) and block (in assembly for amd64, 386, arm) match.
 func TestBlockGeneric(t *testing.T) {
        gen, asm := New().(*digest), New().(*digest)
diff --git a/src/crypto/sha256/sha256.go b/src/crypto/sha256/sha256.go
index 2deafbc9fc..567a3c81f9 100644
--- a/src/crypto/sha256/sha256.go
+++ b/src/crypto/sha256/sha256.go
@@ -28,6 +28,10 @@ const Size224 = 28
 // The blocksize of SHA256 and SHA224 in bytes.
 const BlockSize = 64

+// The maximum number of bytes that can be passed to block.
+const maxAsmIters = 1024
+const maxAsmSize = BlockSize * maxAsmIters // 64KiB
+
 const (
        chunk     = 64
        init0     = 0x6A09E667
@@ -191,6 +195,11 @@ func (d *digest) Write(p []byte) (nn int, err error) {
        }
        if len(p) >= chunk {
                n := len(p) &^ (chunk - 1)
+               for n > maxAsmSize {
+                       block(d, p[:maxAsmSize])
+                       p = p[maxAsmSize:]
+                       n -= maxAsmSize
+               }
                block(d, p[:n])
                p = p[n:]
        }

nicktrav added a commit to nicktrav/cockroach that referenced this issue Feb 2, 2024
Assembly code is non-preemptible, and goroutines running such
pre-emptible calls can delay stop-the-world GC pauses, impacting tail
latency if they are timed poorly.

Certain backup codepaths in Cockroach will spend a material amount of
time encrypting large blocks of data. These calls involve
non-preemptible calls.

Mitigate the risk for these crypto libraries by bounding the work that
any single call into an assembly routine performs.

The impact of this change on the `BenchmarkLatencyWhileHashing` can be
found here. In particular, for the SHA256 algorithm, the impact on
latency was observed to improve by close to 2x, when encrypting larger
block sizes.

Inspiration taken from golang/go#/64417.

Touches cockroachdb#115192.

Release note: None.
@blathers-crl blathers-crl bot added the T-server-and-security DB Server & Security label Feb 12, 2024
Copy link

blathers-crl bot commented Feb 13, 2024

cc @cockroachdb/disaster-recovery

craig bot pushed a commit that referenced this issue Feb 15, 2024
118605: build: add go runtime patch to bound non-preemptible work r=nvanbenschoten,rickystewart,rsevinsky-cr a=nicktrav

There are two commits in this patch set, and are best reviewed individually.

---

The first is a purely mechanical change to reapply our existing Go runtime patches to the upstream, and then port the diff back into Cockroach. This shuffles around the ordering of the various patches, making subsequent patching simpler.

The second patch is cribbed from golang/go#64417, and is the material change, touching #115192.

Epic: None.

Co-authored-by: Nick Travers <travers@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@rharding6373
Copy link
Collaborator

Nick is OOO, so assigning to @sumeerbhola so it doesn't get lost.

wenyihu6 pushed a commit to wenyihu6/cockroach that referenced this issue Feb 21, 2024
Assembly code is non-preemptible, and goroutines running such
pre-emptible calls can delay stop-the-world GC pauses, impacting tail
latency if they are timed poorly.

Certain backup codepaths in Cockroach will spend a material amount of
time encrypting large blocks of data. These calls involve
non-preemptible calls.

Mitigate the risk for these crypto libraries by bounding the work that
any single call into an assembly routine performs.

The impact of this change on the `BenchmarkLatencyWhileHashing` can be
found here. In particular, for the SHA256 algorithm, the impact on
latency was observed to improve by close to 2x, when encrypting larger
block sizes.

Inspiration taken from golang/go#/64417.

Touches cockroachdb#115192.

Release note: None.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-performance Perf of queries or internals. Solution not expected to change functional behavior. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs P-1 Issues/test failures with a fix SLA of 1 month T-server-and-security DB Server & Security
Projects
No open projects
Development

No branches or pull requests

7 participants