*: avoid long non-preemptable function calls #115192
cc @cockroachdb/disaster-recovery
@dt do you know how long (roughly) the non-preemptable intervals are? < 10ms?
Typically around or under 10ms, but some traces indicated as much as 40ms.
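As a rough way to ballpark these intervals on a given machine, here is a minimal, machine-dependent sketch (not from the thread; the 8MB size mirrors the S3 part size discussed below). With the unpatched runtime, the entire buffer reaches the assembly routine in one call, so timing the Write approximates the non-preemptible stretch:

package main

import (
	"crypto/md5"
	"fmt"
	"time"
)

func main() {
	// An 8MB buffer, mirroring the upload chunk size the S3 SDK hashes at once.
	buf := make([]byte, 8<<20)
	h := md5.New()
	start := time.Now()
	h.Write(buf) // one pass over the entire buffer in a single assembly call
	fmt.Printf("md5 of 8MB took %s\n", time.Since(start))
}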
@dt @sumeerbhola what is our current plan with this? Who owns its resolution? We have at least one customer actively waiting on improvements here, now that we've identified what those improvements should be.
I don't think this should have the AC label, since this is about work that is not preemptible by the goroutine scheduler. There is a secondary issue that preemptible work should request admission, via the elastic CPU tokens mechanism -- that is already tracked in #107770 |
On the DR side, we've determined that we don't have the option to simply avoid hashing -- turning off md5'ing in s3 uploads breaks when bucket versioning locking is enabled -- and the minimum buffer size we can pass to s3 to hash and upload is 5MB (the default is 8 in 23.1, and will be 5MB in 23.2).
I think this issue is tracking making some classes of currently non-preemptable work preemptable, i.e. patching our fork of the stdlib so that the two mentioned hashing functions become work the scheduler can preempt. I believe AC is already maintaining our patched fork of Go (for the grunning patches?).
This is on DR's backlog: integrate AC's existing libraries for requesting tokens/pacing into the backup process as it constructs SSTs and writes them into the various cloud SDK uploaders. But that won't help us here: we can write as slowly as we want, yet the uploaders will buffer what we write into a chunk before hashing it and uploading it, and the minimum chunk size is 5MB, so that's the unit that'll get passed to hashing at once.
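To make the buffering point concrete, here is a minimal sketch of a multipart-style uploader (hypothetical; not the AWS SDK's actual code): no matter how slowly bytes arrive, each 5MB part is hashed in a single call.

package main

import (
	"crypto/md5"
	"fmt"
)

// partSize is the minimum multipart chunk size mentioned above.
const partSize = 5 << 20

type uploader struct {
	part []byte
}

// Write buffers incoming bytes and hashes a full 5MB part at once, which is
// why pacing the writer does not shrink the unit of work handed to hashing.
func (u *uploader) Write(p []byte) (int, error) {
	u.part = append(u.part, p...)
	for len(u.part) >= partSize {
		sum := md5.Sum(u.part[:partSize]) // one 5MB hashing call
		fmt.Printf("uploading part, md5=%x\n", sum)
		u.part = u.part[partSize:]
	}
	return len(p), nil
}

func main() {
	u := &uploader{}
	paced := make([]byte, 64<<10) // write in small, paced 64KiB chunks...
	for i := 0; i < 100; i++ {
		u.Write(paced) // ...the part is still hashed as a single 5MB unit
	}
}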
Here are some results comparing the two runtimes: https://gist.github.com/nicktrav/e1544fb1dc2d1bc6d04a1dd9db52e670

The RHS updates our runtime patch to include the following:

diff --git a/src/crypto/md5/md5.go b/src/crypto/md5/md5.go
index ccee4ea3a9..e15bc6d6d6 100644
--- a/src/crypto/md5/md5.go
+++ b/src/crypto/md5/md5.go
@@ -27,6 +27,10 @@ const Size = 16
 // The blocksize of MD5 in bytes.
 const BlockSize = 64
 
+// The maximum number of bytes that can be passed to block.
+const maxAsmIters = 1024
+const maxAsmSize = BlockSize * maxAsmIters // 64KiB
+
 const (
 	init0 = 0x67452301
 	init1 = 0xEFCDAB89
@@ -130,6 +134,11 @@ func (d *digest) Write(p []byte) (nn int, err error) {
 	if len(p) >= BlockSize {
 		n := len(p) &^ (BlockSize - 1)
 		if haveAsm {
+			for n > maxAsmSize {
+				block(d, p[:maxAsmSize])
+				p = p[maxAsmSize:]
+				n -= maxAsmSize
+			}
 			block(d, p[:n])
 		} else {
 			blockGeneric(d, p[:n])
diff --git a/src/crypto/md5/md5_test.go b/src/crypto/md5/md5_test.go
index 851e7fb10d..e120be3718 100644
--- a/src/crypto/md5/md5_test.go
+++ b/src/crypto/md5/md5_test.go
@@ -120,10 +120,11 @@ func TestGoldenMarshal(t *testing.T) {
 
 func TestLarge(t *testing.T) {
 	const N = 10000
+	const offsets = 4
 	ok := "2bb571599a4180e1d542f76904adc3df" // md5sum of "0123456789" * 1000
-	block := make([]byte, 10004)
+	block := make([]byte, N+offsets)
 	c := New()
-	for offset := 0; offset < 4; offset++ {
+	for offset := 0; offset < offsets; offset++ {
 		for i := 0; i < N; i++ {
 			block[offset+i] = '0' + byte(i%10)
 		}
@@ -142,6 +143,31 @@ func TestLarge(t *testing.T) {
 	}
 }
 
+func TestExtraLarge(t *testing.T) {
+	const N = 100000
+	const offsets = 4
+	ok := "13572e9e296cff52b79c52148313c3a5" // md5sum of "0123456789" * 10000
+	block := make([]byte, N+offsets)
+	c := New()
+	for offset := 0; offset < offsets; offset++ {
+		for i := 0; i < N; i++ {
+			block[offset+i] = '0' + byte(i%10)
+		}
+		for blockSize := 10; blockSize <= N; blockSize *= 10 {
+			blocks := N / blockSize
+			b := block[offset : offset+blockSize]
+			c.Reset()
+			for i := 0; i < blocks; i++ {
+				c.Write(b)
+			}
+			s := fmt.Sprintf("%x", c.Sum(nil))
+			if s != ok {
+				t.Fatalf("md5 TestExtraLarge offset=%d, blockSize=%d = %s want %s", offset, blockSize, s, ok)
+			}
+		}
+	}
+}
+
 // Tests that blockGeneric (pure Go) and block (in assembly for amd64, 386, arm) match.
 func TestBlockGeneric(t *testing.T) {
 	gen, asm := New().(*digest), New().(*digest)
diff --git a/src/crypto/sha256/sha256.go b/src/crypto/sha256/sha256.go
index 2deafbc9fc..567a3c81f9 100644
--- a/src/crypto/sha256/sha256.go
+++ b/src/crypto/sha256/sha256.go
@@ -28,6 +28,10 @@ const Size224 = 28
 // The blocksize of SHA256 and SHA224 in bytes.
 const BlockSize = 64
 
+// The maximum number of bytes that can be passed to block.
+const maxAsmIters = 1024
+const maxAsmSize = BlockSize * maxAsmIters // 64KiB
+
 const (
 	chunk = 64
 	init0 = 0x6A09E667
@@ -191,6 +195,11 @@ func (d *digest) Write(p []byte) (nn int, err error) {
 	}
 	if len(p) >= chunk {
 		n := len(p) &^ (chunk - 1)
+		for n > maxAsmSize {
+			block(d, p[:maxAsmSize])
+			p = p[maxAsmSize:]
+			n -= maxAsmSize
+		}
 		block(d, p[:n])
 		p = p[n:]
 	}
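For scale, a back-of-envelope sketch (assuming a hypothetical ~1GB/s single-core hashing rate; real numbers vary by hardware and algorithm) of what the 64KiB bound buys:

package main

import "fmt"

func main() {
	const throughput = 1 << 30 // assumed bytes/sec, purely illustrative
	const chunk = 8 << 20      // one 8MB upload part
	const bound = 64 << 10     // maxAsmSize from the patch above
	// Unbounded: one non-preemptible call over the whole part.
	fmt.Printf("unbounded: one ~%.2fms call\n", float64(chunk)/throughput*1000)
	// Bounded: many short calls, each preemptible at its boundary.
	fmt.Printf("bounded: %d calls of ~%.3fms\n", chunk/bound, float64(bound)/throughput*1000)
}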
Assembly code is non-preemptible, and goroutines making such non-preemptible calls can delay stop-the-world GC pauses, impacting tail latency if they are timed poorly. Certain backup codepaths in Cockroach spend a material amount of time hashing large blocks of data via these non-preemptible calls. Mitigate the risk for these crypto libraries by bounding the work that any single call into an assembly routine performs. The impact of this change on the `BenchmarkLatencyWhileHashing` benchmark can be found here. In particular, for the SHA256 algorithm, latency was observed to improve by close to 2x when hashing larger block sizes. Inspiration taken from golang/go#64417. Touches cockroachdb#115192. Release note: None.
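A rough approximation of such a benchmark (the actual BenchmarkLatencyWhileHashing lives in the gist linked above, so the structure here is an assumption): measure timer overshoot on one goroutine while the others hash large blocks.

package main

import (
	"crypto/sha256"
	"fmt"
	"runtime"
	"time"
)

func main() {
	stop := make(chan struct{})
	buf := make([]byte, 8<<20)
	// Saturate the Ps with goroutines making long non-preemptible hashing calls.
	for i := 0; i < runtime.GOMAXPROCS(0); i++ {
		go func() {
			h := sha256.New()
			for {
				select {
				case <-stop:
					return
				default:
					h.Write(buf)
				}
			}
		}()
	}
	// Any delay beyond the requested 1ms sleep approximates scheduling latency.
	var worst time.Duration
	for i := 0; i < 1000; i++ {
		start := time.Now()
		time.Sleep(time.Millisecond)
		if d := time.Since(start) - time.Millisecond; d > worst {
			worst = d
		}
	}
	close(stop)
	fmt.Printf("worst 1ms-sleep overshoot: %s\n", worst)
}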
118605: build: add go runtime patch to bound non-preemptible work r=nvanbenschoten,rickystewart,rsevinsky-cr a=nicktrav There are two commits in this patch set, and they are best reviewed individually. --- The first is a purely mechanical change that reapplies our existing Go runtime patches to the upstream and then ports the diff back into Cockroach. This shuffles around the ordering of the various patches, making subsequent patching simpler. The second patch is cribbed from golang/go#64417 and is the material change, touching #115192. Epic: None. Co-authored-by: Nick Travers <travers@cockroachlabs.com> Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Nick is OOO, so assigning to @sumeerbhola so it doesn't get lost.
It appears that, due to golang/go#64417, we sometimes see long GC pause times, and traces show STW pauses overlapping with block hashing. This has been observed, for example, during backups to S3, wherein the S3 SDK computes both MD5 and SHA256 hashes of the 8MB chunks it buffers before uploading, but it could affect other users of the hashing libraries as well.
We may want to consider patching the hashing functions in our Go fork until the issue is resolved upstream.
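Where we construct the hash ourselves, a caller-side wrapper could bound the work per call without a fork patch (a sketch, not what this issue proposes; it cannot reach hashes that third-party SDKs build internally, which is why the fork patch is attractive):

package main

import (
	"crypto/sha256"
	"fmt"
	"hash"
)

// maxChunk matches the 64KiB bound used in the runtime patch above.
const maxChunk = 64 << 10

// boundedHash splits each Write so that no single call into the underlying
// (assembly-backed) hash sees more than maxChunk bytes.
type boundedHash struct {
	hash.Hash
}

func (b boundedHash) Write(p []byte) (int, error) {
	n := len(p)
	for len(p) > maxChunk {
		b.Hash.Write(p[:maxChunk])
		p = p[maxChunk:]
	}
	b.Hash.Write(p)
	return n, nil
}

func main() {
	h := boundedHash{sha256.New()}
	h.Write(make([]byte, 8<<20)) // internally split into 64KiB pieces
	fmt.Printf("%x\n", h.Sum(nil))
}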
Jira issue: CRDB-33925