cloud/s3: skip md5 hashing #115189

Closed
Tracked by #115190
dt opened this issue Nov 28, 2023 · 3 comments · Fixed by #115160, #115197 or #115198
Labels
A-disaster-recovery
C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
O-support: Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs
P-0: Issues/test failures with a fix SLA of 2 weeks
T-disaster-recovery

Comments

dt (Member) commented Nov 28, 2023

Part of #115190

We have observed that backups to S3 increase tail latencies for foreground traffic, sometimes significantly.

The S3 SDK hashes each chunk-sized block (currently 8 MB) with both MD5 and SHA256, for the content checksum and request signing respectively. Due in large part to golang/go#64417, this appears to cause long GC pause times: traces show stop-the-world (STW) pauses overlapping with block hashing.

We can disable the MD5 hashing. Amazon also computes the MD5 server-side, so the client-side hash exists only to verify upload integrity, and that is already covered by uploading over TLS. In addition, most of the files we upload already carry internal or other hashes in their content, such as SST block checksums.
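To make the per-chunk cost concrete, here is a minimal standard-library sketch of the two digests described above, with a switch to skip the MD5. It is an illustration only, not the SDK's code and not the change that fixed this issue. `hashChunk`, `splitChunks`, `chunkSums`, and `chunkSize` are hypothetical names, and the 8 MB chunk is assumed to be 8 MiB.

```go
package main

import (
	"crypto/md5"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// chunkSize mirrors the 8 MB part size from the issue (assumed to be 8 MiB).
const chunkSize = 8 << 20

// chunkSums holds the per-chunk digests in the encodings S3 expects.
type chunkSums struct {
	ContentMD5    string // base64 MD5 for the Content-MD5 header; empty when skipped
	PayloadSHA256 string // hex SHA256 used as the signed payload hash
}

// hashChunk is a hypothetical helper, not an SDK function. It always computes
// the SHA256 needed for signing and computes the MD5 only when skipMD5 is false.
func hashChunk(chunk []byte, skipMD5 bool) chunkSums {
	var s chunkSums
	if !skipMD5 {
		sum := md5.Sum(chunk)
		s.ContentMD5 = base64.StdEncoding.EncodeToString(sum[:])
	}
	sum := sha256.Sum256(chunk)
	s.PayloadSHA256 = hex.EncodeToString(sum[:])
	return s
}

// splitChunks cuts data into consecutive slices of at most size bytes.
// The slices share memory with data; nothing is copied.
func splitChunks(data []byte, size int) [][]byte {
	if size <= 0 {
		return nil
	}
	var out [][]byte
	for len(data) > 0 {
		n := size
		if len(data) < n {
			n = len(data)
		}
		out = append(out, data[:n])
		data = data[n:]
	}
	return out
}

func main() {
	// 20 MiB of zeros standing in for a backup file.
	data := make([]byte, 20<<20)
	for i, c := range splitChunks(data, chunkSize) {
		s := hashChunk(c, true) // skip MD5, keep the signing hash
		fmt.Printf("part %d: %d bytes, md5=%q, sha256=%s...\n",
			i+1, len(c), s.ContentMD5, s.PayloadSHA256[:16])
	}
}
```

With `skipMD5` set, each chunk is read by one hash function instead of two, which is the saving this issue is after. The SHA256 pass remains because, per the description above, it is needed for signing.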

Jira issue: CRDB-33923

dt added the C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.), A-disaster-recovery, T-disaster-recovery, and GA-blocker labels Nov 28, 2023
dt self-assigned this Nov 28, 2023

blathers-crl bot commented Nov 28, 2023

cc @cockroachdb/disaster-recovery


blathers-crl bot commented Nov 28, 2023

Hi @dt, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

dt added the branch-release-23.2 (Used to mark GA and release blockers, technical advisories, and bugs for 23.2) label Nov 28, 2023
dt changed the title from "backup: tail latencies higher when backing up to s3 due to MD5 hash" to "s3: skip md5 hashing" Nov 28, 2023
dt changed the title from "s3: skip md5 hashing" to "cloud/s3: skip md5 hashing" Nov 28, 2023
dt added the O-support (Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs) label Nov 28, 2023
dt removed the GA-blocker and branch-release-23.2 (Used to mark GA and release blockers, technical advisories, and bugs for 23.2) labels Nov 29, 2023
benbardin added the P-0 (Issues/test failures with a fix SLA of 2 weeks) label Nov 29, 2023
dt (Member, Author) commented Nov 29, 2023

All backports merged.

dt closed this as completed Nov 29, 2023