Add support for chunking of blobs, using SHA256TREE #233
Conversation
Force-pushed from ba20afd to a53b775.
This looks like a promising way to get this feature into v2 instead of waiting for v3. Great work!
```proto
// The size of the non-trailing blobs to create. It must be less than
// the sizes stored in `digests`. Furthermore, for BLAKE3CONCAT it must
// be a power of 2.
int64 split_size_bytes = 4;
```
Is there a recommended size to use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends. If the client wants really granular access to the object with as much deduplication as possible, it can use `blake3concat_min_split_size_bytes`. If it simply wants to download the object in its entirety without caring too much about deduplication, it can use the highest power of two that does not exceed `max_batch_total_size_bytes`.
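For concreteness, here is a minimal Go sketch of the second strategy, picking the largest power of two that fits within the batch limit. The function and parameter names are illustrative, not part of the proposed API:

```go
package main

import "fmt"

// chooseSplitSize picks the largest power of two that does not exceed
// maxBatchTotalSizeBytes, starting from the server's advertised minimum
// split size. Hypothetical helper, not part of the proposed API.
func chooseSplitSize(minSplitSizeBytes, maxBatchTotalSizeBytes int64) int64 {
	splitSize := minSplitSizeBytes
	for splitSize*2 <= maxBatchTotalSizeBytes {
		splitSize *= 2
	}
	return splitSize
}

func main() {
	// E.g., min split size 2 MiB, max batch total size just under 4 MiB.
	fmt.Println(chooseSplitSize(2<<20, 4<<20-1024)) // 2097152
}
```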
```proto
// The minimum size of blobs that can be created by calling
// [SplitBlobs][build.bazel.remote.execution.v2.ContentAddressableStorage.SplitBlobs]
// against a blob that uses the BLAKE3CONCAT digest function,
// disregarding the blob containing trailing data.
//
// If supported, this field MUST have value 2^k, where k > 10. It may
// not exceed `max_batch_total_size_bytes`.
int32 blake3concat_min_split_size_bytes = 9;
```
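As a small illustration, the "2^k, where k > 10" constraint amounts to a power-of-two test on values larger than 1024. This helper is hypothetical and only mirrors the comment above:

```go
package main

import "fmt"

// isValidMinSplitSize mirrors the constraint above: the value must be
// 2^k with k > 10, i.e., a power of two strictly greater than 1024.
// Hypothetical helper for illustration.
func isValidMinSplitSize(v int32) bool {
	return v > 1024 && v&(v-1) == 0
}

func main() {
	fmt.Println(isValidMinSplitSize(2048), isValidMinSplitSize(3000)) // true false
}
```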
Will the server be able to have an efficient implementation if the API is called with different chunk sizes? Would there be an "optimal" value for a static chunk size to reduce the risk of inefficient client-server combinations?
I think that in the case of Buildbarn I will just set `blake3concat_max_upload_size_bytes == blake3concat_min_split_size_bytes`. Then all clients/servers will be in full agreement on what the chunk size is.

The reason I kept them apart is that I tried to keep the upload and download paths separated. I can imagine that if someone operating a cluster discovers that they made a bad choice regarding chunk size, having these separate will make it easier to gradually migrate from one chunk size to the other if needed.

With regards to a one-size-fits-all optimal chunk size, I'm not sure whether such a value exists. I guess it really depends on bandwidth vs. latency. If you're bandwidth constrained, then smaller chunks are better. Conversely, if latency is high, it may be desirable to set the chunk size higher, so that you need to call into ConcatenateBlobs() and SplitBlobs() less frequently.
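A back-of-the-envelope model makes this trade-off concrete. Assuming one sequential round trip per chunk (no batching or pipelining) and purely illustrative link numbers, total download time is roughly latency times chunk count, plus bytes over bandwidth:

```go
package main

import "fmt"

// Rough model of the trade-off described above. All numbers are
// illustrative assumptions, not measurements, and the model ignores
// batching and pipelining of requests.
func main() {
	const (
		blobSizeBytes    = int64(1) << 30   // 1 GiB object
		bandwidthBytesPS = int64(100) << 20 // 100 MiB/s link
		latencySeconds   = 0.05             // 50 ms per round trip
	)
	for _, chunkSize := range []int64{1 << 16, 1 << 20, 1 << 24} {
		chunks := blobSizeBytes / chunkSize
		total := float64(chunks)*latencySeconds +
			float64(blobSizeBytes)/float64(bandwidthBytesPS)
		fmt.Printf("chunk size %8d B: %6d chunks, ~%.1f s\n", chunkSize, chunks, total)
	}
}
```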
Force-pushed from 677b4e8 to b8c052f.
In PR bazelbuild#233 I proposed the addition of two new ContentAddressableStorage methods (ConcatenateBlobs and SplitBlobs) that allow one to gain random access to large CAS objects, while still providing a way to verify data integrity. As part of that change, I added a new digest function to help with that, named BLAKE3CONCAT. This PR adds just this digest function, without bringing in any support for chunking. That will be done separately, as it was requested that both features land independently. I have also included test vectors for the BLAKE3CONCAT digest function. I derived these by modifying the BLAKE3 reference implementation written in Rust, and rerunning the tool that emits the official test vectors: https://github.com/BLAKE3-team/BLAKE3/blob/master/test_vectors/test_vectors.json Furthermore, I have been able to validate the newly obtained test vectors using a custom BLAKE3CONCAT implementation that I have written in Go, which will become part of Buildbarn.
In PR bazelbuild#233 I proposed the addition of two new ContentAddressableStorage methods (ConcatenateBlobs and SplitBlobs) that allow one to gain random access to large CAS objects, while still providing a way to verify data integrity. As part of that change, I added a new digest function to help with that, named SHA256TREE. This PR adds just this digest function, without bringing in any support for chunking. That will be done separately, as it was requested that both features land independently. I have also included test vectors for the SHA256TREE digest function. I derived these by implementing three different versions in the Go programming language:

- One version that uses regular arithmetic in Go.
- One version for x86-64 that uses AVX2.
- One version for ARM64 that uses the ARMv8 cryptography extensions.

All three versions behave identically.
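To illustrate why a tree-shaped digest makes verifiable concatenation and splitting possible, here is a minimal Go sketch of a binary Merkle tree over fixed-size chunks using SHA-256. This is not the normative SHA256TREE construction from this PR (the leaf size, split rule, and node encoding here are assumptions made for illustration); it only shows that a parent digest is computed purely from its children's digests, so the digest of a concatenation can be derived without rehashing the full payload:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Assumed leaf size, for illustration only.
const chunkSize = 1024

// treeDigest hashes data as a binary Merkle tree: chunks of up to
// chunkSize bytes form the leaves, and each parent node hashes only the
// concatenation of its two children's digests.
func treeDigest(data []byte) [sha256.Size]byte {
	if len(data) <= chunkSize {
		return sha256.Sum256(data) // leaf node
	}
	// Split at the largest power-of-two multiple of chunkSize that is
	// strictly smaller than len(data), keeping the left subtree complete.
	split := int64(chunkSize)
	for split*2 < int64(len(data)) {
		split *= 2
	}
	left := treeDigest(data[:split])
	right := treeDigest(data[split:])
	parent := append(left[:], right[:]...) // parent sees only child digests
	return sha256.Sum256(parent)
}

func main() {
	blob := make([]byte, 3000)
	fmt.Printf("%x\n", treeDigest(blob))
}
```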
Force-pushed from b8c052f to 0f05b90.
As #235 and #236 are, in my opinion, close to a state in which they can be merged, I have gone ahead and reimplemented this PR on top of #235. Changes compared to the previous version are as follows:
PTAL, ignoring the first commit in this PR. That one is part of #235.
Force-pushed from 4c54573 to b659e33.
Force-pushed from b659e33 to 990d387.
Force-pushed from 990d387 to d921f6f.
I think this idea is neat! I'm excited about combining this with the idea of transmitting variable size chunks. I'd like to propose a slight modification: instead of

Separately, what do you think about returning a ConcatenateBlobsResponse from the ConcatenateBlobs call? Though the response matches BatchUpdateBlobsResponse today, it may not in the future, and it's much easier to change this now before clients are using it.

Finally, would it make sense to remove the chunk size in SplitBlobsRequest and let the server decide this value?
Buildbarn has invested heavily in using virtual file systems. Both on the worker and client side it's possible to lazily fault in data from the CAS. As Buildbarn implements checksum verification where needed, randomly accessing large files may be slow. To address this, this change adds support for composing and decomposing CAS objects, using the newly added ConcatenateBlobs() and SplitBlobs() operations.

If implemented naively (e.g., using SHA-256), these operations would not be verifiable. To rephrase: when merely given the checksums of smaller objects, there is no way to obtain that of their concatenated version. This is why we suggest that these operations only be used in combination with SHA256TREE (see #235).

With these new operations present, there is no true need to use the ByteStream protocol any longer. Writes can be performed by uploading smaller parts through BatchUpdateBlobs(), followed by calling ConcatenateBlobs(). Conversely, reads of large objects can be performed by calling SplitBlobs() and downloading individual parts through BatchReadBlobs(). For compatibility, we still permit the ByteStream protocol to be used. This is a decision we can revisit in REv3.
Fixes: #178
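As an illustration of the read path described in this commit message, here is a Go sketch: SplitBlobs() to learn the chunk digests, then BatchReadBlobs() for the parts. The `Digest` type and the `casClient` interface below are stand-ins for the generated REv2 stubs; the method and message names follow this PR's proposal, but the exact signatures are assumptions:

```go
package casclient

import (
	"bytes"
	"context"
)

// Digest is a stand-in for the REv2 Digest message.
type Digest struct {
	Hash      string
	SizeBytes int64
}

// casClient is a simplified stand-in for a ContentAddressableStorage
// client implementing the operations proposed in this PR; the real
// generated stubs have different signatures.
type casClient interface {
	SplitBlobs(ctx context.Context, blob Digest, splitSizeBytes int64) ([]Digest, error)
	BatchReadBlobs(ctx context.Context, digests []Digest) (map[Digest][]byte, error)
}

// readLargeBlob downloads a large CAS object without using the ByteStream
// protocol, mirroring the workflow described in the commit message above.
func readLargeBlob(ctx context.Context, cas casClient, blob Digest, splitSizeBytes int64) ([]byte, error) {
	parts, err := cas.SplitBlobs(ctx, blob, splitSizeBytes)
	if err != nil {
		return nil, err
	}
	chunks, err := cas.BatchReadBlobs(ctx, parts)
	if err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	for _, part := range parts {
		// Each chunk can be verified against its own digest before use;
		// verification is omitted here for brevity.
		buf.Write(chunks[part])
	}
	return buf.Bytes(), nil
}
```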