Add compressed writes to cas.go. #232

Closed · rubensf wants to merge 5 commits from the write-compression branch

Conversation

rubensf (Contributor) commented on Nov 6, 2020

This follows the current tentative API being worked on in
bazelbuild/remote-apis#168. While there's technically still room for it
to change, it has reached a stable enough point to be worth implementing.

Both of these changes are groundwork for adding write compression
support to the remote-apis-sdks.

Chunker.Next should be "generic", in the sense that it allows a drop-in
implementation of a reader that compresses on the fly. I still kept the
special case of caching data in memory, since it spares us extra memory
copies.

Making the chunker reads independent of the digest size is also useful,
because it means we don't have to pre-compute the digest of the
compressed blob. The current draft of the RE API never requires the
digest of the compressed blob at any point, so this saves us from
having to read the data twice.

Note that this implies the chunker no longer checks the data it reads
against the supplied digest at any point. The digest is now purely
informational, rather than something the chunker logic depends on.

As a caveat, for simplicity we only cache in memory files that are
smaller than the *chunk* size, rather than the IO buffer size.
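
To make the "drop-in reader that compresses on the fly" idea concrete, here is a minimal, self-contained sketch. It is not the PR's actual code: the names are made up, and it uses the standard library's gzip purely to keep the example dependency-free (the REAPI proposal itself targets zstd).

```go
package main

import (
	"compress/gzip"
	"io"
	"strings"
)

// newCompressingReader (hypothetical name) wraps src so that reads from the
// returned reader yield a compressed stream, produced incrementally through
// an io.Pipe rather than by buffering the whole blob first.
func newCompressingReader(src io.Reader) io.ReadCloser {
	pr, pw := io.Pipe()
	go func() {
		zw := gzip.NewWriter(pw)
		// Stream the uncompressed source into the compressor; compressed
		// bytes become available on the pipe's read end as they are written.
		if _, err := io.Copy(zw, src); err != nil {
			pw.CloseWithError(err)
			return
		}
		if err := zw.Close(); err != nil {
			pw.CloseWithError(err)
			return
		}
		pw.Close()
	}()
	return pr
}

func main() {
	r := newCompressingReader(strings.NewReader("hello, compressed world"))
	defer r.Close()
	// A reader-backed chunker could consume r in fixed-size chunks without
	// knowing whether the underlying bytes are raw or compressed.
	buf := make([]byte, 32)
	for {
		n, err := r.Read(buf)
		_ = buf[:n] // each read would become one chunk
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
	}
}
```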
rubensf requested a review from ola-rozenfeld on Nov 6, 2020
google-cla bot added the "cla: yes" label (the author signed a CLA) on Nov 6, 2020
@@ -135,7 +142,26 @@ func (c *Client) WriteProto(ctx context.Context, msg proto.Message) (digest.Digest, error) {
func (c *Client) WriteBlob(ctx context.Context, blob []byte) (digest.Digest, error) {
ch := chunker.NewFromBlob(blob, int(c.ChunkMaxSize))
dg := ch.Digest()
return dg, c.WriteChunked(ctx, c.ResourceNameWrite(dg.Hash, dg.Size), ch)

name, err := c.maybeCompressBlob(ch)
Review comment (Contributor):
I think it will be cleaner if you create the right type of chunker to begin with, instead of using the CompressChunker function. That way, you won't need the function at all (and probably not the .compressed variable either). It's simpler, because then the created chunker never has to change.
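
A rough sketch of that suggestion, with a made-up constructor name (the chunker package has no NewCompressedFromBlob; it only stands in for whichever "right type of chunker" constructor the change would introduce):

```go
// Hypothetical: choose the chunker flavor once, at construction time, so the
// created chunker never has to be converted with CompressChunker afterwards.
func newWriteChunker(blob []byte, chunkSize int, compress bool) *chunker.Chunker {
	if compress {
		return chunker.NewCompressedFromBlob(blob, chunkSize) // made-up constructor
	}
	return chunker.NewFromBlob(blob, chunkSize) // existing constructor from the diff above
}
```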

@@ -284,7 +284,7 @@ func TestWrite(t *testing.T) {
}

for _, tc := range tests {
t.Run(tc.name, func(t *testing.T) {
testFunc := func(t *testing.T) {
Review comment (Contributor):
Instead please add the new parameters to the tc struct.
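
For illustration, a table-driven sketch of what "add the new parameters to the tc struct" could look like, assuming the usual "testing" import; the field names are hypothetical, not the PR's actual test code:

```go
func TestWriteSketch(t *testing.T) {
	tests := []struct {
		name     string
		blob     []byte
		compress bool // hypothetical new knob carried by the test case itself
	}{
		{name: "raw", blob: []byte("hello"), compress: false},
		{name: "compressed", blob: []byte("hello"), compress: true},
	}
	for _, tc := range tests {
		tc := tc
		t.Run(tc.name, func(t *testing.T) {
			// The write path under test would branch on tc.compress here,
			// instead of wrapping the whole body in a separate testFunc.
			_ = tc
		})
	}
}
```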

@@ -135,7 +142,26 @@ func (c *Client) WriteProto(ctx context.Context, msg proto.Message) (digest.Digest, error) {
func (c *Client) WriteBlob(ctx context.Context, blob []byte) (digest.Digest, error) {
ch := chunker.NewFromBlob(blob, int(c.ChunkMaxSize))
dg := ch.Digest()
return dg, c.WriteChunked(ctx, c.ResourceNameWrite(dg.Hash, dg.Size), ch)

Review comment (Contributor):
Changing this alone is not enough if we want all writes to be compressed. Things don't usually go through this intermediate-layer function; most use cases go through UploadIfMissing. I'd suggest creating a Client wrapper function for creating a Chunker from a file (which initializes the correct type of chunker based on client parameters) and using it everywhere that Chunkers are created.
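
A sketch of the wrapper the reviewer describes; the option and constructor names (CompressedWrites, NewCompressedFromFile) are invented for illustration, and the signatures are only approximate:

```go
// Hypothetical Client helper: every call site that needs a Chunker for a file
// (including the UploadIfMissing path) would go through this, so the decision
// to compress lives in exactly one place, driven by client parameters.
func (c *Client) newChunkerFromFile(path string, dg digest.Digest) (*chunker.Chunker, error) {
	if c.CompressedWrites { // made-up client option
		return chunker.NewCompressedFromFile(path, dg, int(c.ChunkMaxSize)) // made-up constructor
	}
	return chunker.NewFromFile(path, dg, int(c.ChunkMaxSize)), nil
}
```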

rubensf added a commit that referenced this pull request Nov 11, 2020
It follows the spec in bazelbuild/remote-apis#168, and it is similar
to #232. Note that while the API still has room to change, it is
mostly finalized and worth implementing.

A caveat of this implementation is that while the `offset` in reads
refers to the uncompressed bytes, the `limit` refers to the compressed
bytes.
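
To spell that caveat out, here is a sketch of a ByteStream read request for a compressed blob. The resource-name format follows the compressed-blobs proposal in bazelbuild/remote-apis#168; the digest and numbers are made up.

```go
// Assuming: import bspb "google.golang.org/genproto/googleapis/bytestream"
req := &bspb.ReadRequest{
	// The blob is addressed by its *uncompressed* digest in the resource name.
	ResourceName: "my-instance/compressed-blobs/zstd/0a1b2c.../1048576",
	ReadOffset:   4096, // offset is counted in *uncompressed* bytes
	ReadLimit:    512,  // limit caps the *compressed* bytes sent back
}
```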
rubensf added a commit that referenced this pull request Nov 12, 2020
rubensf added a commit that referenced this pull request Nov 12, 2020
rubensf added a commit that referenced this pull request Nov 23, 2020
rubensf (Contributor, Author) commented on Nov 23, 2020

Closing in favor of #240

rubensf closed this on Nov 23, 2020
rubensf added a commit that referenced this pull request Nov 24, 2020
rubensf deleted the write-compression branch on November 24, 2020
rubensf added a commit that referenced this pull request Nov 24, 2020
rubensf added a commit that referenced this pull request Nov 25, 2020