Add compressed writes to cas.go. #232

Closed · rubensf wants to merge 5 commits from the write-compression branch

Conversation

rubensf (Contributor) commented on Nov 6, 2020

This follows the current tentative API being worked on in
bazelbuild/remote-apis#168. While there's technically still room for it
to change, it has reached a stable enough point to be worth implementing.

Both of these changes are groundwork for adding write compression
support to the remote-apis-sdks.

Chunker.Next should be "generic", in the sense that it allows a drop-in
implementation of a reader that compresses on the fly. I still kept the
special case of caching data in memory, since it spares us extra memory
copies.

Making the chunker reads independent of the digest size is also useful,
because it means we don't have to pre-compute the digest of the
compressed blob. The current draft of the RE API never requires the
digest of the compressed blob at any point, so this saves us from
having to read the data twice.

Note that this implies the chunker no longer checks the data it reads
against the supplied digest at any point. The digest is now purely
informational, rather than something the chunker logic depends on.

As a caveat, for simplicity we only cache in memory files that are
smaller than the *chunk* size, rather than the IO buffer size.
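
To make the "drop-in reader that compresses on the fly" idea concrete, here is a minimal, self-contained sketch. It is not the PR's actual code: the names are made up, and it uses the standard library's gzip purely to keep the example dependency-free (the REAPI proposal itself targets zstd).

```go
package main

import (
	"compress/gzip"
	"io"
	"strings"
)

// newCompressingReader (hypothetical name) wraps src so that reads from the
// returned reader yield a compressed stream, produced incrementally through
// an io.Pipe rather than by buffering the whole blob first.
func newCompressingReader(src io.Reader) io.ReadCloser {
	pr, pw := io.Pipe()
	go func() {
		zw := gzip.NewWriter(pw)
		// Stream the uncompressed source into the compressor; compressed
		// bytes become available on the pipe's read end as they are written.
		if _, err := io.Copy(zw, src); err != nil {
			pw.CloseWithError(err)
			return
		}
		if err := zw.Close(); err != nil {
			pw.CloseWithError(err)
			return
		}
		pw.Close()
	}()
	return pr
}

func main() {
	r := newCompressingReader(strings.NewReader("hello, compressed world"))
	defer r.Close()
	// A reader-backed chunker could consume r in fixed-size chunks without
	// knowing whether the underlying bytes are raw or compressed.
	buf := make([]byte, 32)
	for {
		n, err := r.Read(buf)
		_ = buf[:n] // each read would become one chunk
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
	}
}
```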
rubensf requested a review from ola-rozenfeld on Nov 6, 2020
google-cla bot added the "cla: yes" label (the author signed a CLA) on Nov 6, 2020
@@ -135,7 +142,26 @@ func (c *Client) WriteProto(ctx context.Context, msg proto.Message) (digest.Digest, error) {
func (c *Client) WriteBlob(ctx context.Context, blob []byte) (digest.Digest, error) {
ch := chunker.NewFromBlob(blob, int(c.ChunkMaxSize))
dg := ch.Digest()
return dg, c.WriteChunked(ctx, c.ResourceNameWrite(dg.Hash, dg.Size), ch)

name, err := c.maybeCompressBlob(ch)
Review comment (Contributor):
I think it will be cleaner if you create the right type of chunker to begin with, instead of using the CompressChunker function. That way, you won't need the function at all (and probably not the .compressed variable either). It's simpler, because then the created chunker never has to change.
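
A rough sketch of that suggestion, with a made-up constructor name (the chunker package has no NewCompressedFromBlob; it only stands in for whichever "right type of chunker" constructor the change would introduce):

```go
// Hypothetical: choose the chunker flavor once, at construction time, so the
// created chunker never has to be converted with CompressChunker afterwards.
func newWriteChunker(blob []byte, chunkSize int, compress bool) *chunker.Chunker {
	if compress {
		return chunker.NewCompressedFromBlob(blob, chunkSize) // made-up constructor
	}
	return chunker.NewFromBlob(blob, chunkSize) // existing constructor from the diff above
}
```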

@@ -284,7 +284,7 @@ func TestWrite(t *testing.T) {
}

for _, tc := range tests {
t.Run(tc.name, func(t *testing.T) {
testFunc := func(t *testing.T) {
Review comment (Contributor):
Instead please add the new parameters to the tc struct.
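
For illustration, a table-driven sketch of what "add the new parameters to the tc struct" could look like, assuming the usual "testing" import; the field names are hypothetical, not the PR's actual test code:

```go
func TestWriteSketch(t *testing.T) {
	tests := []struct {
		name     string
		blob     []byte
		compress bool // hypothetical new knob carried by the test case itself
	}{
		{name: "raw", blob: []byte("hello"), compress: false},
		{name: "compressed", blob: []byte("hello"), compress: true},
	}
	for _, tc := range tests {
		tc := tc
		t.Run(tc.name, func(t *testing.T) {
			// The write path under test would branch on tc.compress here,
			// instead of wrapping the whole body in a separate testFunc.
			_ = tc
		})
	}
}
```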

@@ -135,7 +142,26 @@ func (c *Client) WriteProto(ctx context.Context, msg proto.Message) (digest.Digest, error) {
func (c *Client) WriteBlob(ctx context.Context, blob []byte) (digest.Digest, error) {
ch := chunker.NewFromBlob(blob, int(c.ChunkMaxSize))
dg := ch.Digest()
return dg, c.WriteChunked(ctx, c.ResourceNameWrite(dg.Hash, dg.Size), ch)

Review comment (Contributor):
Changing this alone is not enough if we want all writes to be compressed. Things don't usually go through this intermediate-layer function; most use cases go through UploadIfMissing. I'd suggest creating a Client wrapper function for creating a Chunker from a file (which initializes the correct type of chunker based on client parameters) and using it everywhere that Chunkers are created.
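
A sketch of the wrapper the reviewer describes; the option and constructor names (CompressedWrites, NewCompressedFromFile) are invented for illustration, and the signatures are only approximate:

```go
// Hypothetical Client helper: every call site that needs a Chunker for a file
// (including the UploadIfMissing path) would go through this, so the decision
// to compress lives in exactly one place, driven by client parameters.
func (c *Client) newChunkerFromFile(path string, dg digest.Digest) (*chunker.Chunker, error) {
	if c.CompressedWrites { // made-up client option
		return chunker.NewCompressedFromFile(path, dg, int(c.ChunkMaxSize)) // made-up constructor
	}
	return chunker.NewFromFile(path, dg, int(c.ChunkMaxSize)), nil
}
```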

rubensf added a commit that referenced this pull request Nov 11, 2020
It follows the spec in bazelbuild/remote-apis#168, and it is similar
to #232. Note that while the API still has room to change, it is
mostly finalized and worth implementing.

A caveat of this implementation is that while the `offset` in reads
refers to the uncompressed bytes, the `limit` refers to the compressed
bytes.
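
To spell that caveat out, here is a sketch of a ByteStream read request for a compressed blob. The resource-name format follows the compressed-blobs proposal in bazelbuild/remote-apis#168; the digest and numbers are made up.

```go
// Assuming: import bspb "google.golang.org/genproto/googleapis/bytestream"
req := &bspb.ReadRequest{
	// The blob is addressed by its *uncompressed* digest in the resource name.
	ResourceName: "my-instance/compressed-blobs/zstd/0a1b2c.../1048576",
	ReadOffset:   4096, // offset is counted in *uncompressed* bytes
	ReadLimit:    512,  // limit caps the *compressed* bytes sent back
}
```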
rubensf added a commit that referenced this pull request Nov 12, 2020
rubensf added a commit that referenced this pull request Nov 12, 2020
rubensf added a commit that referenced this pull request Nov 23, 2020
rubensf (Contributor, Author) commented on Nov 23, 2020

Closing in favor of #240

rubensf closed this on Nov 23, 2020
rubensf added a commit that referenced this pull request Nov 24, 2020
rubensf deleted the write-compression branch on November 24, 2020
rubensf added a commit that referenced this pull request Nov 24, 2020
rubensf added a commit that referenced this pull request Nov 25, 2020