
Deliver/Expect singly compressed content with a single ByteStream::Write/Read #16808

Open · 3 tasks
werkt opened this issue Nov 20, 2022 · 1 comment
Assignees: werkt, coeuvre
Labels: P2 (We'll consider working on this in future; assignee optional), team-Remote-Exec (Issues and PRs for the Execution (Remote) team), type: feature request

Comments

werkt (Contributor) commented Nov 20, 2022

Description of the feature request:

I got up close and personal with this implementation when I added zstd support to buildfarm, using the streams supplied in the zstd package, and I believe #14041 needs to be reworked. This will also need consensus from the rest of @bazelbuild/remote-execution, since the implications here are a compatibility break with existing implementations (but not a break with persistent data), perhaps mitigable through version/feature specification.

The overall stream is not being compressed once for transport and then delivered as a tidily shorter sequence of upload/download chunks. Instead, each chunk is compressed independently and sent as a shorter WriteRequest/ReadResponse. This yields a drastically worse identity:compressed byte ratio than a single zstd invocation over the blob content, and raises the likelihood of blob inflation, because each small block carries its own overhead and has reduced compression context.
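To make that framing concrete, here is a minimal sketch of the behavior as described above; it is not Bazel's actual code, the class and method names are hypothetical, and the chunk size is whatever the caller passes. Each request-sized slice of the identity bytes gets its own compressor context, so every WriteRequest carries an independent zstd frame with repeated per-frame overhead and no shared dictionary context:

```java
// Illustrative only: per-chunk framing as described above, using zstd-jni.
// Class/method names are hypothetical, not Bazel's internals.
import com.github.luben.zstd.Zstd;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PerChunkFraming {
  /** Compresses each request-sized slice of the identity bytes independently. */
  static List<byte[]> perChunkFrames(byte[] identity, int chunkSize) {
    List<byte[]> frames = new ArrayList<>();
    for (int off = 0; off < identity.length; off += chunkSize) {
      byte[] slice =
          Arrays.copyOfRange(identity, off, Math.min(off + chunkSize, identity.length));
      frames.add(Zstd.compress(slice)); // fresh compressor context per request
    }
    return frames; // each element would become one WriteRequest's data
  }
}
```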

This was discovered when the upload went through buildfarm's rechunking mechanism for shards, where request chunks are combined or split on 16k boundaries by default; the stream processor operated only on the current chunk, discarding appended bytes and producing decompression corruption.

Overall, this implementation conflicts with the zstd compressor description in the remote-apis, which makes no mention of sequential compressor streams within compressed blob content and instead expects the full content to be deliverable to a compressor engine in a single context.

This issue tracks a correction to the compressor so that it produces a single zstd-compressed blob, chunked into even block sizes for upload, with no per-request/response semantics imposed on the stream (see the sketch after the task list below):

  • Draft/Petition for the singly-compressed ByteStream content language to be added to https://github.com/bazelbuild/remote-apis
  • ByteStream::Write chunking processes a fully compressed stream
  • ByteStream::Read data reception expects a fully compressed stream
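For contrast with the per-chunk framing above, a minimal sketch of the corrected write path proposed here, under stated assumptions: the class name and the 16 KiB chunk size are illustrative, and this is not Bazel's actual implementation. The blob is compressed once as a single zstd frame, and only the resulting compressed bytes are sliced into fixed-size WriteRequest payloads, so no compression state is tied to request boundaries:

```java
// Sketch of the proposed behavior: one zstd frame over the full content,
// then fixed-size slicing of the compressed bytes for ByteStream::Write.
// Names and the chunk size are illustrative assumptions.
import com.github.luben.zstd.ZstdOutputStream;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class SingleFrameUploader {
  static final int CHUNK_SIZE = 16 * 1024; // even block size for upload (illustrative)

  /** Compresses the whole blob into a single zstd frame, then slices it into chunks. */
  static List<byte[]> toWriteChunks(InputStream blob) throws IOException {
    ByteArrayOutputStream compressed = new ByteArrayOutputStream();
    try (ZstdOutputStream zstd = new ZstdOutputStream(compressed)) {
      blob.transferTo(zstd); // single compressor context over the full content
    }
    byte[] frame = compressed.toByteArray();
    List<byte[]> chunks = new ArrayList<>();
    for (int off = 0; off < frame.length; off += CHUNK_SIZE) {
      int len = Math.min(CHUNK_SIZE, frame.length - off);
      byte[] chunk = new byte[len];
      System.arraycopy(frame, off, chunk, 0, len);
      chunks.add(chunk); // each chunk becomes one WriteRequest.data payload
    }
    return chunks;
  }
}
```

On the read side, the mirror-image behavior would concatenate ReadResponse payloads and feed them to a single ZstdInputStream, regardless of how the server happened to slice the responses.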

What underlying problem are you trying to solve with this feature?

With --experimental_remote_cache_compression, Bazel does not produce a strictly remote-apis-adherent client implementation, resulting in:

  • undue interpretation complexity (peers must understand this is Bazel's approach)
  • objectively worse performance for a line-cost reduction optimization (because partial compressor context limits compressibility)
  • byte stream structure imposition (interfering with arbitrary stream reslicing)

Which operating system are you running Bazel on?

linux

What is the output of bazel info release?

5.3.2

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

https://github.com/bazelbuild/remote-apis/blob/7d1354eef67545561bf689acdaabb41a99d98584/build/bazel/remote/execution/v2/remote_execution.proto#L241-L248
(Note that this unfortunately does not include specific language about how a ByteStream's compressed content should represent a single expansion via the compressor specified. I will also seek to add this language)
buildfarm/buildfarm#1211

Any other information, logs, or outputs that you want to share?

No response

@werkt self-assigned this Nov 20, 2022
@sgowroji added the team-Remote-Exec, type: feature request, and untriaged labels Nov 21, 2022
@werkt
Copy link
Contributor Author

werkt commented Nov 21, 2022

This may not be quite so cut-and-dried: some of the corruption observed was undoubtedly caused by presenting a compressed committedSize via queries, which Bazel uses to seek the input stream.

There is, however, a definite disconnect between the typical behavior of zstd and the bytes written for a compressed upload: a 100MiB zero file compresses to 3307 bytes with the zstd command line (and Zstd.compress agrees), but results in 854257 bytes sent across 53 write requests. I'll seek out the source of this discrepancy as part of this issue.
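A quick way to reproduce the single-frame baseline quoted above (a sketch, not the exact harness used here; it assumes zstd-jni on the classpath):

```java
// Baseline check: one Zstd.compress call over 100 MiB of zeros,
// to compare against the bytes observed on the wire during upload.
import com.github.luben.zstd.Zstd;

class ZeroBlobBaseline {
  public static void main(String[] args) {
    byte[] zeros = new byte[100 * 1024 * 1024];  // 100 MiB zero blob
    byte[] compressed = Zstd.compress(zeros);    // single-frame compression
    // Expected to be in the low-KiB range (the zstd CLI reports 3307 bytes),
    // far below the ~854 KiB observed across the 53 write requests.
    System.out.println("single-frame size: " + compressed.length + " bytes");
  }
}
```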

@coeuvre self-assigned this Nov 22, 2022
@coeuvre added the P2 (We'll consider working on this in future; assignee optional) label and removed the untriaged label Nov 22, 2022