Migrate caching of compressed data from SpanManager to BlobReader #232

Open
vkuzniet opened this issue Dec 13, 2022 · 3 comments
Labels
feature New feature or request

Comments

@vkuzniet
Contributor

vkuzniet commented Dec 13, 2022

Is your feature request related to a problem? Please describe.
Once #231 is complete, SpanManager should drop its cache of compressed spans and cache only uncompressed spans. Compressed spans should instead be cached within BlobReader. See the design doc from #103 for details on the concepts.
In short, the end state is:

  1. BlobReader owns the cache of compressed spans (aka regions).
  2. SpanManager keeps only the cache of uncompressed spans. If it needs a compressed span, it requests it from BlobReader.
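
A rough Go sketch of that split, not the actual soci-snapshotter API. All type and method names here (`SpanID`, `CompressedSource`, `CompressedSpan`, `UncompressedSpan`, the `fetch` callback, `gzipBytes`) are hypothetical. It also assumes each span is a standalone gzip stream, which is a simplification: real spans are decompressed from checkpoints, not as independent streams.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"io"
	"sync"
)

// SpanID identifies a span within a blob (hypothetical type).
type SpanID int

// CompressedSource is the hypothetical slice of BlobReader that SpanManager depends on.
type CompressedSource interface {
	CompressedSpan(id SpanID) ([]byte, error)
}

// BlobReader owns the cache of compressed spans.
// fetch stands in for the remote read and is an assumption of this sketch.
type BlobReader struct {
	mu    sync.Mutex
	cache map[SpanID][]byte
	fetch func(id SpanID) ([]byte, error)
}

func NewBlobReader(fetch func(SpanID) ([]byte, error)) *BlobReader {
	return &BlobReader{cache: make(map[SpanID][]byte), fetch: fetch}
}

// CompressedSpan returns the cached compressed bytes, fetching them on a miss.
// The lock is held across the fetch to keep the sketch simple.
func (b *BlobReader) CompressedSpan(id SpanID) ([]byte, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if data, ok := b.cache[id]; ok {
		return data, nil
	}
	data, err := b.fetch(id)
	if err != nil {
		return nil, err
	}
	b.cache[id] = data
	return data, nil
}

// SpanManager caches only uncompressed spans and asks its source for compressed ones.
type SpanManager struct {
	mu     sync.Mutex
	cache  map[SpanID][]byte
	source CompressedSource
}

func NewSpanManager(source CompressedSource) *SpanManager {
	return &SpanManager{cache: make(map[SpanID][]byte), source: source}
}

// UncompressedSpan returns the cached uncompressed bytes, decompressing on a miss.
func (m *SpanManager) UncompressedSpan(id SpanID) ([]byte, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if data, ok := m.cache[id]; ok {
		return data, nil
	}
	compressed, err := m.source.CompressedSpan(id)
	if err != nil {
		return nil, err
	}
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	data, err := io.ReadAll(zr)
	if err != nil {
		return nil, err
	}
	m.cache[id] = data
	return data, nil
}

// gzipBytes compresses p; it exists only to build inputs for trying the sketch.
func gzipBytes(p []byte) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write(p) // writes to a bytes.Buffer do not fail
	zw.Close()
	return buf.Bytes()
}
```

The point of the interface is that SpanManager never sees where compressed bytes come from, so BlobReader is free to change how it fetches and caches them.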

Describe the solution you'd like
The outcome for this issue is a design for the separation.

Describe alternatives you've considered
Keep everything as is.

Additional context
#103

@vkuzniet vkuzniet added the feature New feature or request label Dec 13, 2022
@vkuzniet vkuzniet moved this to ❓ Ungroomed in soci-snapshotter Dec 13, 2022
@Kern-- Kern-- changed the title from "Refactor the data path for soci snapshotter: part 2" to "Move caching of compressed data from SpanManager to BlobReader" Dec 13, 2022
@Kern-- Kern-- changed the title from "Move caching of compressed data from SpanManager to BlobReader" to "Migrate caching of compressed data from SpanManager to BlobReader" Dec 13, 2022
@Kern--
Contributor

Kern-- commented Dec 15, 2022

This needs additional design work. I will update this comment with more specific information later.

@Kern--
Contributor

Kern-- commented Mar 29, 2023

While we're doing this work, we should think about decoupling the size of the actual network requests from the span size. A span is really a unit of decompression, but we also use it as the unit of data fetched from the network. We can probably get better performance if we optimize our network requests separately from decompression. E.g. S3 likes 8 or 16 MiB requests, but we might want 2 or 4 MiB spans to reduce the compute needed.

We don't actually have to solve this problem in the initial separation, but we shouldn't design it in such a way that we can't do that later.
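
One way the decoupling could look, as a hedged Go sketch: BlobReader widens the compressed byte range of a requested span to fetch-size boundaries before going to the network, then caches the whole fetched range. `ByteRange` and `AlignedFetchRange` are hypothetical names, and the sketch assumes the span's compressed byte offsets are already known and that `fetchSize` is positive.

```go
package main

// MiB is one mebibyte in bytes.
const MiB = int64(1) << 20

// ByteRange is a half-open range [Start, End) of compressed bytes in a blob.
type ByteRange struct {
	Start, End int64
}

// AlignedFetchRange widens a span's compressed byte range to fetchSize
// boundaries and clamps the result to the blob size. fetchSize must be > 0.
func AlignedFetchRange(span ByteRange, fetchSize, blobSize int64) ByteRange {
	// Round the start down to the nearest fetch boundary.
	start := (span.Start / fetchSize) * fetchSize
	// Round the end up to the nearest fetch boundary.
	end := ((span.End + fetchSize - 1) / fetchSize) * fetchSize
	if end > blobSize {
		end = blobSize
	}
	return ByteRange{Start: start, End: end}
}
```

For example, a 2 MiB span at [18 MiB, 20 MiB) with an 8 MiB fetch size in a 21 MiB blob maps to a fetch of [16 MiB, 21 MiB). Neighboring spans in that range would then be served from BlobReader's cache without another request.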

@Kern-- Kern-- moved this from ❓ Ungroomed to 📋 Backlog in soci-snapshotter Apr 3, 2023
@Kern--
Contributor

Kern-- commented Apr 3, 2023

TODO: Merge with #534

Projects
Status: 📋 Backlog
Development

No branches or pull requests

2 participants