Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seekable tarfs #669

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Seekable tarfs #669

wants to merge 5 commits into from

Conversation

hdonnay
Copy link
Member

@hdonnay hdonnay commented Aug 22, 2022

This adds support for variants of layers that don't need a whole layer decompressed.

TODO:

  • Back out 1.19-new API
  • Add ReaderAt HTTP implementation
  • Back out httpreader buffering

@hdonnay
Copy link
Member Author

hdonnay commented Aug 22, 2022

cc @giuseppe This zstd:chunked implementation is largely based on reverse-engineering the code in containers/storage. Is there more formal documentation or test fixtures that I can cross-reference against?

@hdonnay
Copy link
Member Author

hdonnay commented Aug 22, 2022

Note to self: using linux's non-POSIX record locks and a file-reopening scheme would allow the fs.SubFS interface to be implemented directly, but would require some thought about how that interacts with a needed Close call, which doesn't exist on the returned fs.FS.

@giuseppe
Copy link

cc @giuseppe This zstd:chunked implementation is largely based on reverse-engineering the code in containers/storage. Is there more formal documentation or test fixtures that I can cross-reference against?

@hdonnay, the zstd:chunked is left undocumented on purpose at the moment so that I can still break the format if needed :-)

Some high-level information is present in the original PR that added support for it: containers/storage#775

Out of curiosity: how are you going to use these formats from Claire?

@hdonnay
Copy link
Member Author

hdonnay commented Aug 24, 2022

ah, okay. Well I feel like I've got a handle on it, so I can help with the documentation when the time comes.

Clair can use this to reduce disk utilization at the cost of latency. This would be helpful for large layers where contents are full of content that our indexers don't care about.

@hdonnay hdonnay force-pushed the hack/tarfs-seekable branch 2 times, most recently from 4e13bb1 to 71c74b0 Compare October 4, 2022 21:07
@hdonnay hdonnay force-pushed the hack/tarfs-seekable branch 3 times, most recently from ab7f17d to 6299c8c Compare December 14, 2022 16:48
@hdonnay hdonnay force-pushed the hack/tarfs-seekable branch 2 times, most recently from a0d6d5b to ed45eb8 Compare January 18, 2023 18:09
@hdonnay hdonnay force-pushed the hack/tarfs-seekable branch 2 times, most recently from 27e7e43 to 0deabc8 Compare February 1, 2023 16:50
@hdonnay hdonnay force-pushed the hack/tarfs-seekable branch 2 times, most recently from cb576e9 to ab5bd06 Compare February 6, 2023 21:41
@hdonnay hdonnay marked this pull request as ready for review February 6, 2023 21:55
@hdonnay hdonnay requested a review from a team as a code owner February 6, 2023 21:55
@hdonnay hdonnay requested review from crozzy and removed request for a team February 6, 2023 21:55
@hdonnay
Copy link
Member Author

hdonnay commented Feb 6, 2023

Decided to punt any httpreader usage to a future PR.

The tricky things for that future PR will be:

  • New fetcher implementation
  • New Indexer interfaces (either on the indexers themselves or on the Layer, somehow) to support using a single backing fs.FS
  • Fixing current users of (*Layer).Reader and (*Layer).SetLocal

@hdonnay
Copy link
Member Author

hdonnay commented Feb 6, 2023

This is also going to be easier to review commit-wise, sorry.

@hdonnay hdonnay force-pushed the hack/tarfs-seekable branch 2 times, most recently from ea54656 to 375cb61 Compare March 7, 2023 15:36
@hdonnay hdonnay force-pushed the hack/tarfs-seekable branch 2 times, most recently from cbbf4be to 6329f11 Compare April 14, 2023 21:40
@hdonnay hdonnay force-pushed the hack/tarfs-seekable branch 2 times, most recently from 4ed05e3 to 32ee97e Compare September 7, 2023 22:00
@codecov
Copy link

codecov bot commented Sep 7, 2023

Codecov Report

Attention: 273 lines in your changes are missing coverage. Please review.

Files Coverage Δ
layer.go 70.54% <100.00%> (+2.27%) ⬆️
pkg/tarfs/tarfs.go 86.66% <100.00%> (+12.85%) ⬆️
pkg/tarfs/metrics.go 84.61% <84.61%> (ø)
pkg/tarfs/pool.go 84.00% <84.00%> (ø)
pkg/tarfs/file.go 81.08% <83.33%> (-7.50%) ⬇️
pkg/tarfs/randomaccess.go 58.82% <58.82%> (ø)
pkg/tarfs/fs.go 68.09% <68.09%> (ø)
pkg/tarfs/parse.go 53.25% <57.79%> (-0.95%) ⬇️
pkg/tarfs/srv.go 74.80% <74.80%> (ø)

... and 3 files with indirect coverage changes

📢 Thoughts on this report? Let us know!.

@hdonnay
Copy link
Member Author

hdonnay commented Oct 9, 2023

Decided to punt any httpreader usage to a future PR.

The tricky things for that future PR will be:

* New fetcher implementation

* New Indexer interfaces (either on the indexers themselves or on the Layer, somehow) to support using a single backing `fs.FS`

* Fixing current users of `(*Layer).Reader` and `(*Layer).SetLocal`

The implementation in #1061 addresses most of these structural problems.

@hdonnay hdonnay marked this pull request as draft October 9, 2023 22:16
@hdonnay
Copy link
Member Author

hdonnay commented Oct 9, 2023

Going to re-draft this until after #1061 and the subsequent Scanner touching to be able to avoid all of the call-side rewrites that are currently in this PR.

@hdonnay hdonnay force-pushed the hack/tarfs-seekable branch 7 times, most recently from e9554b8 to ae4bd91 Compare October 26, 2023 17:28
Signed-off-by: Hank Donnay <hdonnay@redhat.com>
With the `http.DefaultClient` now being poisoned, there's no "good"
http.Client for a test to have. This change removes the argument and
uses a single, package-internal client.

Test-Fail: OK
Signed-off-by: Hank Donnay <hdonnay@redhat.com>
Test-Fail: OK
Signed-off-by: Hank Donnay <hdonnay@redhat.com>
Signed-off-by: Hank Donnay <hdonnay@redhat.com>
This change adds support for some schemes of encoding a table of
contents and allowing individual files to be accessed independently.
It breaks the API by adding a "size" parameter (like the archive/zip
package) but attempts to autodetect when given a nonsense value.

Signed-off-by: Hank Donnay <hdonnay@redhat.com>
@hdonnay hdonnay marked this pull request as ready for review October 26, 2023 18:53
@hdonnay
Copy link
Member Author

hdonnay commented Oct 26, 2023

#1061 and #1105 addressed the ergonomics of this API change, so I think this is good to get reviewed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants