Skip to content
This repository has been archived by the owner on Nov 7, 2024. It is now read-only.

lib/tar: Pre-filter tar archive in Rust #112

Merged
merged 1 commit into from
Oct 5, 2021

Conversation

cgwalters
Copy link
Member

Rather than trying to extend the ostree C code to support
arbitrary transformations, let's do pre-parsing here in safe Rust.

We keep the /etc/usr/etc bits, but we also just completely
drop everything not in /usr now.

As noted in the comment, this pre-validation will hopefully also
catch any corrupt tarballs that might be exploitable in the C libarchive
codebase.


// Completely discard stuff outside of /usr
if !(matches!(path.as_str(), "/" | "/usr") || path.starts_with("usr/")) {
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd throw a warn!() in here. I'm pretty sure we'll discover a good bunch of nasty things.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, though since this is a library we should probably allow the caller to collect this data and log/print it however it wants.

Will look at that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the bigger picture I am currently thinking of this as a shortcut - longer term we'll strongly encourage the model in #12 (comment) where we have a required RUN ostree container-finalize-bootable incantation that will clean this stuff up at build time.

Rather than trying to extend the ostree C code to support
arbitrary transformations, let's do pre-parsing here in safe Rust.

We keep the `/etc` → `/usr/etc` bits, but we also just completely
drop everything not in `/usr` now.

As noted in the comment, this pre-validation will hopefully also
catch any corrupt tarballs that might be exploitable in the C libarchive
codebase.
@cgwalters
Copy link
Member Author

OK, that was a fun deep dive into path libraries in Rust, specifically the components method.

We now parse, normalize and filter the input path in a much more robust way, and this updated code now also returns a btreemap from "prefix -> number of filtered paths" which should be sufficient to start.

@@ -26,6 +26,35 @@ impl<T: AsyncRead> ReadBridge<T> {
}
}

/// A [`std::io::Write`] implementation backed by an asynchronous source.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW I submitted tokio-rs/tokio#4146

@cgwalters
Copy link
Member Author

Hmm, what we should really do here actually is store the metadata about filtered paths in the commit object. However, doing that would require splitting up the write/commit phases, which in turn really wants us to be doing all of this natively in Rust instead of forking off ostree commit. Which in turn depends on ostreedev/ostree#2449 etc. So...later.

@cgwalters
Copy link
Member Author

Any other comments on this, or good to go?

@lucab
Copy link
Member

lucab commented Oct 5, 2021

All good from my side 👍.

As (very subjective) feedback on your last comment, I don't personally feel the need to record that information persistently. It can be interesting to log and inspect it at runtime, but in the steady state I guess it's unlikely that we'd want to parse/analyze that from the commit object itself.

@cgwalters cgwalters merged commit 8bd5a77 into ostreedev:main Oct 5, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants