Clean up tar header handling #2

Open
allisonkarlitskaya opened this issue Oct 10, 2024 · 3 comments
@allisonkarlitskaya
Collaborator

One of the following needs to happen:

  • address the remaining 'todo' items in our homegrown tar header merging code (i.e. proper handling of pax long names, possibly others)
  • figure out a way to use the entry parsing from the tar crate together with SplitStreamReader. This could maybe work by:
    • implementing Read + Seek on SplitStreamReader and figuring out a very clever way to avoid reading the file content of external references by using seek(). I'm not sure how we'd access the underlying SplitStreamReader object to check whether we're at inline content or an external reference, though, without running afoul of the borrow checker...
    • parsing one inline chunk at a time (using the Read implementation of &[u8]), knowing that we'd hit an unexpected EOF if we tried to read the file data. This is probably doable, but kinda ugly.
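
A rough sketch of the second sub-option, using only the standard library rather than the tar crate. It assumes that when the file content is stored as an external reference, the inline chunk ends right after the 512-byte header, so trying to read the content from the `&[u8]` fails with `UnexpectedEof`. The names `entry_size`, `probe_chunk` and `fake_header` are made up for illustration; only the ustar field offsets (size at bytes 124..136, octal) and the `Read` behaviour of `&[u8]` are taken as given.

```rust
use std::io::{self, ErrorKind, Read};

/// Size of one tar header block.
const BLOCK: usize = 512;

/// Hypothetical helper: parse the octal size field (bytes 124..136) of a ustar header.
fn entry_size(header: &[u8; BLOCK]) -> Option<u64> {
    let field = std::str::from_utf8(&header[124..136]).ok()?;
    let digits = field.trim_matches(|c: char| c == '\0' || c == ' ');
    u64::from_str_radix(digits, 8).ok()
}

/// Hypothetical helper: read one header from an inline chunk, then try to read
/// the content from the same chunk. Returns the declared size together with the
/// outcome of the content read.
fn probe_chunk(mut chunk: &[u8]) -> io::Result<(u64, io::Result<Vec<u8>>)> {
    let mut header = [0u8; BLOCK];
    chunk.read_exact(&mut header)?;
    let size = entry_size(&header)
        .ok_or_else(|| io::Error::new(ErrorKind::InvalidData, "bad size field"))?;
    // If the content is not inline, the slice runs out and read_exact fails
    // with ErrorKind::UnexpectedEof.
    let mut content = vec![0u8; size as usize];
    let result = chunk.read_exact(&mut content).map(|()| content);
    Ok((size, result))
}

/// Hypothetical helper: build a minimal header block for a file of the given size.
fn fake_header(name: &str, size: u64) -> [u8; BLOCK] {
    let mut h = [0u8; BLOCK];
    h[..name.len()].copy_from_slice(name.as_bytes());
    // 11 octal digits followed by a NUL fill the 12-byte size field.
    let octal = format!("{:011o}\0", size);
    h[124..136].copy_from_slice(octal.as_bytes());
    h
}
```

Calling `probe_chunk(&fake_header("big.bin", 4096))` yields the declared size plus an `UnexpectedEof` error for the content, which the caller could treat as "the data lives in the next external reference". The ugly part is that a truncated stream produces the same error.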
@allisonkarlitskaya
Collaborator Author

Our home-brew header handling code is half decent at this point...

@allisonkarlitskaya
Collaborator Author

Another idea for how we could maybe force the tar crate to do our bidding: we could try to return an error from our read() implementation that contains the object reference. I'm not sure if it's possible to encode that in a std::io::Result, though? Those seem kinda like they're basically an errno...
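
For what it's worth, `std::io::Error` can carry more than an errno: `io::Error::new` accepts any payload that converts into `Box<dyn Error + Send + Sync>`, and `get_ref()` plus `downcast_ref()` gets it back out. Below is a minimal sketch of that mechanism. `ExternalRef`, its `object_id` field, `InlineThenRef` and `external_ref` are hypothetical names, not anything from this codebase, and whether the tar crate hands such an error back to the caller unchanged is an assumption that would need checking.

```rust
use std::error::Error;
use std::fmt;
use std::io::{self, Read};

/// Hypothetical payload: identifies the external object the reader ran into.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ExternalRef {
    object_id: [u8; 32],
}

impl fmt::Display for ExternalRef {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "hit external object reference")
    }
}

impl Error for ExternalRef {}

/// Hypothetical stand-in for the real reader: yields the inline bytes, then
/// fails with an error that wraps the reference.
struct InlineThenRef<'a> {
    inline: &'a [u8],
    next: ExternalRef,
}

impl Read for InlineThenRef<'_> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.inline.is_empty() {
            // Smuggle the reference out through the io::Error payload.
            return Err(io::Error::new(io::ErrorKind::Other, self.next.clone()));
        }
        self.inline.read(buf)
    }
}

/// Recover the payload from an io::Error, if it carries one.
fn external_ref(err: &io::Error) -> Option<&ExternalRef> {
    err.get_ref()?.downcast_ref::<ExternalRef>()
}
```

A caller would read until it gets an error, pass that error to `external_ref`, and treat `Some(..)` as "skip to the external object" rather than as a failure.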

@allisonkarlitskaya
Collaborator Author

Another thing to consider here is performance: our current approach is forced to allocate the path so that it can return a TarEntry (removing itself from the stack in the process).

If we did a callback instead of returning, then we could serve pointers directly out of the tar header. If we wanted to get very fancy we could take advantage of the fact that we only ever process one splitstream inline chunk at a time and also store slice pointers to the xattr and pax longname data.
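
A minimal sketch of the callback shape, assuming a plain ustar header with the name in bytes 0..100 and the octal size in bytes 124..136. `EntryRef` and `with_entry` are hypothetical names; the real TarEntry has more fields, and pax long names and xattrs are left out here.

```rust
/// Size of one tar header block.
const BLOCK: usize = 512;

/// Hypothetical borrowed view of an entry: `path` points into the header bytes,
/// so nothing is allocated.
struct EntryRef<'a> {
    path: &'a [u8],
    size: u64,
}

/// Hypothetical helper: parse a header and hand a borrowed view to the callback
/// instead of returning an owned entry. Returns None if the size field is malformed.
fn with_entry<R>(header: &[u8; BLOCK], f: impl FnOnce(EntryRef<'_>) -> R) -> Option<R> {
    // Name field: NUL-terminated unless it fills all 100 bytes.
    let name = &header[..100];
    let len = name.iter().position(|&b| b == 0).unwrap_or(name.len());

    // Size field: octal digits, padded with NULs or spaces.
    let field = std::str::from_utf8(&header[124..136]).ok()?;
    let digits = field.trim_matches(|c: char| c == '\0' || c == ' ');
    let size = u64::from_str_radix(digits, 8).ok()?;

    Some(f(EntryRef { path: &name[..len], size }))
}
```

Because the view only lives for the duration of the callback, the borrow checker keeps it from outliving the chunk it points into, which is what makes serving slices straight out of the header safe.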
