Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support extraction of unixfs content stored in car files #263

Merged
merged 6 commits into from
Nov 9, 2021
Merged

Conversation

willscott
Copy link
Member

@willscott willscott commented Nov 4, 2021

  • support symlinks
  • round-trip (create then extract) tests

@willscott willscott marked this pull request as ready for review November 5, 2021 12:40
@willscott willscott requested review from masih and mvdan November 5, 2021 12:40
Copy link
Member

@masih masih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small blocker on error handling when making directory otherwise LGTM.
Left suggestions.

cmd/car/extract.go Outdated Show resolved Hide resolved
cmd/car/extract.go Outdated Show resolved Hide resolved
cmd/car/extract.go Show resolved Hide resolved
cmd/car/extract.go Outdated Show resolved Hide resolved
cmd/car/extract.go Outdated Show resolved Hide resolved
@rvagg
Copy link
Member

rvagg commented Nov 5, 2021

Naively writing out files with path.Join(outputDir, key) where key is just the field you find in a map seems like it might be inviting trouble. Do we do any checking in go-ipfs for whether this file path lands outside of the outputDir that we could copy across here? Even allowing .. or other directory altering characters is going to be a problem here and I'd be surprised if we don't already have code that checks for this in go-ipfs somewhere when creating files.

@willscott
Copy link
Member Author

Okay. There's better accounting to not get in trouble with symlinks or paths outside of the chosen directory now.

@willscott willscott requested a review from masih November 6, 2021 21:42
cmd/car/extract.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@hannahhoward hannahhoward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am mostly curious if you intended for this code not to support UnixFS files as roots? It appears that way unless I'm missing something.


ls := cidlink.DefaultLinkSystem()
ls.TrustedStorage = true
ls.StorageReadOpener = func(_ ipld.LinkContext, l ipld.Link) (io.Reader, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

graphsync's storeutil.LinkSystemForBlockstore does this quite pleasantly -- sets trusted storage, adds a good storage read opener, even includes some optimizations for bytes.Buffer. We should really, really consider putting this in ipld-prime, or somewhere that's obviously accessable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a glue module for this coming in ipld/go-ipld-prime#279 (which I haven't merged yet, but don't actually know what I'm waiting for, either), so we could use that shortly.

cmd/car/car.go Outdated
{
Name: "extract",
Aliases: []string{"x"},
Usage: "Extract the contents of a car",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the code, it looks like if you give anything other than a car where the roots are DagPB/Raw, it will error. Perhaps it is worth putting a comment like this ("Extracts the contents of a car when the car encodes UnixFS data")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should think about other conventions that we should try to interpret, like hamts, structured data, etc.

}
pbnode := pbn.(dagpb.PBNode)

ufn, err := unixfsnode.Reify(ipld.LinkContext{}, pbnode, ls)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not using a the Reify feature of LinkSystem for some reason?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because in extractDir we look at the dagpb level data in understanding how and when to do the reification

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have not entirely wrapped my head around all the code here, but it's possible that you could use NodeReifier's LinkContext parameter to pass down relevant hints, either via the .LinkPath or .LinkNode or .ParentNode properties. I'd think one of those should provide enough info for the decision making. Unless the relevant info is super nonlocal.

I have not analyzed if that would make this code easier to reason about or reuse; just offering the possibility.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would be a new coordination with unixfsnode somehow that seems harder to reason about though. I think this is a reasonable setup for applying the reification explicitly when needed.

return nil
}

pbn, err := ls.Load(ipld.LinkContext{}, cidlink.Link{Cid: root}, dagpb.Type.PBNode)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code seems to assume that if you don't have a raw node, you have a root that is a UnixFS Directory, not a regular file (even though extractFile exists as a function). Is this the intent?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is generally the case. if i export a single file it'll be implicitly in a directory in order to provide the file name of the file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, to compare to other unix stuff, tar commands do allow single file no-dir contents, iirc. Rare and a bit arcane, but possible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay for that to go in a follow-up?

@willscott willscott requested a review from masih November 8, 2021 19:50
Copy link
Member

@rvagg rvagg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems to address the concern I had about root escaping, so all paths should strictly only come under the requested output directory

@willscott willscott merged commit 6d94b7b into master Nov 9, 2021
@willscott willscott deleted the extract branch November 9, 2021 10:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants