stack unpack: Ignore pax headers (fix #2361) #2363

Merged

Blaisorblade
Collaborator

@Blaisorblade Blaisorblade commented Jul 10, 2016

Stop trying to reset permissions on pax header entries. This fixed the bug as verified by stack exec -- stack unpack ghc-core and stack exec -- stack install ghc-core. Haven't tested on other packages or on the testsuite (I'm relying on CI).

XXX This is incomplete, see XXX case in the patch. I'm not sure how defensive to be for the case that some tar entry is not one of the known entry types. Also, I've not checked other tar formats than pax, so maybe there's more to expect.

Quite possibly, the code should just have a catchall case for all cases leading to False or at least for all Tar.OtherEntryType entries, and then I can dispense with all these complications.

Fix #2361.
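
A minimal sketch of the catch-all idea mentioned above, not the code from the patch. `EntryKind` is a hypothetical local stand-in for the tar library's entry-content type, so the example compiles with only base. Which kinds count as "real" files is also an assumption here.

```haskell
module Main (main) where

-- Hypothetical stand-in for the tar library's entry-content type,
-- defined locally so this sketch needs nothing beyond base.
data EntryKind
  = NormalFile
  | Directory
  | SymbolicLink
  | HardLink
  | CharacterDevice
  | BlockDevice
  | NamedPipe
  | OtherEntryType Char -- raw type code; pax headers use 'x' and 'g'
  deriving (Show, Eq)

-- Should permissions be reset for an entry of this kind?
-- Assumption: only regular files and directories qualify. The final
-- catch-all sends every other kind, pax headers included, to False.
shouldResetPermissions :: EntryKind -> Bool
shouldResetPermissions NormalFile = True
shouldResetPermissions Directory  = True
shouldResetPermissions _          = False

main :: IO ()
main = mapM_ report [NormalFile, Directory, NamedPipe, OtherEntryType 'x']
  where
    report kind =
      putStrLn (show kind ++ " -> " ++ show (shouldResetPermissions kind))
```

The wildcard keeps the function total, so a tar format with extra type codes cannot hit a missing case. The trade-off is that new constructors are silently treated as "skip" rather than flagged by the compiler.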

@borsboom
Contributor

This looks pretty good. I'm honestly not sure what the best way to handle the other case is; maybe a warning would be the safest, just so there are no surprises. Another idea would be to always try to set the permissions, just like before this patch, but treat failure as a warning for the weird/unknown cases (i.e., still an error if it fails for a regular file, directory, etc. but just a warning for character devices, OtherEntryType, etc.).
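
One way to read the "error for regular entries, warning for weird ones" suggestion is sketched below. The `Expectation` type and `setPermsLeniently` are hypothetical names invented for illustration, and the permission-setting step is passed in as a plain `IO ()` action so that the sketch depends only on base.

```haskell
module Main (main) where

import Control.Exception (IOException, throwIO, try)
import System.IO (hPutStrLn, stderr)

-- Hypothetical two-way split: entries we expect to exist on disk
-- versus unusual ones (devices, unknown type codes, and so on).
data Expectation = Expected | Unusual
  deriving (Show, Eq)

-- Run a permission-setting action. A failure is rethrown for Expected
-- entries and only reported on stderr for Unusual ones.
-- Returns True when the action succeeded.
setPermsLeniently :: Expectation -> FilePath -> IO () -> IO Bool
setPermsLeniently expectation path action = do
  result <- try action :: IO (Either IOException ())
  case (result, expectation) of
    (Right (), _)        -> pure True
    (Left err, Expected) -> throwIO err
    (Left err, Unusual)  -> do
      hPutStrLn stderr
        ("Warning: could not set permissions on " ++ path ++ ": " ++ show err)
      pure False

main :: IO ()
main = do
  -- Simulate a failing permission change on an unusual entry.
  ok <- setPermsLeniently Unusual "some-unusual-entry"
          (ioError (userError "simulated failure"))
  print ok
```

Only `IOException` is caught here; anything else still propagates, which seems the safer default for a sketch like this.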

@Blaisorblade
Collaborator Author

Thanks for the prompt response!

> treat failure as a warning for the weird/unknown cases (i.e., still an error if it fails for a regular file, directory, etc. but just a warning for character devices, OtherEntryType, etc.).

That could work. But FWIW, those don't even get extracted by Tar.extract, so we risk too many warnings.

Orthogonally: GNU archives can use other flags. Since none correspond to files or are handled by Tar.extract, I'd say I'm fine with ignoring entries of those types, or with just ignoring any OtherEntryType.
https://www.gnu.org/software/tar/manual/html_section/tar_91.html#SEC187

@sjakobi
Member

sjakobi commented Jul 14, 2016

Thanks for working on this, @Blaisorblade! :)

I think I'd like to get a warning if the tarball contains a named pipe, and maybe the author would like to know too. So I'm in favor of extracting what we can reasonably use – just like what you have already implemented – and a warning about all the things that we can't.

And could you please add a regression test that checks that the warning is present and that the unpacking works?! Cheers!
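
The "extract what we can, warn about the rest" policy could look roughly like the following. This is a sketch under assumptions, not the merged code: `EntryKind`, `Verdict`, and `classify` are hypothetical names, entries are modelled as a plain list rather than the tar library's entry stream, and silently skipping pax header codes ('x' and 'g') is a guess at the intended behaviour.

```haskell
module Main (main) where

-- Hypothetical, simplified entry kinds (local stand-in, base only).
data EntryKind
  = NormalFile
  | Directory
  | NamedPipe
  | OtherEntryType Char -- raw tar type code
  deriving (Show, Eq)

-- What to do with one entry.
data Verdict
  = Extract       -- usable: extract it
  | SkipSilently  -- expected metadata, e.g. pax headers
  | Warn String   -- unexpected: skip, but tell the user why
  deriving (Show, Eq)

verdict :: EntryKind -> Verdict
verdict NormalFile = Extract
verdict Directory  = Extract
verdict NamedPipe  = Warn "named pipe"
verdict (OtherEntryType code)
  | code `elem` "xg" = SkipSilently -- assumption: pax headers are not worth a warning
  | otherwise        = Warn ("unknown entry type " ++ show code)

-- Split entries into paths to extract and (path, description) warnings,
-- preserving the original order within each list.
classify :: [(FilePath, EntryKind)] -> ([FilePath], [(FilePath, String)])
classify = foldr step ([], [])
  where
    step (path, kind) (usable, warnings) =
      case verdict kind of
        Extract      -> (path : usable, warnings)
        SkipSilently -> (usable, warnings)
        Warn why     -> (usable, (path, why) : warnings)

main :: IO ()
main = do
  let entries =
        [ ("pkg/Setup.hs", NormalFile)
        , ("pkg/fifo", NamedPipe)
        , ("pkg/PaxHeader", OtherEntryType 'x')
        ]
      (usable, warnings) = classify entries
  mapM_ (putStrLn . ("Extracting " ++)) usable
  mapM_ (\(p, why) -> putStrLn ("Warning: skipping " ++ p ++ " (" ++ why ++ ")")) warnings
```

Returning the warnings as data instead of printing them inside the fold keeps the function pure, which makes a regression test a plain equality check.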

@Blaisorblade
Collaborator Author

First down, the warnings (output yet untested). Using foldEntries is surprisingly ugly; I've tried converting to and from a list but it might not be better :-(. Next up, tests!

@Blaisorblade Blaisorblade force-pushed the 2361-ignore-pax-header-entries branch from 775df73 to 88776f7 Compare July 18, 2016 11:51
@Blaisorblade
Collaborator Author

So, since I like unit tests, I ended up with some refactorings and harmless changes to make this code unit-testable. I also haven't seen unit tests with data, so I had to get creative there.
I've run the normal tests locally; for integration tests I'm relying on CI.

@Blaisorblade
Collaborator Author

I don't understand the AppVeyor failure, and it doesn't seem to be about my testcase. If AppVeyor is flaky like Travis, could you restart the AppVeyor build? There is enough path manipulation in this change that it matters that it passes on Windows.

There's no clear indication it's about my testcase, and the actual code in main should still do essentially the same thing (except for 98e2ec0).

Also master already broke in https://ci.appveyor.com/project/snoyberg/stack/build/1.0.1867, and
https://ci.appveyor.com/project/snoyberg/stack/build/1.0.1868#L10 builds on that. That might be irrelevant though, since that's just a timeout.

@sjakobi
Member

sjakobi commented Jul 18, 2016

> could you restart the AppVeyor build?

No, I don't have the necessary permissions.

Pinging @snoyberg.

@snoyberg
Contributor

Rebuilt. In the future, doing git commit --amend && git push -f also works well.

@Blaisorblade
Collaborator Author

It failed again — I started investigating the failure.

@sjakobi
Member

sjakobi commented Jul 19, 2016

Oh man, these Appveyor log files are absolutely unreadable.

[00:18:04] 016-07-18 14:46:38.490247: [debug] 
[00:18:04] Try
[00:18:04] ing 
[00:18:04] t
[00:18:04] o
[00:18:04]  d
[00:18:04] e
[00:18:04] c
[00:18:04] o
[00:18:04] de C:\
[00:18:04] s
[00:18:04] r
[00:18:04] \i
[00:18:04] nd
[00:18:04] i

WTF!

@Blaisorblade
Collaborator Author

@sjakobi Please worry not, I've figured it out, it was my fault and I've got green builds now:
https://ci.appveyor.com/project/Blaisorblade/stack/history
I'm almost done rebasing and squashing for review.

Prepare to split this function to have unit tests for it.

This change should be harmless but is not a refactoring, separating this
for testing.
* Stop trying to reset permissions on pax header entries.
* Add changelog entry.
* Output warnings for unexpected entries.
* Add testcases.

The interface of untar is designed for unit testing.
@Blaisorblade Blaisorblade force-pushed the 2361-ignore-pax-header-entries branch from 88776f7 to 2fb8bdc Compare July 19, 2016 12:38
@Blaisorblade
Collaborator Author

I force-pushed based on this success (https://ci.appveyor.com/project/Blaisorblade/stack/build/1.0.11). After rebasing on master I got a hung build, but that's hopefully just flakiness and unrelated to the new code (I'm trying to rebuild that).

Meanwhile, I think this is ready for review.

-- Takes a path to a .tar.gz file, the name of the directory it should contain,
-- and a destination folder to extract the tarball into. Returns unexpected
-- entries, as pairs of paths and descriptions.
untar :: FilePath -> FilePath -> FilePath -> IO [(FilePath, T.Text)]
Member

I would prefer to see Path being used here as much as possible / reasonable. IMO the distinction between the different Path types makes code (especially type signatures) more readable and slightly more robust.

You can find utils for working with Path in src/Path (especially the amazing Path.Extra.toFilePathNoTrailingSep ;)) and in the path-io package.

Collaborator Author

I've seen path-io, but this is not trivial since this code has to talk to non-path-ified code.
But you have a point about signatures.

Also, the existing code uses FilePath quite a lot, see all the FP.</>. I guess you'd be fine with changes to the existing code, both as strictly needed to change untar's signature and as it makes sense? I'll see what I can do later.

Member

> but this is not trivial since this code has to talk to non-path-ified code.

Yes, I guess for the return value that is only going to be printed it doesn't matter as much. But for the input types it would be great, so we can encourage further usage of Path in any callers of untar.

> Also, the existing code uses FilePath quite a lot, see all the FP.</>.

Yes, there's a lot of room for improvement in this codebase! ;)
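
To illustrate why typed paths make a signature like untar's easier to read, here is a base-only sketch of the phantom-type idea. It mirrors the concept behind the path package, not its actual API: the `Path` newtype, the tags, `toFilePath`, and `describeUntar` are all defined locally and are hypothetical. Unlike the real library, this raw constructor performs no validation.

```haskell
module Main (main) where

-- Phantom tags: they carry no data and exist only at the type level.
data Abs
data Rel
data File
data Dir

-- A path tagged with its base (absolute/relative) and kind (file/directory).
-- Assumption: unvalidated wrapper, for illustration only.
newtype Path base kind = Path FilePath

-- Local helper that unwraps the underlying string.
toFilePath :: Path base kind -> FilePath
toFilePath (Path p) = p

-- Hypothetical signature: compared with three bare FilePath arguments,
-- the types now say which argument is the tarball, which is the expected
-- top-level directory, and which is the destination.
describeUntar :: Path Abs File -> Path Rel Dir -> Path Abs Dir -> String
describeUntar tarball expectedDir dest =
  "extract " ++ toFilePath tarball
    ++ " into " ++ toFilePath dest
    ++ ", expecting top-level directory " ++ toFilePath expectedDir

main :: IO ()
main =
  putStrLn
    (describeUntar
       (Path "/tmp/ghc-core-0.5.6.tar.gz")
       (Path "ghc-core-0.5.6/")
       (Path "/tmp/unpack/"))
```

Swapping the tarball and destination arguments in this sketch is a compile-time type error, whereas with three FilePath arguments it would only surface at run time.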

@sjakobi
Member

sjakobi commented Jul 19, 2016

~/tmp $ stack-Blaisorblade unpack ghc-core
Populated index cache.    

Unpacked ghc-core-0.5.6 to /home/simon/tmp/ghc-core-0.5.6/

Nice!

Great code, @Blaisorblade, the tests are totally awesome!

This will be ready to merge once my comment above has been addressed!

@Blaisorblade
Collaborator Author

> the tests are totally awesome!

Thanks! Happy to hear that, also because I got them very wrong the first time (so I should have asked for a close look there)—I was adding the tests only after all the actions in IO succeeded:
Blaisorblade@bd3436b
I had never used hspec before, and there are a surprising number of ways to get those tests subtly wrong.

Now I can also use Path's `ensureDir` again (I had to inline it to
`D.createDirectoryIfMissing True`).

Also, inline calls to toFilePath in untar, because:

1. toFilePath costs nothing so the inlining is safe.
2. Having to name both the Path and FilePath variants of the same
variable means looking for trouble.
@Blaisorblade
Collaborator Author

@sjakobi Converting a bit more to path was easier than I expected. I ended up removing the import of System.Directory, replaced by path-io.

@mgsloan
Contributor

mgsloan commented Jul 20, 2016

Awesome, LGTM! Merging

@mgsloan mgsloan merged commit c2fa52e into commercialhaskell:master Jul 20, 2016
@Blaisorblade Blaisorblade deleted the 2361-ignore-pax-header-entries branch July 20, 2016 11:05
Successfully merging this pull request may close these issues.

stack install ghc-core fails because it mishandles special tar entries
6 participants