Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid extracting joblib archives #729

Closed
wants to merge 2 commits into from

Conversation

egibs
Copy link
Member

@egibs egibs commented Dec 18, 2024

We're encountering failures when trying to extract files such as:

joblib/test/data/joblib_0.9.2_compressed_pickle_py35_np19.gz

While these appear to be valid gzip archives, they are actually just application/octet-stream/data files:

joblib/test/data/joblib_0.9.2_compressed_pickle_py35_np19.gz: data

This PR avoids extracting files like this if they contain common joblib extensions and have a MIME type of application/octet-stream.

To avoid creating temporary directories (and also scanning the equivalent of "" if tmpRoot is not created), I tweaked processArchive to return early if we detect these files. The nested extraction logic has also been updated to ignore joblib archives.

@egibs egibs requested a review from tstromberg December 18, 2024 20:21
Signed-off-by: egibs <20933572+egibs@users.noreply.github.com>
@egibs egibs force-pushed the ignore-joblib-archives branch from d3c9c8c to 7dc9d41 Compare December 18, 2024 20:46
@tstromberg
Copy link
Collaborator

I can't help but feel like hardcoding this exception introduces unnecessary complexity without a tangible benefit. Is this just to suppress output errors?

@egibs
Copy link
Member Author

egibs commented Dec 19, 2024

I can't help but feel like hardcoding this exception introduces unnecessary complexity without a tangible benefit. Is this just to suppress output errors?

Pretty much. Mainly since the files look like valid archives that should be extracting correctly.

Signed-off-by: Evan Gibler <20933572+egibs@users.noreply.github.com>
@egibs
Copy link
Member Author

egibs commented Dec 31, 2024

Closing this out for now since it's just to quiet down errors for unsupported archive types.

@egibs egibs closed this Dec 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants