import: support extraction of bzip2 and gzip tarballs #3606

Closed
cota opened this issue May 30, 2020 · 2 comments · Fixed by #3607

@cota
Contributor

cota commented May 30, 2020

Use case

I'm trying to use beets to import albums from a set of bzip2/gzip tarballs, i.e. .tar.bz2 and .tar.gz files. It would be nice to import them directly instead of having to extract them first.

Solution

I see that Python's tarfile module can process tar archives with gzip or bzip2 compression, so hopefully this is not too hard to do. Unfortunately, I don't know Python, so I cannot provide a patch.

Alternatives

As a workaround, I am extracting the tarballs first. It's not the end of the world, but it would be nice to point beets at the files directly, as we already can with rar and zip files.

Thanks for maintaining this great piece of software!

@jackwilsdon
Member

jackwilsdon commented May 31, 2020

Have you tried importing a .tar.bz2 or .tar.gz? Judging from the source, we should already support it:

beets/beets/importer.py

Lines 1037 to 1038 in 19ab28e

from tarfile import is_tarfile, TarFile
cls._handlers.append((is_tarfile, TarFile))
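
The pairing quoted above can look correct and still fail on compressed archives, because the detection function and the opener do not accept the same inputs. The following is a minimal sketch, not beets code: `make_bz2_tarball` and `probe` are hypothetical helpers, and the archive holds placeholder bytes rather than real audio. It checks what `is_tarfile` reports for a bzip2 tarball and whether the bare `TarFile` constructor can open it.

```python
import io
import os
import tarfile
import tempfile


def make_bz2_tarball(directory):
    """Write a tiny .tar.bz2 with one in-memory member and return its path."""
    path = os.path.join(directory, "example.tar.bz2")
    payload = b"placeholder bytes, not real audio"
    with tarfile.open(path, "w:bz2") as archive:
        info = tarfile.TarInfo(name="album/track01.mp3")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return path


def probe(path):
    """Return (is_tarfile result, ReadError raised by TarFile() or None)."""
    detected = tarfile.is_tarfile(path)
    try:
        # The bare constructor expects an uncompressed archive.
        tarfile.TarFile(path).close()
    except tarfile.ReadError as exc:
        return detected, exc
    return detected, None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        detected, error = probe(make_bz2_tarball(tmp))
        print("is_tarfile matched:", detected)
        print("TarFile() failed with:", repr(error))
```

If the predicate matches while the constructor raises, the handler is selected and then fails during extraction, which would be consistent with an "extraction failed" message rather than the file being ignored.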

@sampsyo added the needinfo label ("We need more details or follow-up from the filer before this can be tagged 'bug' or 'feature.'") May 31, 2020
cota added a commit to cota/beets that referenced this issue May 31, 2020
Call tarfile.open instead of tarfile.TarFile from the importer so that
we can import compressed tar archives.

Note that tarfile.TarFile does not handle compressed archives:
$ python3
Python 3.8.2 (default, Apr 27 2020, 15:53:34)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tarfile
>>> tf = tarfile.TarFile("Lagrimas.tar.bz2")
Traceback (most recent call last):
[...]
tarfile.ReadError: invalid header
>>>

But tarfile.open does deal with them:
$ python3
Python 3.8.2 (default, Apr 27 2020, 15:53:34)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tarfile
>>> tf = tarfile.open("Lagrimas.tar.bz2")
>>>

Tested:
$ ls Lagrimas/*.mp3 | wc -l
11
$ tar cjf Lagrimas.tar.bz2 Lagrimas/

- Before:
$ beet import Lagrimas.tar.bz2
extraction failed: invalid header
No files imported from /tmp/Lagrimas.tar.bz2

- After:
$ beet import Lagrimas.tar.bz2
[works]

Fixes beetbox#3606.
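
The change described in the commit message can be illustrated with a small handler table. This is a sketch under stated assumptions, not the beets implementation: `HANDLERS`, `open_archive`, `member_names`, and `make_tarball` are hypothetical names, and the zip entry is included only to show dispatch across more than one format. The point is that registering `tarfile.open` as the opener lets the same entry serve plain, gzip, and bzip2 tarballs.

```python
import io
import os
import tarfile
import tempfile
import zipfile

# Hypothetical handler table: (predicate, opener) pairs, tried in order.
HANDLERS = [
    (zipfile.is_zipfile, zipfile.ZipFile),
    (tarfile.is_tarfile, tarfile.open),  # tarfile.open, not tarfile.TarFile
]


def open_archive(path):
    """Open `path` with the first handler whose predicate accepts it."""
    for matches, opener in HANDLERS:
        if matches(path):
            return opener(path)
    raise ValueError("no handler recognizes %s" % path)


def member_names(path):
    """List member names of a tar or zip archive."""
    with open_archive(path) as archive:
        if isinstance(archive, tarfile.TarFile):
            return archive.getnames()
        return archive.namelist()


def make_tarball(directory, filename, mode):
    """Write a one-member tarball using a tarfile write mode such as 'w:gz'."""
    path = os.path.join(directory, filename)
    payload = b"placeholder bytes, not real audio"
    with tarfile.open(path, mode) as archive:
        info = tarfile.TarInfo(name="album/track01.mp3")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for filename, mode in [("a.tar", "w"), ("a.tar.gz", "w:gz"), ("a.tar.bz2", "w:bz2")]:
            print(filename, member_names(make_tarball(tmp, filename, mode)))
```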
@cota
Contributor Author

cota commented May 31, 2020

Note that TarFile only works for uncompressed tar archives. Quoting from /usr/lib/python3.8/tarfile.py:

    def __init__(self, name=None, mode="r", fileobj=None, format=None,
            tarinfo=None, dereference=None, ignore_zeros=None, encoding=None,
            errors="surrogateescape", pax_headers=None, debug=None,
            errorlevel=None, copybufsize=None):
        """Open an (uncompressed) tar archive `name'. `mode' is either 'r' to
           read from an existing archive, 'a' to append data to an existing
           file or 'w' to create a new file overwriting an existing one. `mode'
           defaults to 'r'.
[...]

The method that we want is TarFile.open, not the constructor:

    def open(cls, name=None, mode="r", fileobj=None, bufsize=RECORDSIZE, **kwargs):
        """Open a tar archive for reading, writing or appending. Return
           an appropriate TarFile class.

           mode:
           'r' or 'r:*' open for reading with transparent compression
[...]
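
To make the mode strings in that docstring concrete, here is a small sketch that opens one bzip2 tarball with several read modes. `make_bz2_tarball` and `names_with_mode` are hypothetical helpers written for this illustration. The assumption being checked is that 'r' and 'r:*' detect the compression transparently, 'r:bz2' names it explicitly, and a mismatched mode such as 'r:gz' or the uncompressed-only 'r:' raises `tarfile.ReadError`.

```python
import io
import os
import tarfile
import tempfile


def make_bz2_tarball(directory):
    """Write a tiny .tar.bz2 with one in-memory member and return its path."""
    path = os.path.join(directory, "example.tar.bz2")
    payload = b"placeholder bytes, not real audio"
    with tarfile.open(path, "w:bz2") as archive:
        info = tarfile.TarInfo(name="album/track01.mp3")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return path


def names_with_mode(path, mode):
    """Return member names when `mode` can read `path`, else None."""
    try:
        with tarfile.open(path, mode) as archive:
            return archive.getnames()
    except tarfile.ReadError:
        return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = make_bz2_tarball(tmp)
        for mode in ("r", "r:*", "r:bz2", "r:gz", "r:"):
            print(repr(mode), names_with_mode(path, mode))
```

Because the default mode already handles compression transparently, a caller that does not know the archive type in advance can call `tarfile.open(path)` without a mode argument.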

I'll send a patch. Please double-check the tests: this is the first time I've written anything in Python, and I've given up trying to get the necessary packages :(

cota added a commit to cota/beets that referenced this issue Jun 1, 2020