Extract archive into buffer (dict of io.BytesIO()) #111
Extract archive into buffer (dict of io.BytesIO()). Add several tests for this.
Codecov Report
@@ Coverage Diff @@
## master #111 +/- ##
==========================================
+ Coverage 84.41% 84.53% +0.12%
==========================================
Files 10 10
Lines 2258 2270 +12
Branches 378 381 +3
==========================================
+ Hits 1906 1919 +13
Misses 230 230
+ Partials 122 121 -1
|
Strange. The Travis-CI test result does not appear in the checks on GitHub, but the build failed. Please fix the failures; see https://travis-ci.org/github/miurahr/py7zr/builds/673297726 The errors are as follows:
and
You can run |
I have now updated the GitHub integration for Travis-CI. The results will show up from the next push. |
@Zoynels could you merge master HEAD into this PR branch and push again? |
Looks good; the remaining failure is not related to this change. |
I have several questions related to API design, @Zoynels
|
Currently it stores only the filename and data of each extracted file. If it should also store metadata such as timestamps and permissions, we need a more complex dict storage, e.g.:
Currently _dict stores only files, not directories. If we need to preserve the whole structure, I see several ways:
I think that extract(), extractall(), write(), and writeall(), when working with real files, should return True/False to indicate whether the write or extraction succeeded or failed.
The class could also store this information itself, which might be better. What do you think?
It uses the standard extraction path; the only difference is that when saving into memory it writes the data into io.BytesIO() instead of a real file. All other functionality is the same. However, it does not yet restore junctions or symlinks, as I don't fully understand how the filenames should be stored. I wrote about this in #112.
_dict is filled only when extracting into a dict; when extracting to the filesystem it stays empty. Alternatively, instead of return_dict: bool = False, we could accept None or a dict that will be filled with the extracted data. In that case only a dict could be passed in, and a file class couldn't be created. I think that is also a good option.
|
We have an interface, SevenZipFile.list(), that returns a List[FileInfo], which includes this metadata. If the user combines this interface with the extracted contents, that may be enough. |
Looking at the Python standard ZipFile API, it has a method for the same purpose named read(): https://docs.python.org/3/library/zipfile.html#zipfile.ZipFile.read
It seems better for SevenZipFile and ZipFile to share the same API. Unfortunately, the TarFile class does not have an interface for the same purpose. |
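For reference, here is a minimal sketch of how ZipFile.read() behaves, using only the standard library. The archive is built in memory and the member names are made up for illustration; the last step collects the members into a dict of io.BytesIO, which is the shape discussed in this PR, not something ZipFile provides itself.

```python
import io
import zipfile

# Build a small ZIP archive in memory (member names are illustrative only).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("test1.txt", "This file is located in the root.")
    zf.writestr("test/test2.txt", "This file is located in a folder.")

with zipfile.ZipFile(buffer, "r") as zf:
    # ZipFile.read(name) returns the contents of one member as bytes.
    root_data = zf.read("test1.txt")
    # Assumption: wrap each member in io.BytesIO to mimic the dict proposed here.
    contents = {name: io.BytesIO(zf.read(name)) for name in zf.namelist()}
```

ZipFile can serve a single member by name because each member is compressed independently.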
Yes, this is the best choice. You are right that it doesn't have a read() interface, but there is: https://docs.python.org/3/library/tarfile.html#tarfile.TarFile.extractfile
|
A tar file is a simple concatenation of files, so it can return the opened archive file with the file pointer positioned at the start of a member inside the archive. As you know, 7-zip is an archive format with solid-compression support, which gives a good compression ratio but cannot be extracted with random access. We cannot provide an interface that takes a single member or filename as its argument, because that would require random access, or reading the entire archive to extract each portion. |
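To illustrate the tar behaviour described above, here is a small standard-library sketch of TarFile.extractfile(). The archive is built in memory and the member name is invented for the example.

```python
import io
import tarfile

payload = b"This file is located in the root."

# Build an uncompressed tar archive in memory (member name is illustrative).
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tf:
    info = tarfile.TarInfo(name="test1.txt")
    info.size = len(payload)
    tf.addfile(info, io.BytesIO(payload))

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tf:
    # extractfile() returns a file-like object for a regular member, or None otherwise.
    member_file = tf.extractfile("test1.txt")
    extracted = member_file.read() if member_file is not None else None
```

This per-member access is cheap for tar; with a solid 7-zip archive the same call would have to decompress everything that precedes the member.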
Here is my proposal for an interface design.
|
I'd like to introduce the methods read() and readall() for returning a dictionary.
This keeps the method signatures of extract() and extractall(), and makes read() easy to understand because it resembles the ZipFile class.
@@ -644,15 +644,16 @@ def test(self) -> bool:
         """Test archive using CRC digests."""
         return self._test_digests()

-    def extractall(self, path: Optional[Any] = None) -> None:
+    def extractall(self, path: Optional[Any] = None, return_dict: bool = False) -> Optional[Dict[str, IO[Any]]]:
Shall we keep extractall(self, path) -> None as is and introduce readall(self, path) -> Dict[str, IO[Any]]?

-    def extract(self, path: Optional[Any] = None, targets: Optional[List[str]] = None) -> None:
+    def extract(self, path: Optional[Any] = None, targets: Optional[List[str]] = None,
Keep the extract(self, path) -> None method signature as is and introduce read(self, path) -> Dict[str, IO[Any]].
Introduce an internal function _extract(self, path, targets, return_dict) -> Optional[Dict], which is your proposal.
That keeps the existing method signatures and introduces read().
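The layering suggested in this review could look roughly like the sketch below. It is not py7zr's implementation: ArchiveSketch is a hypothetical stand-in that keeps its members in a plain dict, and the exact parameters of read() and readall() are assumptions made for the example.

```python
import io
from pathlib import Path
from typing import Dict, IO, List, Optional


class ArchiveSketch:
    """Hypothetical stand-in for SevenZipFile; members live in a plain dict."""

    def __init__(self, members: Dict[str, bytes]) -> None:
        self._members = members

    def _extract(self, path: Optional[str] = None, targets: Optional[List[str]] = None,
                 return_dict: bool = False) -> Optional[Dict[str, IO[bytes]]]:
        # Single internal worker: select members, then write to disk or to memory.
        selected = {name: data for name, data in self._members.items()
                    if targets is None or name in targets}
        if return_dict:
            return {name: io.BytesIO(data) for name, data in selected.items()}
        base = Path(path) if path is not None else Path.cwd()
        for name, data in selected.items():
            out = base / name
            out.parent.mkdir(parents=True, exist_ok=True)
            out.write_bytes(data)
        return None

    # Public signatures for extraction to disk stay unchanged and return None.
    def extractall(self, path: Optional[str] = None) -> None:
        self._extract(path=path)

    def extract(self, path: Optional[str] = None, targets: Optional[List[str]] = None) -> None:
        self._extract(path=path, targets=targets)

    # New in-memory entry points (parameter choice is an assumption of this sketch).
    def readall(self) -> Optional[Dict[str, IO[bytes]]]:
        return self._extract(return_dict=True)

    def read(self, targets: Optional[List[str]] = None) -> Optional[Dict[str, IO[bytes]]]:
        return self._extract(targets=targets, return_dict=True)


archive = ArchiveSketch({
    "test1.txt": b"This file is located in the root.",
    "test/test2.txt": b"This file is located in a folder.",
})
everything = archive.readall()
only_root = archive.read(targets=["test1.txt"])
```

Callers of extract() and extractall() are unaffected, while read() and readall() expose the dictionary result without a boolean flag in the public signature.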
assert tmp_path.joinpath('test/test2.txt').open('rb').read() == bytes('This file is located in a folder.', 'ascii')
assert tmp_path.joinpath('test1.txt').open('rb').read() == bytes('This file is located in the root.', 'ascii')
if not return_dict:
    archive.extractall(path=tmp_path, return_dict=return_dict)
This can change to archive.extractall(path=tmp_path)
    assert tmp_path.joinpath('test/test2.txt').open('rb').read() == bytes('This file is located in a folder.', 'ascii')
    assert tmp_path.joinpath('test1.txt').open('rb').read() == bytes('This file is located in the root.', 'ascii')
else:
    _dict = archive.extractall(return_dict=return_dict)
This becomes _dict = archive.readall()
There has been no activity from the original proposer @Zoynels in the last two weeks. We had a good discussion about the API, and I hope to merge it with modifications. |
Extract archive into buffer (dict of io.BytesIO()) (Update of #111)
My issue #110
Extracts an archive into a dict whose keys are file paths and whose values are io.BytesIO() objects.
Does not work with symlinks.