Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-22908: Add seek and tell functionality to ZipExtFile #4966

Merged
merged 10 commits into from
Jan 30, 2018
56 changes: 56 additions & 0 deletions Lib/zipfile.py
Original file line number Diff line number Diff line change
Expand Up @@ -772,6 +772,17 @@ def __init__(self, fileobj, mode, zipinfo, decrypter=None,
else:
self._expected_crc = None

self._seekable = False
try:
if fileobj.seekable():
self._orig_compress_start = fileobj.tell()
self._orig_compress_size = zipinfo.compress_size
self._orig_file_size = zipinfo.file_size
self._orig_start_crc = self._running_crc
self._seekable = True
except AttributeError:
pass

def __repr__(self):
result = ['<%s.%s' % (self.__class__.__module__,
self.__class__.__qualname__)]
Expand Down Expand Up @@ -957,6 +968,51 @@ def close(self):
finally:
super().close()

def seekable(self):
return self._seekable

def seek(self, offset, from_what = 0):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend matching the signature of the same method in the io module: seek(self, pos, whence=0)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose offset because I can better differentiate between offset value (relative values) and position values (absolute values). Also, according to the official documentation "offset" should be the first argument for seek (although there is no consensus when I search through the cpython source). https://docs.python.org/3/library/io.html#io.IOBase.seek

I agree with changing "from_what" to "whence". The former always felt awkward to me. Thank you for the suggestion

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like older modules use pos (e.g. mmap), and the more recent io ABCs use offset. This should be fine, and I expect people to pass the parameter as a positional argument anyway.

if not self._seekable:
raise io.UnsupportedOperation("underlying stream is not seekable")
curr_pos = self.tell()
new_pos = offset # Default seek from start of file
if from_what == 1: # Seek from current position
new_pos = curr_pos + offset
elif from_what == 2: # Seek from EOF
new_pos = self._orig_file_size + offset

if new_pos > self._orig_file_size:
new_pos = self._orig_file_size

if new_pos < 0:
new_pos = 0

if new_pos > curr_pos:
self.read(new_pos - curr_pos)
elif new_pos < curr_pos:
if self._offset >= curr_pos - new_pos:
# No need to reset if the new position is within the read buffer
self._offset -= curr_pos - new_pos
else:
self._fileobj.seek(self._orig_compress_start)
self._running_crc = self._orig_start_crc
self._compress_left = self._orig_compress_size
self._left = self._orig_file_size
self._readbuffer = b''
self._offset = 0
self._decompressor = zipfile._get_decompressor(self._compress_type)
self._eof = False
if new_pos > 0:
self.read(new_pos)

return self.tell()

def tell(self):
if not self._seekable:
raise io.UnsupportedOperation("underlying stream is not seekable")
filepos = self._orig_file_size - self._left - len(self._readbuffer) + self._offset
return filepos


class _ZipWriteFile(io.BufferedIOBase):
def __init__(self, zf, zinfo, zip64):
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
Added seek and rest to the ZipExtFile class. This only works if the file
object used to open the zipfile is seekable. This change is the first
iteration of this feature and can easily be optimized by considering the
_readbuffer.