
"object of type 'pickle.PickleBuffer' has no len()" error with large numpy arrays #23

Closed
PaulFlanaganGenscape opened this issue Feb 10, 2021 · 5 comments
Labels
bug Something isn't working

Comments

@PaulFlanaganGenscape

I get an "object of type 'pickle.PickleBuffer' has no len()" error for any compression other than gzip if the data contains a large numpy array.

It works for small numpy arrays.

I'm pretty sure it's the same issue as pandas-dev/pandas#39376.

ipython
Python 3.9.0 (default, Oct 13 2020, 14:30:47)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import compress_pickle
   ...: import numpy as np
   ...: import pickle
   ...:
   ...: dnp = {"np_array": np.zeros((100, 37000, 3))}
   ...:
   ...: pickled = pickle.dumps(dnp)
   ...:

In [2]: len(pickled)
Out[2]: 88800178

In [3]:
   ...: pickled = compress_pickle.dumps(dnp, compression='gzip')
   ...: len(pickled)
Out[3]: 86506

In [4]: pickled = compress_pickle.dumps(dnp, compression='zipfile')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-0201a3824617> in <module>
----> 1 pickled = compress_pickle.dumps(dnp, compression='zipfile')

~/git/Genscape/dispatch-modeler/.venv/lib/python3.9/site-packages/compress_pickle/compress_pickle.py in dumps(obj, compression, protocol, fix_imports, buffer_callback, optimize, **kwargs)
    206     validate_compression(compression, infer_is_valid=False)
    207     with io.BytesIO() as stream:
--> 208         dump(
    209             obj,
    210             path=stream,

~/git/Genscape/dispatch-modeler/.venv/lib/python3.9/site-packages/compress_pickle/compress_pickle.py in dump(obj, path, compression, mode, protocol, fix_imports, buffer_callback, unhandled_extensions, set_default_extension, optimize, **kwargs)
    125                 io_stream.write(buff)
    126             else:
--> 127                 pickle.dump(  # type: ignore
    128                     obj,
    129                     io_stream,

~/.pyenv/versions/3.9.0/lib/python3.9/zipfile.py in write(self, data)
   1121         if self.closed:
   1122             raise ValueError('I/O operation on closed file.')
-> 1123         nbytes = len(data)
   1124         self._file_size += nbytes
   1125         self._crc = crc32(data, self._crc)

TypeError: object of type 'pickle.PickleBuffer' has no len()

In [5]: pickled = compress_pickle.dumps(dnp, compression='lz4')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-90f985e33a25> in <module>
----> 1 pickled = compress_pickle.dumps(dnp, compression='lz4')

~/git/Genscape/dispatch-modeler/.venv/lib/python3.9/site-packages/compress_pickle/compress_pickle.py in dumps(obj, compression, protocol, fix_imports, buffer_callback, optimize, **kwargs)
    206     validate_compression(compression, infer_is_valid=False)
    207     with io.BytesIO() as stream:
--> 208         dump(
    209             obj,
    210             path=stream,

~/git/Genscape/dispatch-modeler/.venv/lib/python3.9/site-packages/compress_pickle/compress_pickle.py in dump(obj, path, compression, mode, protocol, fix_imports, buffer_callback, unhandled_extensions, set_default_extension, optimize, **kwargs)
    149                 io_stream.write(buff)
    150             else:
--> 151                 pickle.dump(obj, io_stream, protocol=protocol, fix_imports=fix_imports)
    152         finally:
    153             io_stream.flush()

~/git/Genscape/dispatch-modeler/.venv/lib/python3.9/site-packages/lz4/frame/__init__.py in write(self, data)
    694         compressed = self._compressor.compress(data)
    695         self._fp.write(compressed)
--> 696         self._pos += len(data)
    697         return len(data)
    698

TypeError: object of type 'pickle.PickleBuffer' has no len()
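Both failing frames above call len() on whatever object pickle hands to write(). A minimal standard-library sketch of why that raises (this is an illustration, not code from compress_pickle; the 16-byte buffer is an arbitrary stand-in for the array data):

```python
import pickle

# A PickleBuffer wraps any object that exposes the buffer protocol.
buf = pickle.PickleBuffer(bytearray(16))

# PickleBuffer does not define __len__, so len() raises TypeError.
try:
    len(buf)
    message = None
except TypeError as exc:
    message = str(exc)

# The size is still available through a memoryview of the same buffer.
size = memoryview(buf).nbytes

print(message)
print(size)
```

A write() method that only ever needs the byte count could use memoryview(data).nbytes instead of len(data).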

With a small numpy array it works:

In [1]: import compress_pickle
   ...: import numpy as np

In [2]: dnp = {"np_array": np.zeros((10, 37, 3))}

In [3]: compress_pickle.utils.get_known_compressions()
Out[3]: [None, 'pickle', 'gzip', 'bz2', 'lzma', 'zipfile', 'lz4']

In [4]: pickled = compress_pickle.dumps(dnp, compression='zipfile')

In [5]: len(pickled)
Out[5]: 9136

In [6]: unpickled = compress_pickle.loads(pickled, compression='zipfile')

In [8]: unpickled['np_array'].shape
Out[8]: (10, 37, 3)
@lucianopaz
Owner

Thanks for reporting this @PaulFlanaganGenscape! From the PR that you linked, it looks like protocol 5 is breaking something. Could you check whether compress_pickle.dump(..., protocol=4) works?

When I find some time, I'll port the pandas team's fix from their PR over here.
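One possible shape for such a fix, sketched with the standard library only: pickle the object to bytes in memory first, then give the compressor plain bytes, so its write() never sees a PickleBuffer. The helper name dump_via_bytes and the test payload are hypothetical, and this is an assumption about the general approach, not the code from either project's PR.

```python
import bz2
import io
import pickle


def dump_via_bytes(obj, stream, protocol=pickle.HIGHEST_PROTOCOL):
    """Pickle obj fully in memory, then write the resulting bytes to stream."""
    # pickle.dumps() returns a bytes object, so stream.write() can call len() on it.
    stream.write(pickle.dumps(obj, protocol=protocol))


# Stand-in for a large array: a PickleBuffer over one million zero bytes.
payload = {"blob": pickle.PickleBuffer(bytearray(1_000_000))}

raw = io.BytesIO()
with bz2.BZ2File(raw, mode="wb") as compressed:
    dump_via_bytes(payload, compressed)

# Round trip: decompress and unpickle what was written.
restored = pickle.loads(bz2.decompress(raw.getvalue()))
print(len(restored["blob"]))
```

The trade-off is memory: the whole pickle exists as one bytes object before compression starts, which matters for arrays of several hundred megabytes.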

lucianopaz added the bug label on Feb 11, 2021
@PaulFlanaganGenscape
Author

Yes, you're right. It works with protocol=4.

In [61]: lb, ub = -1, 1
    ...: x = np.random.uniform(low=lb,high=ub,size=(1,100000000))

In [62]: humanize.naturalsize( x.nbytes )
Out[62]: '800.0 MB'

In [63]: dump(x, "x.pkl.bz", compression="bz2", protocol=4)


In [64]: dump(x, "x.pkl.bz", compression="bz2")

TypeError                                 Traceback (most recent call last)
<ipython-input-87-5d854cdb6283> in <module>
----> 1 dump(x, "x.pkl.bz", compression="bz2")

.venv/lib/python3.9/site-packages/compress_pickle/compress_pickle.py in dump(obj, path, compression, mode, protocol, fix_imports, buffer_callback, unhandled_extensions, set_default_extension, optimize, **kwargs)
    149                 io_stream.write(buff)
    150             else:
--> 151                 pickle.dump(obj, io_stream, protocol=protocol, fix_imports=fix_imports)
    152         finally:
    153             io_stream.flush()

~/.pyenv/versions/3.9.0/lib/python3.9/bz2.py in write(self, data)
    234             compressed = self._compressor.compress(data)
    235             self._fp.write(compressed)
--> 236             self._pos += len(data)
    237             return len(data)
    238

TypeError: object of type 'pickle.PickleBuffer' has no len()

In [65]:

@dom-insytesys

I'm having the same problem pickling a Pandas DataFrame. Switching to protocol=4 makes it work.
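If staying on protocol 5 matters, another option is to wrap the compressed stream so every write() payload is turned into a flat byte view before it reaches the compressor. A rough sketch, assuming the payload is C-contiguous; the class name ByteViewWriter is made up for illustration and is not part of compress_pickle or pandas:

```python
import bz2
import io
import pickle


class ByteViewWriter:
    """Hypothetical adapter: pass write() payloads on as 1-D byte memoryviews."""

    def __init__(self, stream):
        self._stream = stream

    def write(self, data):
        # memoryview accepts bytes, bytearray and PickleBuffer alike;
        # cast("B") flattens it so len() equals the number of bytes.
        return self._stream.write(memoryview(data).cast("B"))


# Stand-in for a large array: a PickleBuffer over one million zero bytes.
payload = {"blob": pickle.PickleBuffer(bytearray(1_000_000))}

raw = io.BytesIO()
with bz2.BZ2File(raw, mode="wb") as compressed:
    pickle.dump(payload, ByteViewWriter(compressed), protocol=5)

# Round trip: decompress and unpickle what was written.
restored = pickle.loads(bz2.decompress(raw.getvalue()))
print(len(restored["blob"]))
```

Unlike pickling to bytes first, this keeps pickle streaming into the compressor, so no second full copy of the data is built.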

@lucianopaz
Owner

Closed by #26

@ghost

ghost commented Jul 22, 2021

This looks similar to the bug at https://bugs.python.org/issue44439.
I will open an issue in the Python issue tracker about this later.
