File-like object returned from open_url is extremely slow with S3
#241
Labels: cat:performance (Performance in terms of speed or memory consumption.)
open_url returns a file-like object, which is handy because we can pass it directly to libraries that accept a file-like object, such as PIL.Image.open, np.load, and pickle.load.

However, I noticed that it is much slower than expected.
If we instead make a BytesIO from the bytes obtained from read(), performance is better. I guess this future work comment (lack of buffered IO) has something to do with it.
https://github.com/pfnet/pfio/blob/master/pfio/v2/s3.py#L360-L361
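To illustrate why missing buffering could matter, here is a minimal sketch that counts how many read() calls reach the underlying object in each approach. CountingReader is a hypothetical stand-in, not pfio code, and the idea that every read() on the S3 file object costs one round trip is an assumption, not something confirmed here.

```python
import io


class CountingReader:
    """Hypothetical stand-in for an unbuffered remote file object.

    Assumption: every read() call models one round trip to the storage.
    """

    def __init__(self, data):
        self._buf = io.BytesIO(data)
        self.calls = 0

    def read(self, size=-1):
        self.calls += 1
        return self._buf.read(size)


def consume(f, chunk_size=4096):
    """Read f in small chunks, as a parsing library might do."""
    total = 0
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    return total


data = bytes(1024 * 1024)  # 1 MiB of dummy content

# Direct: every small read reaches the "remote" object.
direct = CountingReader(data)
direct_total = consume(direct)
direct_calls = direct.calls

# Bulk: one read() fetches everything, small reads then hit memory only.
bulk = CountingReader(data)
bulk_total = consume(io.BytesIO(bulk.read()))
bulk_calls = bulk.calls

print(direct_calls, bulk_calls)
```

In this toy model the direct path issues hundreds of reads against the backing object while the BytesIO path issues one; whether the real slowdown comes from the same effect would need to be checked against the S3 implementation.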
Situation
I have the following test data on S3 (actually Ozone) storage:
DSC07917.jpg: a 1.1 MB JPEG image
random.npy: an approximately 128 MB NumPy random array (made by np.save('random.npy', np.random.random((4096, 4096))))

Directly passing the file-like object to these libraries:
Loading the entire content into bytes first and then making a BytesIO from it:
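A minimal sketch of this workaround as a reusable helper. The name load_via_bytesio is hypothetical, and the snippet uses an in-memory source with pickle so that it runs without any storage backend; no timings are implied.

```python
import io
import pickle


def load_via_bytesio(fileobj, loader):
    """Read the whole stream into memory, then pass a BytesIO to the loader.

    fileobj: any object with a read() method returning bytes.
    loader:  a callable that accepts a file-like object (e.g. pickle.load).
    """
    return loader(io.BytesIO(fileobj.read()))


# Local demonstration with an in-memory source instead of S3.
payload = {"shape": [4096, 4096], "name": "random"}
source = io.BytesIO(pickle.dumps(payload))
restored = load_via_bytesio(source, pickle.load)

# Assumed usage with pfio (the bucket name is a placeholder):
#   with pfio.v2.open_url("s3://<bucket>/random.npy", "rb") as f:
#       arr = load_via_bytesio(f, np.load)
```

The trade-off is that the whole object is held in memory at once, which is fine for the file sizes above but may not suit much larger objects.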
I observed the same situation with pickle, too.
This happens with neither the local filesystem nor HDFS.