[bug] get_object() is very slow and takes 100% cpu #402

Closed
donatello opened this issue Sep 6, 2016 · 3 comments

@donatello

I created 10 files with sizes 1MB, 2MB, ..., 10MB and put them on a minio server running this version (master branch, latest at the time of writing):

Version: 2016-09-06T17:37:46Z
Release-Tag: DEVELOPMENT.2016-09-06T17-37-46Z
Commit-ID: afe874f15a98dbd6a85008775bdbea5c2bc7bffc

Then I used the following Python script to download these files and time each download:

from minio import Minio
from minio.error import ResponseError
from datetime import datetime

# read object like in minio-py docs
def read_object(mc, bucket, path):
    try:
        data = mc.get_object(bucket, path)
        with open("/dev/null", "wb") as f:
            count = 0
            for d in data:
                f.write(d)
                count += len(d)
            print("wrote {}/{} to /dev/null - had {} bytes".format(bucket, path, count))
    except ResponseError as err:
        print("Read error in read_object()", err)

def time_it(func, args):
    start_time = datetime.now()
    try:
        func(*args)
    except KeyboardInterrupt:
        print("Function run was interrupted!")
    duration = datetime.now() - start_time
    print("Took - {} seconds!".format(duration.total_seconds()))

def run_it():
    mc = Minio("localhost:9000", "RQW51P7E9UU5PSSLT63L",
               "OiQkINYGE0xMbTyHJvlEJc+7cT75h9yFdm8yxNW3", False)
    for fname in range(1, 11):
        fname = "dummy-{}M".format(fname)
        print("Using read_object to fetch {}".format(fname))
        time_it(read_object, (mc, "dummy", fname))

if __name__ == "__main__":
    run_it()

The minio server is running locally.

The output is:

Using read_object to fetch dummy-1M
wrote dummy/dummy-1M to /dev/null - had 1048576 bytes
Took - 11.472366 seconds!
Using read_object to fetch dummy-2M
wrote dummy/dummy-2M to /dev/null - had 2097152 bytes
Took - 22.756636 seconds!
Using read_object to fetch dummy-3M
wrote dummy/dummy-3M to /dev/null - had 3145728 bytes
Took - 34.209517 seconds!
Using read_object to fetch dummy-4M
wrote dummy/dummy-4M to /dev/null - had 4194304 bytes
Took - 45.537017 seconds!
Using read_object to fetch dummy-5M
wrote dummy/dummy-5M to /dev/null - had 5242880 bytes
Took - 59.154392 seconds!
Using read_object to fetch dummy-6M
wrote dummy/dummy-6M to /dev/null - had 6291456 bytes
Took - 71.933805 seconds!
Using read_object to fetch dummy-7M
wrote dummy/dummy-7M to /dev/null - had 7340032 bytes
Took - 81.179803 seconds!
Using read_object to fetch dummy-8M
wrote dummy/dummy-8M to /dev/null - had 8388608 bytes
Took - 92.52303 seconds!
Using read_object to fetch dummy-9M
wrote dummy/dummy-9M to /dev/null - had 9437184 bytes
Took - 103.959337 seconds!
Using read_object to fetch dummy-10M
wrote dummy/dummy-10M to /dev/null - had 10485760 bytes
Took - 117.054 seconds!

While the script is running, the Python interpreter stays at 100% CPU the whole time.
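
As a quick sanity check on the output above, the reported durations can be divided by object size. This is only a sketch over the numbers already listed; the helper names `seconds_per_mib` and `throughput_kib_per_s` are made up for illustration.

```python
# (size in MiB, seconds) pairs copied from the script output above.
timings = [
    (1, 11.472366), (2, 22.756636), (3, 34.209517), (4, 45.537017),
    (5, 59.154392), (6, 71.933805), (7, 81.179803), (8, 92.52303),
    (9, 103.959337), (10, 117.054),
]


def seconds_per_mib(pairs):
    """Return the download time per MiB for each (size, seconds) pair."""
    return [secs / size for size, secs in pairs]


def throughput_kib_per_s(pairs):
    """Return the effective throughput in KiB/s for each pair."""
    return [size * 1024 / secs for size, secs in pairs]


rates = seconds_per_mib(timings)
speeds = throughput_kib_per_s(timings)

for (size, _), rate, speed in zip(timings, rates, speeds):
    print("dummy-{}M: {:.2f} s/MiB, {:.1f} KiB/s".format(size, rate, speed))
```

Every fetch works out to between roughly 11.4 and 12.0 seconds per MiB, i.e. under 100 KiB/s against a local server. The time grows almost linearly with object size, which points at a cost paid per byte read rather than a fixed per-request overhead.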

Python version:

$ python --version
Python 3.5.2
@donatello

In comparison, downloading all 10 files on the same machine with mc takes less than a second:

$ time for i in $(seq 1 10); do echo $i; mc cat myminio/dummy/dummy-${i}M > /dev/null; done
1
2
3
4
5
6
7
8
9
10

real    0m0.234s
user    0m0.104s
sys     0m0.088s

harshavardhana self-assigned this Sep 7, 2016
@harshavardhana

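Looks like the problem is in how the response is read. Try reading it in 32 KiB chunks with stream() instead of iterating over the response object directly: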

# read object like in minio-py docs
def read_object(mc, bucket, path):
    try:
        data = mc.get_object(bucket, path)
        with open("/dev/null", "wb") as f:
            count = 0
            for d in data.stream(32*1024):
                f.write(d)
                count += len(d)
            print("wrote {}/{} to /dev/null - had {} bytes".format(bucket, path, count))
    except ResponseError as err:
        print("Read error in read_object()", err)
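
One plausible reason the change above helps (an assumption, not something confirmed in this thread): if the response object inherits its iteration from `io.IOBase`, then `for d in data` goes through `readline()`, whose default implementation calls `read(1)` once per byte. The sketch below uses a hypothetical stand-in class, `CountingStream`, rather than the real minio-py or urllib3 response, and simply counts `read()` calls for both styles of reading.

```python
import io


class CountingStream(io.IOBase):
    """Hypothetical stand-in for a response body; counts read() calls."""

    def __init__(self, payload):
        self._buf = io.BytesIO(payload)
        self.read_calls = 0

    def readable(self):
        return True

    def read(self, size=-1):
        self.read_calls += 1
        return self._buf.read(size)

    def stream(self, chunk_size):
        # Yield fixed-size chunks until the underlying buffer is exhausted.
        while True:
            chunk = self.read(chunk_size)
            if not chunk:
                break
            yield chunk


def compare(payload, chunk_size=32 * 1024):
    """Return (read calls when iterating, read calls when chunking)."""
    by_iteration = CountingStream(payload)
    total_iter = sum(len(d) for d in by_iteration)  # IOBase.__next__ -> readline()

    by_chunks = CountingStream(payload)
    total_chunks = sum(len(d) for d in by_chunks.stream(chunk_size))

    assert total_iter == total_chunks == len(payload)
    return by_iteration.read_calls, by_chunks.read_calls


if __name__ == "__main__":
    iter_calls, chunk_calls = compare(b"x" * (64 * 1024))
    print("iteration: {} read() calls".format(iter_calls))
    print("stream():  {} read() calls".format(chunk_calls))
```

With this stand-in, plain iteration issues at least one `read()` call per byte, while `stream(32*1024)` needs only a handful for the same payload. Whether the real response object behaves the same way depends on the library version in use.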

@donatello

👍
