ARROW-2076: [Python] Display slowest test durations #1541

Closed

pitrou
Member

@pitrou pitrou commented Jan 31, 2018

No description provided.

@wesm
Member

wesm commented Jan 31, 2018

Thanks @pitrou! Well it's pretty clear cut:

Python 3.6:

236.05s call     pyarrow-test-3.6/lib/python3.6/site-packages/pyarrow/tests/test_serialization.py::test_custom_serialization
41.78s call     pyarrow-test-3.6/lib/python3.6/site-packages/pyarrow/tests/test_plasma.py::TestPlasmaClient::test_subscribe_deletions
38.78s call     pyarrow-test-3.6/lib/python3.6/site-packages/pyarrow/tests/test_serialization.py::test_primitive_serialization
28.87s call     pyarrow-test-3.6/lib/python3.6/site-packages/pyarrow/tests/test_plasma.py::TestPlasmaClient::test_subscribe
9.01s call     pyarrow-test-3.6/lib/python3.6/site-packages/pyarrow/tests/test_serialization.py::test_complex_serialization
8.90s call     pyarrow-test-3.6/lib/python3.6/site-packages/pyarrow/tests/test_serialization.py::test_serialize_to_buffer
8.30s call     pyarrow-test-3.6/lib/python3.6/site-packages/pyarrow/tests/test_plasma.py::TestPlasmaClient::test_use_one_memory_mapped_file
6.90s call     pyarrow-test-3.6/lib/python3.6/site-packages/pyarrow/tests/test_plasma.py::TestPlasmaClient::test_store_full
5.32s call     pyarrow-test-3.6/lib/python3.6/site-packages/pyarrow/tests/test_plasma.py::TestPlasmaClient::test_many_hashes
4.88s call     pyarrow-test-3.6/lib/python3.6/site-packages/pyarrow/tests/test_plasma.py::TestPlasmaClient::test_create_with_metadata
4.59s setup    pyarrow-test-3.6/lib/python3.6/site-packages/pyarrow/tests/test_plasma.py::TestPlasmaClient::test_use_one_memory_mapped_file
3.34s call     pyarrow-test-3.6/lib/python3.6/site-packages/pyarrow/tests/test_plasma.py::TestPlasmaClient::test_get
2.92s call     pyarrow-test-3.6/lib/python3.6/site-packages/pyarrow/tests/test_plasma.py::TestPlasmaClient::test_create_existing
2.82s teardown pyarrow-test-3.6/lib/python3.6/site-packages/pyarrow/tests/test_plasma.py::TestPlasmaClient::test_use_one_memory_mapped_file
2.54s teardown pyarrow-test-3.6/lib/python3.6/site-packages/pyarrow/tests/test_plasma.py::TestPlasmaClient::test_store_full

and Python 2.7

285.93s call     pyarrow-test-2.7/lib/python2.7/site-packages/pyarrow/tests/test_serialization.py::test_custom_serialization
44.93s call     pyarrow-test-2.7/lib/python2.7/site-packages/pyarrow/tests/test_serialization.py::test_primitive_serialization
42.07s call     pyarrow-test-2.7/lib/python2.7/site-packages/pyarrow/tests/test_plasma.py::TestPlasmaClient::test_subscribe_deletions
29.15s call     pyarrow-test-2.7/lib/python2.7/site-packages/pyarrow/tests/test_plasma.py::TestPlasmaClient::test_subscribe
12.06s call     pyarrow-test-2.7/lib/python2.7/site-packages/pyarrow/tests/test_serialization.py::test_serialize_to_buffer
12.06s call     pyarrow-test-2.7/lib/python2.7/site-packages/pyarrow/tests/test_serialization.py::test_complex_serialization
8.30s call     pyarrow-test-2.7/lib/python2.7/site-packages/pyarrow/tests/test_plasma.py::TestPlasmaClient::test_use_one_memory_mapped_file
6.74s call     pyarrow-test-2.7/lib/python2.7/site-packages/pyarrow/tests/test_plasma.py::TestPlasmaClient::test_store_full
5.35s call     pyarrow-test-2.7/lib/python2.7/site-packages/pyarrow/tests/test_plasma.py::TestPlasmaClient::test_many_hashes
4.60s setup    pyarrow-test-2.7/lib/python2.7/site-packages/pyarrow/tests/test_plasma.py::TestPlasmaClient::test_use_one_memory_mapped_file
4.52s call     pyarrow-test-2.7/lib/python2.7/site-packages/pyarrow/tests/test_plasma.py::TestPlasmaClient::test_create_with_metadata
3.33s call     pyarrow-test-2.7/lib/python2.7/site-packages/pyarrow/tests/test_plasma.py::TestPlasmaClient::test_get
2.80s teardown pyarrow-test-2.7/lib/python2.7/site-packages/pyarrow/tests/test_plasma.py::TestPlasmaClient::test_use_one_memory_mapped_file
2.68s call     pyarrow-test-2.7/lib/python2.7/site-packages/pyarrow/tests/test_plasma.py::TestPlasmaClient::test_create_existing
2.55s teardown pyarrow-test-2.7/lib/python2.7/site-packages/pyarrow/tests/test_plasma.py::TestPlasmaClient::test_store_full

Not sure exactly what's happening (swapping?) but it looks like we ought to be able to trim 6-7 minutes off by doing something about these tests. cc @robertnishihara @pcmoritz
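To see how the time splits between the two test files, the report can be totalled per module. This is a minimal sketch, assuming the lines follow the `<seconds>s <phase> <path>::<test>` layout shown above (the format pytest's `--durations` report uses); `total_seconds_by_module` is a hypothetical helper, not part of pyarrow or pytest, and the sample paths are shortened for readability.

```python
import re
from collections import defaultdict

# Matches lines such as "236.05s call     path/to/test_file.py::test_name"
DURATION_LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)s\s+(setup|call|teardown)\s+(\S+)$")


def total_seconds_by_module(report):
    """Sum the reported seconds per test file (hypothetical helper)."""
    totals = defaultdict(float)
    for line in report.splitlines():
        match = DURATION_LINE.match(line)
        if match is None:
            continue  # skip blank lines and anything that is not a duration entry
        seconds, _phase, node_id = match.groups()
        # Keep only the file name: drop the test name and the directory part.
        module = node_id.split("::", 1)[0].rsplit("/", 1)[-1]
        totals[module] += float(seconds)
    return dict(totals)


# Four entries copied from the Python 3.6 report above (paths shortened).
sample = """
236.05s call     pyarrow/tests/test_serialization.py::test_custom_serialization
41.78s call     pyarrow/tests/test_plasma.py::TestPlasmaClient::test_subscribe_deletions
38.78s call     pyarrow/tests/test_serialization.py::test_primitive_serialization
28.87s call     pyarrow/tests/test_plasma.py::TestPlasmaClient::test_subscribe
"""

for module, seconds in sorted(total_seconds_by_module(sample).items()):
    print(f"{module}: {seconds:.2f}s")
```

These four entries alone account for roughly 345 seconds, most of it in test_serialization.py.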

@robertnishihara
Contributor

Thanks, we'll look into it.

@xhochy xhochy changed the title from "[Python] Display slowest test durations" to "ARROW-2076: [Python] Display slowest test durations" Feb 1, 2018
Member

@xhochy xhochy left a comment

+1, merging since we want to have this output continuously in Travis.

@xhochy xhochy closed this in c1d77a1 Feb 1, 2018
@robertnishihara
Contributor

Any ideas about speeding this up would be appreciated. Note that the tests run very quickly locally (<1s for test_serialization.py and 10s for test_plasma.py) and the tests also run quickly on the macOS Travis build.

I tried compiling with -DCMAKE_BUILD_TYPE=Debug locally instead of Release, but wasn't able to reproduce the slowness (locally).

Also tried making large arrays in test_serialization.py smaller, but that didn't change anything.

@pcmoritz
Contributor

pcmoritz commented Feb 2, 2018

I think the problem is that in test_serialization.py, Bar contains a copy of PRIMITIVE_OBJECTS + COMPLEX_OBJECTS and then Qux contains a bunch of copies of Bar, so we are serializing PRIMITIVE_OBJECTS + COMPLEX_OBJECTS many times over. If this is slower on Travis (due to swapping, VM overhead, or anything else), the whole test is slowed down a lot.

So let's slim down the objects that Bar contains!
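To make the multiplication concrete, here is a rough sketch of the nesting. The `Bar` and `Qux` classes below are simplified, hypothetical stand-ins for the ones in test_serialization.py, the object lists are tiny placeholders, and the number of `Bar` copies is an assumption; the point is only that the work grows linearly with the number of copies.

```python
# Hypothetical stand-ins for the lists in test_serialization.py; the real
# PRIMITIVE_OBJECTS and COMPLEX_OBJECTS are assumed to be much larger.
PRIMITIVE_OBJECTS = [0, 1.5, "text", b"bytes", None, True]
COMPLEX_OBJECTS = [[1, 2, 3], {"key": "value"}, (4, 5)]


class Bar:
    """Holds one full copy of every test object."""

    def __init__(self):
        self.objects = list(PRIMITIVE_OBJECTS + COMPLEX_OBJECTS)


class Qux:
    """Holds several Bar instances; the count used here is an assumption."""

    def __init__(self, num_bars):
        self.bars = [Bar() for _ in range(num_bars)]


def count_top_level_objects(qux):
    """Count how many test objects a serializer must visit inside one Qux."""
    return sum(len(bar.objects) for bar in qux.bars)


base = len(PRIMITIVE_OBJECTS + COMPLEX_OBJECTS)
for num_bars in (1, 10, 100):
    visited = count_top_level_objects(Qux(num_bars))
    print(f"{num_bars:>3} Bar copies -> {visited} objects ({visited // base}x the base list)")
```

Shrinking what each `Bar` holds reduces the total by the same factor for every copy, which is why slimming down `Bar` looks like the cheapest fix.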

@robertnishihara
Contributor

That doesn't explain why test_primitive_serialization is slow. We may need to just remove objects/code until it gets fast and see which change mattered.
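One way to narrow it down without deleting code by hand is to time each object in isolation and sort by cost. This is only a sketch: `time_each` is a hypothetical helper, `pickle.dumps` stands in for the pyarrow serialization call so the snippet runs on its own, and the candidate list is made up. On Travis the serializer under test and the test's real objects would be passed in instead.

```python
import pickle
import time


def time_each(objects, serialize=pickle.dumps, repeats=3):
    """Return (seconds, index, type name) per object, slowest first."""
    timings = []
    for index, obj in enumerate(objects):
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            serialize(obj)
            # Keep the fastest run to reduce noise from other processes.
            best = min(best, time.perf_counter() - start)
        timings.append((best, index, type(obj).__name__))
    return sorted(timings, reverse=True)


# Hypothetical candidates; in practice these would be the test's own objects.
candidates = [1, "short string", list(range(200_000)), {"a": [1, 2, 3]}]

for seconds, index, type_name in time_each(candidates):
    print(f"{seconds * 1e3:8.3f} ms  objects[{index}]  ({type_name})")
```

Running the same script locally and in CI and comparing the two rankings should show whether a few objects dominate or whether everything is uniformly slower.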

@pitrou pitrou deleted the slowest-test-durations branch September 19, 2018 18:06