Why is python shipped with the python docker image slower than that of my local machine? #825

Open
SimonLammer opened this issue Apr 28, 2023 · 7 comments


@SimonLammer

I've observed a roughly 11% performance overhead when using the python distribution shipped with the python:3 image, compared to the python distribution installable through ppa:deadsnakes/ppa: https://stackoverflow.com/a/76133102/2808520

stat  local       dockerbinary
avg   0.79917586  0.89829016
std   0.02433539  0.03554546
min   0.78087375  0.86344007
q1    0.78211388  0.86950620
q2    0.79006154  0.88853465
q3    0.80732969  0.91612282
max   0.89824817  0.99477790
$ file `which python3.10`
/usr/bin/python3.10: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=fb3f4369481251e6ba441382fd6d9ab47af0db29, for GNU/Linux 3.2.0, stripped
$ file docker-python/local/bin/python3.10
docker-python/local/bin/python3.10: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=618b23f947f202224f4ea8e16375ac7bcad13c4f, for GNU/Linux 3.2.0, with debug_info, not stripped
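
The two `file` outputs above differ in more than one field (executable vs. shared object, stripped vs. not stripped with debug_info). A minimal Python sketch for pulling the strip-related flags out of such a line; the helper name `elf_flags` is hypothetical, and it only looks at the comma-separated fields that `file` prints, not at the binary itself:

```python
def elf_flags(file_output: str) -> dict:
    """Extract strip/debug flags from one line of `file` output."""
    # Drop the leading "path: " part, then split the comma-separated fields.
    _, _, description = file_output.partition(": ")
    fields = [field.strip() for field in description.split(",")]
    return {
        # Exact field match, so "not stripped" is not mistaken for "stripped".
        "stripped": "stripped" in fields,
        "debug_info": "with debug_info" in fields,
    }


# The two lines reported in this issue.
LOCAL = (
    "/usr/bin/python3.10: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), "
    "dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, "
    "BuildID[sha1]=fb3f4369481251e6ba441382fd6d9ab47af0db29, "
    "for GNU/Linux 3.2.0, stripped"
)
DOCKER = (
    "docker-python/local/bin/python3.10: ELF 64-bit LSB shared object, x86-64, "
    "version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, "
    "BuildID[sha1]=618b23f947f202224f4ea8e16375ac7bcad13c4f, "
    "for GNU/Linux 3.2.0, with debug_info, not stripped"
)

print("local :", elf_flags(LOCAL))
print("docker:", elf_flags(DOCKER))
```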

My guess is that building with debug_info introduces this ~11% performance overhead.

I'd appreciate it if someone could tell me whether my guess is correct.
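
As a quick check of the arithmetic, the gap between the two averages in the table above can be expressed in two ways. This is only a sketch; the variable names are illustrative and the inputs are the two `avg` values as reported:

```python
# Averages copied from the table above.
local_avg = 0.79917586
docker_avg = 0.89829016

diff = docker_avg - local_avg

# Extra time of the Docker binary, relative to the local build.
slowdown_vs_local = diff / local_avg
# The same gap, expressed as a fraction of the Docker binary's own time.
share_of_docker = diff / docker_avg

print(f"relative to local build:   {slowdown_vs_local:.1%}")  # 12.4%
print(f"relative to Docker binary: {share_of_docker:.1%}")    # 11.0%
```

So the ~11% figure corresponds to the gap measured against the Docker binary's time; measured against the local build it is closer to 12%.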

@tianon
Member

tianon commented Apr 28, 2023

#575 might have some useful ideas/info/discussion in it for you


@blopker
Contributor

blopker commented Dec 21, 2023

For what it's worth, building with debug_info isn't known to have any performance impact, other than a somewhat larger binary. There are several (non-Python) discussions about this, for example: https://stackoverflow.com/questions/8676466/how-do-debug-symbols-affect-performance-of-a-linux-executable-compiled-by-gcc

I'd guess the slowdown is likely due to container security overhead. You should try to run your tests with docker run --security-opt seccomp:unconfined. See: https://stackoverflow.com/questions/60840320/docker-50-performance-hit-on-cpu-intensive-code

@SimonLammer
Author

> I'd guess the slowdown is likely due to container security overhead. You should try to run your tests with docker run --security-opt seccomp:unconfined. See: https://stackoverflow.com/questions/60840320/docker-50-performance-hit-on-cpu-intensive-code

Docker itself adds further overhead on top of that unless some security features are disabled: running the tests in Docker with --privileged yielded results very similar to "dockerbinary", while standard Docker took about twice as long as that.
The "dockerbinary" tests ran without Docker. I copied the Python build distributed in the Docker image to my host machine, ran the tests with it directly on the host, and still observed the ~11% performance overhead.

@blopker
Contributor

blopker commented Dec 22, 2023

I see, I didn't catch that the Docker Python binary was extracted and then tested on the host. I looked around a bit more, though, and couldn't find any evidence that debug symbols hurt performance. If you still have the test setup, it looks like you can strip a binary after it has been compiled, using strip --strip-debug. That could be an easy way to test the theory. Otherwise, there might be something else going on.
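
One way to test the theory is to time the same workload under the original binary and under a copy that has been run through strip --strip-debug. The following is a rough sketch, not the benchmark used in this issue: the workload, the function names and the example paths are all hypothetical, and each timing includes interpreter startup, so it is only useful for a coarse comparison.

```python
import statistics
import subprocess
import sys
import time

# A small CPU-bound workload (an assumption; any benchmark script could be used).
WORKLOAD = "sum(i * i for i in range(200_000))"


def time_interpreter(python_path: str, repeats: int = 5) -> list:
    """Run WORKLOAD in a fresh interpreter `repeats` times; return wall times in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run([python_path, "-c", WORKLOAD], check=True)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(paths, repeats: int = 5) -> dict:
    """Map each interpreter path to the mean wall time of its runs."""
    return {p: statistics.mean(time_interpreter(p, repeats)) for p in paths}


# Demo with the interpreter running this script, so the sketch runs anywhere.
print(summarize([sys.executable], repeats=3))
```

To compare builds, the call would look something like `summarize(["./python3.10", "./python3.10-stripped"])`, where both paths are placeholders for the extracted binary and its stripped copy.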

@blopker
Contributor

blopker commented Dec 22, 2023

Cool, I ran some quick (read: possibly unreliable) benchmarks with Python 3.12.1 from the official Docker image and from Deadsnakes, plus a test with a stripped copy of the Docker binary. All tests ran inside Docker on my M1 Mac laptop: the official binary in the official container, and the Deadsnakes binary in the latest Ubuntu container.

I ran the float benchmark from pyperformance in rigorous mode: pyperformance run -b float -r -o NAME.json.
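
For anyone repeating this, a small Python sketch that assembles the commands involved. The run command is the one quoted above. The compare step is an assumption: it is not shown in this thread, although the result tables look like the output of `pyperformance compare` with the table style. The helper names are hypothetical, and nothing is executed here; each run command would need to be issued inside the environment whose interpreter is being measured.

```python
import shlex


def run_cmd(name: str, benchmark: str = "float") -> list:
    """Command quoted above: rigorous run of one benchmark, JSON results in NAME.json."""
    return ["pyperformance", "run", "-b", benchmark, "-r", "-o", f"{name}.json"]


def compare_cmd(baseline: str, changed: str) -> list:
    """Assumed comparison step: `pyperformance compare` with table output."""
    return ["pyperformance", "compare", f"{baseline}.json", f"{changed}.json", "-O", "table"]


# Print the commands rather than running them.
for name in ("pydocker_float", "pydead_float"):
    print(shlex.join(run_cmd(name)))
print(shlex.join(compare_cmd("pydocker_float", "pydead_float")))
```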

Results:
Official Docker binary vs Deadsnakes:

+-----------+---------------------+-------------------+--------------+----------------------+
| Benchmark | pydocker_float.json | pydead_float.json | Change       | Significance         |
+===========+=====================+===================+==============+======================+
| float     | 63.5 ms             | 60.8 ms           | 1.04x faster | Significant (t=9.11) |
+-----------+---------------------+-------------------+--------------+----------------------+

Official Docker binary vs same binary, but with strip --strip-all applied:

+-----------+---------------------+------------------------------+--------------+----------------------+
| Benchmark | pydocker_float.json | pydocker_float_stripped.json | Change       | Significance         |
+===========+=====================+==============================+==============+======================+
| float     | 63.5 ms             | 61.2 ms                      | 1.04x faster | Significant (t=9.97) |
+-----------+---------------------+------------------------------+--------------+----------------------+

And finally, stripped official binary vs Deadsnakes:

+-----------+------------------------------+-------------------+--------------+-----------------+
| Benchmark | pydocker_float_stripped.json | pydead_float.json | Change       | Significance    |
+===========+==============================+===================+==============+=================+
| float     | 61.2 ms                      | 60.8 ms           | 1.01x faster | Not significant |
+-----------+------------------------------+-------------------+--------------+-----------------+
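
The "Change" column in the three tables above is consistent with the ratio of the two timings. A short sketch that reproduces it from the reported means; the function name and formatting are illustrative and this is not pyperformance's own code:

```python
def change(baseline_ms: float, changed_ms: float) -> str:
    """Describe `changed` relative to `baseline` as an 'N.NNx faster/slower' string."""
    ratio = baseline_ms / changed_ms
    if ratio >= 1:
        return f"{ratio:.2f}x faster"
    return f"{1 / ratio:.2f}x slower"


# Mean timings (ms) as reported in the tables above.
print("docker   -> deadsnakes:", change(63.5, 60.8))  # 1.04x faster
print("docker   -> stripped:  ", change(63.5, 61.2))  # 1.04x faster
print("stripped -> deadsnakes:", change(61.2, 60.8))  # 1.01x faster
```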

Analysis:
While I'm not seeing the 11% performance difference, there seems to be roughly a 4% speedup when stripping the debug symbols. The stripped binary vs Deadsnakes comparison shows no significant performance difference. I also tried a few other benchmarks and the speedup seems consistent. I think these results need further investigation, though; a full benchmark run in a more consistent environment would be good.

The other open question is how stripping these symbols would affect usage. That's not clear to me, and we would need to weigh it against the small performance bump. There seem to be other open tickets requesting more debug info, so I'm not sure whether these symbols are doing anything useful at all.

@blopker
Contributor

blopker commented Dec 26, 2023

Interesting. It looks like the python:slim image variants are stripped:

root@985e385a5760:/app# file /usr/local/bin/python3.12
/usr/local/bin/python3.12: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, BuildID[sha1]=c421fbb49476f1727009a04fcaf0c49e6a81a615, for GNU/Linux 3.7.0, stripped

And indeed, the slim binaries are faster than non-slim:

+-----------+---------------------+-------------------------+--------------+-----------------------+
| Benchmark | pydocker_float.json | pydockerslim_float.json | Change       | Significance          |
+===========+=====================+=========================+==============+=======================+
| float     | 63.5 ms             | 60.6 ms                 | 1.05x faster | Significant (t=12.30) |
+-----------+---------------------+-------------------------+--------------+-----------------------+

Since people already use the slim variants to reduce image size, I think it makes sense to use them when you want slightly better performance at the cost of "debuggability". Maybe this performance difference should be documented somewhere, but I think the answer to this issue is simply to use the slim images.
