Faster Python, beyond semantic interposition #575

Open
itamarst opened this issue Feb 5, 2021 · 8 comments
Labels
Request Request for image modification or feature

Comments

@itamarst

itamarst commented Feb 5, 2021

#501 has a useful suggestion for speeding up Python by ~20%. After that's done, it's actually possible to do better.

Host is Fedora 33. All tests were run with Python 3.9.

On host:

  • Fedora's Python gives 200K pystone/sec.
  • Conda-Forge Python gives 240K pystone/sec.

Running inside Docker 20.04 (cgroups v2 enabled):

  • fedora:33 gives 173K pystone/sec.
  • python:3.9-slim-buster gives 169K pystone/sec.
  • ubuntu:20.04 (no shared library): 183K pystone/sec.
  • continuumio/miniconda3 with Python from Conda-Forge: 189K/sec

I am mystified as to why things are so much slower inside Docker. Some of the slowdown clearly comes from the runtime rather than the image. But note that the Ubuntu image is definitely faster than python:3.9-slim-buster.

With podman:

  • python:3.9-slim-buster: 204K/sec
  • continuumio/miniconda3 with Python from Conda-Forge: 230K/sec

Note that the Anaconda (default Conda) Python 3.9 does not appear to be faster; the speedup is specific to whatever Conda-Forge does. I am trying to figure that out.
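To put the numbers above on a common scale, here is a minimal sketch that computes how much throughput each container run loses relative to the host. It assumes the fedora:33 image ships the same Python build as the Fedora host and that the miniconda3 runs used the same Conda-Forge build as the host, which the post does not state explicitly. The function and variable names are made up for illustration.

```python
# Throughput figures (thousand pystone/sec) copied from the lists above.
# Only builds that have a host measurement to compare against are included.
HOST = {"fedora": 200, "conda-forge": 240}
DOCKER = {"fedora": 173, "conda-forge": 189}
PODMAN = {"conda-forge": 230}


def slowdown_pct(host_rate, container_rate):
    """Percentage of host throughput lost inside the container."""
    return (host_rate - container_rate) / host_rate * 100


def report():
    """Return (runtime, build, slowdown in percent) for every comparable pair."""
    rows = []
    for runtime, rates in (("docker", DOCKER), ("podman", PODMAN)):
        for build, rate in rates.items():
            rows.append((runtime, build, slowdown_pct(HOST[build], rate)))
    return rows


if __name__ == "__main__":
    for runtime, build, pct in report():
        print(f"{runtime:7s} {build:12s} {pct:5.1f}% slower than host")
```

Under those assumptions the Docker runs lose roughly 13% to 21% of host throughput, while the podman run with Conda-Forge Python loses only about 4%.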

@wglambert added the Request label (Request for image modification or feature) on Feb 8, 2021
@itamarst
Author

itamarst commented Feb 8, 2021

I ran some more benchmarks, with the same basic results: this image is the slowest. See https://pythonspeed.com/articles/faster-python/

@Uzlopak

Uzlopak commented Apr 11, 2021

@itamarst

Hi, I found your page very useful. I am actually a Node.js dev, but I am currently optimizing our Python Docker images. We use Python 3.7.

Should we create a custom Ubuntu + Python 3.7 image built with semantic interposition disabled and with LTO for maximum performance? Our Python services are fucking huge anyway (3-5 GB, don't ask ;)) and are computationally heavy. Every percent of extra performance is noticeable.

@Uzlopak

Uzlopak commented Apr 11, 2021

@itamarst

The performance hit in Docker comes, imho, from seccomp:

https://stackoverflow.com/questions/60840320/docker-50-performance-hit-on-cpu-intensive-code

@Uzlopak

Uzlopak commented Apr 11, 2021

Yeah, deactivating seccomp results in a massive speed boost. But I guess deactivating seccomp is not the idea ;)

I read an article saying that seccomp was optimized in Linux 5.11 to reduce some lookup overhead:
https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.11-SECCOMP-Performance

So the system you run your tests on is also relevant to your performance results.

@Uzlopak

Uzlopak commented Apr 11, 2021

@itamarst

I upgraded my machine to Linux 5.11 specifically to test this. The seccomp performance hit does not change:

aras@workstation-111:~/Workspace/python-build-benchmarks$ docker run  python-performance
Requirement already satisfied: pyperformance in /usr/local/lib/python3.7/site-packages (1.0.1)
Requirement already satisfied: pyperf in /usr/local/lib/python3.7/site-packages (from pyperformance) (2.2.0)
Python benchmark suite 1.0.1

[1/3] 2to3...
INFO:root:Running `/venv/cpython3.7-51e257070d4f/bin/python -u /venv/cpython3.7-51e257070d4f/lib/python3.7/site-packages/pyperformance/benchmarks/bm_2to3.py --fast --output /tmp/tmp3_xdd1vn`
...........
2to3: Mean +- std dev: 464 ms +- 16 ms
[2/3] django_template...
INFO:root:Running `/venv/cpython3.7-51e257070d4f/bin/python -u /venv/cpython3.7-51e257070d4f/lib/python3.7/site-packages/pyperformance/benchmarks/bm_django_template.py --fast --output /tmp/tmpkx4f5ssh`
...........
django_template: Mean +- std dev: 82.7 ms +- 2.8 ms
[3/3] unpickle_pure_python...
INFO:root:Running `/venv/cpython3.7-51e257070d4f/bin/python -u /venv/cpython3.7-51e257070d4f/lib/python3.7/site-packages/pyperformance/benchmarks/bm_pickle.py --pure-python unpickle --fast --output /tmp/tmp2_9nr_l8`
...........
unpickle_pure_python: Mean +- std dev: 515 us +- 16 us

Performance version: 1.0.1
Report on Linux-5.11.0-13-generic-x86_64-with-debian-10.9
Number of logical CPUs: 8
Start date: 2021-04-11 19:25:56.153610
End date: 2021-04-11 19:26:27.039505

### 2to3 ###
Mean +- std dev: 464 ms +- 16 ms

### django_template ###
Mean +- std dev: 82.7 ms +- 2.8 ms

### unpickle_pure_python ###
Mean +- std dev: 515 us +- 16 us

aras@workstation-111:~/Workspace/python-build-benchmarks$ docker run  --security-opt seccomp=unconfined python-performance
Requirement already satisfied: pyperformance in /usr/local/lib/python3.7/site-packages (1.0.1)
Requirement already satisfied: pyperf in /usr/local/lib/python3.7/site-packages (from pyperformance) (2.2.0)
Python benchmark suite 1.0.1

[1/3] 2to3...
INFO:root:Running `/venv/cpython3.7-51e257070d4f/bin/python -u /venv/cpython3.7-51e257070d4f/lib/python3.7/site-packages/pyperformance/benchmarks/bm_2to3.py --fast --output /tmp/tmpjd495cck`
...........
2to3: Mean +- std dev: 372 ms +- 24 ms
[2/3] django_template...
INFO:root:Running `/venv/cpython3.7-51e257070d4f/bin/python -u /venv/cpython3.7-51e257070d4f/lib/python3.7/site-packages/pyperformance/benchmarks/bm_django_template.py --fast --output /tmp/tmpbq4ujyfw`
...........
django_template: Mean +- std dev: 63.4 ms +- 2.2 ms
[3/3] unpickle_pure_python...
INFO:root:Running `/venv/cpython3.7-51e257070d4f/bin/python -u /venv/cpython3.7-51e257070d4f/lib/python3.7/site-packages/pyperformance/benchmarks/bm_pickle.py --pure-python unpickle --fast --output /tmp/tmp18siurkb`
...........
unpickle_pure_python: Mean +- std dev: 375 us +- 12 us

Performance version: 1.0.1
Report on Linux-5.11.0-13-generic-x86_64-with-debian-10.9
Number of logical CPUs: 8
Start date: 2021-04-11 19:26:36.908459
End date: 2021-04-11 19:27:02.124846

### 2to3 ###
Mean +- std dev: 372 ms +- 24 ms

### django_template ###
Mean +- std dev: 63.4 ms +- 2.2 ms

### unpickle_pure_python ###
Mean +- std dev: 375 us +- 12 us
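The two runs above can be compared directly. The following sketch computes the extra run time under the default seccomp profile relative to the run with seccomp=unconfined. The timings are copied from the output above; the function and variable names are made up for illustration, and the standard deviations are ignored, so treat the percentages as rough.

```python
# Mean timings copied from the two pyperformance runs above.
# Units differ per benchmark (ms or us) but cancel out in the ratio.
DEFAULT_SECCOMP = {
    "2to3": 464.0,
    "django_template": 82.7,
    "unpickle_pure_python": 515.0,
}
UNCONFINED = {
    "2to3": 372.0,
    "django_template": 63.4,
    "unpickle_pure_python": 375.0,
}


def overhead_pct(confined, unconfined):
    """Extra run time with the default profile, relative to seccomp=unconfined."""
    return (confined - unconfined) / unconfined * 100


def summarize():
    """Map each benchmark name to its seccomp overhead in percent."""
    return {
        name: overhead_pct(DEFAULT_SECCOMP[name], UNCONFINED[name])
        for name in DEFAULT_SECCOMP
    }


if __name__ == "__main__":
    for name, pct in summarize().items():
        print(f"{name:22s} {pct:5.1f}% slower with the default seccomp profile")
```

On these numbers the default profile costs roughly 25% to 37% extra run time, depending on the benchmark.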

@Uzlopak

Uzlopak commented Apr 11, 2021

Further research makes me believe that Docker has a general seccomp performance hit. I modified the default seccomp profile to use SCMP_ACT_KILL and it did not kill the service, so I assume that your benchmark never hits a seccomp restriction.

moby/moby#41389
moby/moby#42074

Even an allow-all seccomp profile results in a performance hit. So merely enabling seccomp is enough to cause the performance issue. This is plain overhead, either in the Linux kernel or in Docker.
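For anyone who wants to repeat the allow-all experiment, here is a minimal sketch that writes such a profile and builds the matching docker command. The exact profile used in the comment above was not posted, so the profile content here is an assumption: a profile whose only setting is a default action of SCMP_ACT_ALLOW. The file name and helper names are hypothetical; the image name python-performance is the one from the runs earlier in the thread.

```python
import json
import tempfile
from pathlib import Path


def write_allow_all_profile(directory):
    """Write a seccomp profile whose default action allows every syscall."""
    # Assumed profile content; the profile from the original experiment was not shared.
    profile = {"defaultAction": "SCMP_ACT_ALLOW", "syscalls": []}
    path = Path(directory) / "allow-all.json"  # hypothetical file name
    path.write_text(json.dumps(profile, indent=2))
    return path


def docker_command(profile_path, image="python-performance"):
    """Build (but do not run) a docker command that applies the profile."""
    return ["docker", "run", "--rm",
            "--security-opt", f"seccomp={profile_path}", image]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        profile = write_allow_all_profile(tmp)
        print(profile.read_text())
        print(" ".join(docker_command(profile)))
```

Comparing a run with this profile against a run with --security-opt seccomp=unconfined isolates the cost of having a filter installed at all from the cost of any particular rule.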

@mrjackbo

I ran into a similar issue with seccomp and Docker before. In my case the answer turned out to be that starting an application under seccomp activates not only seccomp but also a certain Spectre/Meltdown-era mitigation (Speculative Store Bypass Disable), which was otherwise deactivated by default in my kernel. See https://wiki.ubuntu.com/SecurityTeam/KnowledgeBase/SpectreAndMeltdown/MitigationControls and look for spec_store_bypass_disable=[prctl|seccomp]
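One way to check whether this applies to a given process is to look at what the kernel reports. The sketch below is only an illustration: it assumes a Linux kernel recent enough to expose the spec_store_bypass entry under /sys and the Seccomp and Speculation_Store_Bypass fields in /proc/self/status, and it returns nothing useful on other systems. The helper names are made up.

```python
from pathlib import Path

# System-wide Speculative Store Bypass status (assumed to exist on recent Linux kernels).
SSB_SYSFS = Path("/sys/devices/system/cpu/vulnerabilities/spec_store_bypass")
# Per-process fields of interest in /proc/<pid>/status.
FIELDS = ("Seccomp", "Speculation_Store_Bypass")


def parse_status(text):
    """Extract the seccomp and SSB fields from /proc/<pid>/status content."""
    found = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in FIELDS:
            found[key] = value.strip()
    return found


def read_if_present(path):
    """Return the stripped file content, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    print("kernel SSB status:", read_if_present(SSB_SYSFS))
    status = read_if_present("/proc/self/status") or ""
    print("this process:", parse_status(status))
```

Running it once on the host and once inside the container (with and without seccomp=unconfined) shows whether the mitigation state of the Python process differs between the two.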

@methane

methane commented Oct 29, 2021

See https://bugs.python.org/issue38980 for --enable-shared performance.
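To see how a given interpreter was built, the standard sysconfig module can be queried. This is a rough heuristic sketch, not a definitive check: it assumes a POSIX build where these configuration variables are recorded, and a distribution could apply the same options through other means that this does not detect. The function name and dictionary keys are made up for illustration.

```python
import sysconfig


def build_summary(get=sysconfig.get_config_var):
    """Summarize build options that matter for the performance discussion here."""
    config_args = get("CONFIG_ARGS") or ""
    # Compiler flags may be recorded under several variables; missing ones are None.
    cflags = " ".join(
        value
        for value in (get("CFLAGS"), get("PY_CFLAGS"), get("PY_CFLAGS_NODIST"))
        if value
    )
    return {
        "shared_libpython": bool(get("Py_ENABLE_SHARED")),
        "lto_requested": "--with-lto" in config_args,
        "pgo_requested": "--enable-optimizations" in config_args,
        "no_semantic_interposition": "-fno-semantic-interposition" in cflags,
    }


if __name__ == "__main__":
    for key, value in build_summary().items():
        print(f"{key:28s} {value}")
```

Running this in each image being compared gives a quick view of whether the interpreter uses a shared libpython and whether LTO or -fno-semantic-interposition appear in its recorded build options.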


5 participants