bpo-46055: Speed up binary shifting operators #30044

xuxinhang · 2021-12-11T04:39:51Z

Inspired by https://bugs.python.org/issue44946, I found there were no special shortcuts for shifting operation applied to "medium value" long object. So I modified CPython's VM to accelerate my python project where there is plenty of binary shifting operation. I guess somebody else might also need it.

Thanks.

https://bugs.python.org/issue46055

the-knights-who-say-ni · 2021-12-11T04:39:54Z

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@xuxinhang

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

sweeneyde · 2021-12-11T23:54:59Z

Hi @xuxinhang, it would be great if you could open an issue on bugs.python.org to associate with this PR, and change the PR title accordingly.

Also, please make sure to get the CLA signed. Thanks!

xuxinhang · 2021-12-12T05:50:43Z

Sorry, I have create a new issue in the bug tracker and I'm waiting for CLA getting verified. Besides, the building in Ubuntu doesn't complain segment fault now.

sweeneyde

I didn't do a complete review, but here are a few comments.

It would also be nice to see some benchmarks for the change.

Objects/longobject.c

…avior of C right shift operator.

xuxinhang · 2021-12-12T12:23:00Z

BENCHMARKS

I use timeit to measure time costs and any other operators or calls are excluded. For each testcase, the former is dcd2796 and the latter is this PR's base 036bbb1.

64-bit Release building. Run in Windows 10 1709 (64-bit)

python -m timeit " i = 1; i <<= 3; i >>= 3"  # small value (cost down by 36%)
5000000 loops, best of 5: 92.7 nsec per loop
2000000 loops, best of 5: 145 nsec per loop

python -m timeit " i = 1; i <<= 10; i >>= 10"  # medium value (-25%)
2000000 loops, best of 5: 114 nsec per loop
2000000 loops, best of 5: 151 nsec per loop

python -m timeit " i = 1; i <<= 100; i >>= 100"  # big value  (-12%)
1000000 loops, best of 5: 209 nsec per loop
1000000 loops, best of 5: 238 nsec per loop

Objects/longobject.c

mdickinson

There are a few changes in this PR that aren't directly related to the stated goal of speeding up the shift operators. Please could you revert the unrelated changes, so that we can focus on the shift operations? That would make the PR easier to review and evaluate.

Objects/longobject.c

xuxinhang · 2021-12-24T12:33:23Z

Thanks.
——
I have made the requested changes; please review again

bedevere-bot · 2021-12-24T12:33:26Z

Thanks for making the requested changes!

@mdickinson: please review the changes made to this pull request.

mdickinson · 2021-12-24T13:10:05Z

@xuxinhang: Almost there. Please could you undo the changes to the medium_value function?

Misc/NEWS.d/next/Core and Builtins/2021-12-24-20-21-45.bpo-46055.R0QMVQ.rst

mdickinson

LGTM! Merging when the CI completes. Thank you!

bedevere-bot · 2021-12-27T10:00:55Z

🤖 New build scheduled with the buildbot fleet by @mdickinson for commit 9ed09aa 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

mdickinson · 2021-12-27T10:24:04Z

Some sample timings on my machine. "main" is at commit b8de8b7. "here" is this branch at commit 9ed09aa, after a merge with main. Timings performed on an Intel MacBook Pro, macOS 10.14.6, with ./python.exe -m timeit -s "i, s = 1, 3" "i << s" or ./python.exe -m timeit -s "i, s = 1, 3" "i >> s", with suitable values of i and s in the setup.

For 1<<3:
main: 5000000 loops, best of 5: 38.2 nsec per loop
here: 10000000 loops, best of 5: 24.7 nsec per loop
55% faster

For 1234567 >> 13:
main: 5000000 loops, best of 5: 38.2 nsec per loop
here: 10000000 loops, best of 5: 24.7 nsec per loop
55% faster

(No, those numbers aren't copy and paste errors - I really did get the exact same timings.)

For 1234567 >> 123:
main: 10000000 loops, best of 5: 23.3 nsec per loop
here: 10000000 loops, best of 5: 25.3 nsec per loop
8% slower (though this seems to be within random variation)

For -1234567 >> 13:
main: 5000000 loops, best of 5: 67.5 nsec per loop
here: 10000000 loops, best of 5: 36.2 nsec per loop
86% faster

For -1234567 >> 123:
main: 5000000 loops, best of 5: 39.5 nsec per loop
here: 10000000 loops, best of 5: 25.1 nsec per loop
57% faster

mdickinson · 2021-12-27T11:37:11Z

The Tests / Windows (x64) (pull_request) check seems to be stuck at the build step (1h57m and counting). Closing and re-opening to re-trigger the build.

serhiy-storchaka · 2021-12-27T11:53:09Z

I suggest to merge #30243 first and compare this PR with a new base.

…ptimization

mdickinson · 2021-12-27T18:28:30Z

Some new sample timings, now that #30243 has been merged into master, using the same machine and methodology as above. It's interesting that 1234567 >> 13 gets a significant speedup on my machine, but 12345678 >> 13 doesn't; looks like in the second case, the time is dominated by the time to construct a new PyLong for the result. (In the first case, the result is small.)

Comparing commits:
here: 59672cf
main: 360fedc

1 << 3: ~43% faster

here: 10000000 loops, best of 5: 26.7 nsec per loop
main: 10000000 loops, best of 5: 38.1 nsec per loop

1234567 >> 13: ~36% faster

here: 10000000 loops, best of 5: 26.6 nsec per loop
main: 10000000 loops, best of 5: 36.2 nsec per loop

12345678 >> 13: ~6% faster

here: 10000000 loops, best of 5: 34.3 nsec per loop
main: 10000000 loops, best of 5: 36.4 nsec per loop

-1234567 >> 13: ~77% faster

here: 5000000 loops, best of 5: 38.8 nsec per loop
main: 5000000 loops, best of 5: 68.6 nsec per loop

Perf: Accelerate binary shift op.

7b11d7d

the-knights-who-say-ni added the CLA not signed label Dec 11, 2021

bedevere-bot added the awaiting review label Dec 11, 2021

xuxinhang added 2 commits December 12, 2021 12:59

Fix: Ubuntu gcc segment fault.

403652a

Let PyLong_New accept a negtive size.

dcd2796

xuxinhang changed the title ~~Speed up binary shifting operators.~~ bpo-46055: Speed up binary shifting operators Dec 12, 2021

sweeneyde reviewed Dec 12, 2021

View reviewed changes

Objects/longobject.c Outdated Show resolved Hide resolved

Objects/longobject.c Outdated Show resolved Hide resolved

Objects/longobject.c Outdated Show resolved Hide resolved

Objects/longobject.c Outdated Show resolved Hide resolved

Objects/longobject.c Outdated Show resolved Hide resolved

sweeneyde reviewed Dec 12, 2021

View reviewed changes