Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-46055: Speed up binary shifting operators #30044

Merged
merged 18 commits into from
Dec 27, 2021

Conversation

xuxinhang
Copy link
Contributor

@xuxinhang xuxinhang commented Dec 11, 2021

Inspired by https://bugs.python.org/issue44946, I found there were no special shortcuts for shifting operation applied to "medium value" long object. So I modified CPython's VM to accelerate my python project where there is plenty of binary shifting operation. I guess somebody else might also need it.

Thanks.

https://bugs.python.org/issue46055

@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@xuxinhang

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@sweeneyde
Copy link
Member

Hi @xuxinhang, it would be great if you could open an issue on bugs.python.org to associate with this PR, and change the PR title accordingly.

Also, please make sure to get the CLA signed. Thanks!

@xuxinhang xuxinhang changed the title Speed up binary shifting operators. bpo-46055: Speed up binary shifting operators Dec 12, 2021
@xuxinhang
Copy link
Contributor Author

Sorry, I have create a new issue in the bug tracker and I'm waiting for CLA getting verified. Besides, the building in Ubuntu doesn't complain segment fault now.

Copy link
Member

@sweeneyde sweeneyde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't do a complete review, but here are a few comments.

It would also be nice to see some benchmarks for the change.

Objects/longobject.c Outdated Show resolved Hide resolved
Objects/longobject.c Outdated Show resolved Hide resolved
Objects/longobject.c Outdated Show resolved Hide resolved
Objects/longobject.c Outdated Show resolved Hide resolved
Objects/longobject.c Outdated Show resolved Hide resolved
Objects/longobject.c Outdated Show resolved Hide resolved
@xuxinhang
Copy link
Contributor Author

BENCHMARKS

I use timeit to measure time costs and any other operators or calls are excluded. For each testcase, the former is dcd2796 and the latter is this PR's base 036bbb1.

64-bit Release building. Run in Windows 10 1709 (64-bit)

python -m timeit " i = 1; i <<= 3; i >>= 3"  # small value (cost down by 36%)
5000000 loops, best of 5: 92.7 nsec per loop
2000000 loops, best of 5: 145 nsec per loop

python -m timeit " i = 1; i <<= 10; i >>= 10"  # medium value (-25%)
2000000 loops, best of 5: 114 nsec per loop
2000000 loops, best of 5: 151 nsec per loop

python -m timeit " i = 1; i <<= 100; i >>= 100"  # big value  (-12%)
1000000 loops, best of 5: 209 nsec per loop
1000000 loops, best of 5: 238 nsec per loop

Objects/longobject.c Outdated Show resolved Hide resolved
@xuxinhang xuxinhang force-pushed the feat/binary_shift_optimization branch 4 times, most recently from 52aebcb to b2ef93d Compare December 13, 2021 15:56
@sweeneyde sweeneyde requested a review from mdickinson December 22, 2021 06:37
Objects/longobject.c Outdated Show resolved Hide resolved
Copy link
Member

@mdickinson mdickinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few changes in this PR that aren't directly related to the stated goal of speeding up the shift operators. Please could you revert the unrelated changes, so that we can focus on the shift operations? That would make the PR easier to review and evaluate.

Objects/longobject.c Outdated Show resolved Hide resolved
Objects/longobject.c Outdated Show resolved Hide resolved
Objects/longobject.c Outdated Show resolved Hide resolved
Objects/longobject.c Outdated Show resolved Hide resolved
@xuxinhang
Copy link
Contributor Author

Thanks.
——
I have made the requested changes; please review again

@bedevere-bot
Copy link

Thanks for making the requested changes!

@mdickinson: please review the changes made to this pull request.

@mdickinson
Copy link
Member

@xuxinhang: Almost there. Please could you undo the changes to the medium_value function?

@xuxinhang xuxinhang force-pushed the feat/binary_shift_optimization branch from 4482861 to 241da4c Compare December 24, 2021 13:22
Copy link
Member

@mdickinson mdickinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Merging when the CI completes. Thank you!

@mdickinson mdickinson added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Dec 27, 2021
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @mdickinson for commit 9ed09aa 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Dec 27, 2021
@mdickinson
Copy link
Member

mdickinson commented Dec 27, 2021

Some sample timings on my machine. "main" is at commit b8de8b7. "here" is this branch at commit 9ed09aa, after a merge with main. Timings performed on an Intel MacBook Pro, macOS 10.14.6, with ./python.exe -m timeit -s "i, s = 1, 3" "i << s" or ./python.exe -m timeit -s "i, s = 1, 3" "i >> s", with suitable values of i and s in the setup.

For 1<<3:
main: 5000000 loops, best of 5: 38.2 nsec per loop
here: 10000000 loops, best of 5: 24.7 nsec per loop
55% faster

For 1234567 >> 13:
main: 5000000 loops, best of 5: 38.2 nsec per loop
here: 10000000 loops, best of 5: 24.7 nsec per loop
55% faster

(No, those numbers aren't copy and paste errors - I really did get the exact same timings.)

For 1234567 >> 123:
main: 10000000 loops, best of 5: 23.3 nsec per loop
here: 10000000 loops, best of 5: 25.3 nsec per loop
8% slower (though this seems to be within random variation)

For -1234567 >> 13:
main: 5000000 loops, best of 5: 67.5 nsec per loop
here: 10000000 loops, best of 5: 36.2 nsec per loop
86% faster

For -1234567 >> 123:
main: 5000000 loops, best of 5: 39.5 nsec per loop
here: 10000000 loops, best of 5: 25.1 nsec per loop
57% faster

@mdickinson
Copy link
Member

mdickinson commented Dec 27, 2021

The Tests / Windows (x64) (pull_request) check seems to be stuck at the build step (1h57m and counting). Closing and re-opening to re-trigger the build.

@mdickinson mdickinson closed this Dec 27, 2021
@mdickinson mdickinson reopened this Dec 27, 2021
@serhiy-storchaka
Copy link
Member

I suggest to merge #30243 first and compare this PR with a new base.

@mdickinson
Copy link
Member

Some new sample timings, now that #30243 has been merged into master, using the same machine and methodology as above. It's interesting that 1234567 >> 13 gets a significant speedup on my machine, but 12345678 >> 13 doesn't; looks like in the second case, the time is dominated by the time to construct a new PyLong for the result. (In the first case, the result is small.)

Comparing commits:
here: 59672cf
main: 360fedc

1 << 3: ~43% faster

  • here: 10000000 loops, best of 5: 26.7 nsec per loop
  • main: 10000000 loops, best of 5: 38.1 nsec per loop

1234567 >> 13: ~36% faster

  • here: 10000000 loops, best of 5: 26.6 nsec per loop
  • main: 10000000 loops, best of 5: 36.2 nsec per loop

12345678 >> 13: ~6% faster

  • here: 10000000 loops, best of 5: 34.3 nsec per loop
  • main: 10000000 loops, best of 5: 36.4 nsec per loop

-1234567 >> 13: ~77% faster

  • here: 5000000 loops, best of 5: 38.8 nsec per loop
  • main: 5000000 loops, best of 5: 68.6 nsec per loop

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants