Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve performance of successful int extract by ~30% by avoiding calls to __index__ where redundant #3742

Merged
merged 1 commit into from
Jan 16, 2024

Conversation

samuelcolvin
Copy link
Contributor

Results from benchmarks:

extract_int_extract_success
                        time:   [5.0137 ns 5.0166 ns 5.0195 ns]
                        change: [-33.466% -32.879% -32.348%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  6 (6.00%) low mild
  6 (6.00%) high mild
  1 (1.00%) high severe

extract_int_extract_fail
                        time:   [190.90 ns 191.15 ns 191.41 ns]
                        change: [+0.4500% +0.9483% +1.3921%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

🚀

src/conversions/std/num.rs Outdated Show resolved Hide resolved
Copy link

codspeed-hq bot commented Jan 14, 2024

CodSpeed Performance Report

Merging #3742 will degrade performances by 18.05%

Comparing samuelcolvin:int-extraction-performance (0e876d9) with main (7366b1a)

Summary

⚡ 13 improvements
❌ 2 regressions
✅ 63 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main samuelcolvin:int-extraction-performance Change
extract_int_downcast_fail 266.1 ns 238.3 ns +11.66%
extract_str_extract_success 647.2 ns 730.6 ns -11.41%
extract_int_downcast_success 902.2 ns 635.6 ns +41.96%
extract_int_extract_success 893.9 ns 627.2 ns +42.52%
tuple_get_item 18.7 ms 14.4 ms +30.11%
not_a_list_via_downcast 126.1 ns 153.9 ns -18.05%
tuple_get_item_unchecked 15.5 ms 11.1 ms +38.9%
extract_btreeset 87.6 ms 78.9 ms +10.99%
list_get_item 22.5 ms 18.1 ms +23.88%
iter_list 37.8 ms 29.1 ms +29.76%
iter_dict 57.6 ms 48.9 ms +17.72%
iter_tuple 27.1 ms 18.4 ms +47.01%
list_get_item_unchecked 19.4 ms 15.1 ms +28.77%
extract_hashmap 111.3 ms 94 ms +18.43%
iter_set 60.6 ms 51.9 ms +16.7%

@davidhewitt
Copy link
Member

Nice! It looks like this code might have originated from discussion in #108. I think that logic is likely extremely dated and we're now right to simplify here.

@davidhewitt
Copy link
Member

python/cpython#80229 looks relevant, which was in 3.8 so we may need to keep a different path for 3.7.

@samuelcolvin
Copy link
Contributor Author

Logic changed for 3.8 to just call PyLong_AsLong, with this the performance increased to 37% improvement.

I haven't changed the extraction logic for BigInt since that is ultimately calling _PyLong_NumBits which doesn't internally call PyNumber_Index, so I think it's right - I guess it's also used much less.

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current patch looks good, though I think the same thing needs to be added also around line 70 in the int_convert_u64_or_i64 macro?

Also once that's added, please force-push to squash; GitHub merge queue doesn't let us choose to squash-merge.

@davidhewitt
Copy link
Member

Uff unfortunately the test failure looks legitimate, PyLong_AsUnsignedLong and PyLong_AsUnsignedLongLong do not call index, so we will have to manually wrap those to call __index__ again.

https://github.com/python/cpython/blob/41a94c9e7be94760baab1dcb33427d8781bea64a/Objects/longobject.c#L630C68-L630C68

@samuelcolvin samuelcolvin force-pushed the int-extraction-performance branch from 6b02537 to d13e735 Compare January 15, 2024 12:02
@samuelcolvin samuelcolvin changed the title improve performance of successful int extract by ~30% improve performance of successful int extract by ~30% by avoiding calls to __index__ where redundant Jan 15, 2024
@samuelcolvin
Copy link
Contributor Author

turns out the benchmarks extract a lot of usizes...

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess both usize and u64 are the standard type in the benchmarks, yes 😂

This looks great to me, perhaps just can you squash it please? Then let's merge 🚀

@samuelcolvin samuelcolvin force-pushed the int-extraction-performance branch from d13e735 to 8652664 Compare January 15, 2024 14:32
@davidhewitt davidhewitt enabled auto-merge January 15, 2024 14:32
@samuelcolvin
Copy link
Contributor Author

squashed

@davidhewitt davidhewitt added this pull request to the merge queue Jan 15, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 15, 2024
@davidhewitt
Copy link
Member

The full build suggests that 3.8 logic isn't quite right. I've added the label now so if you push further here it'll run without going to the merge queue...

@samuelcolvin
Copy link
Contributor Author

Looks like this will fail, I think the lossy float conversion wasn't fixed until 3.10 in python/cpython#82180.

@samuelcolvin samuelcolvin force-pushed the int-extraction-performance branch from 716d472 to 38c96d9 Compare January 16, 2024 13:49
add newsfragment

formatting

skip slow path on 3.8+

formatting

cfg if,else

formatting again

dedicated macro, change int_convert_u64_or_i64 too

add float tests

force index call for PyLong_AsUnsignedLongLong

perform PyLong check for 3.8 too

perform PyLong check for <3.10
@samuelcolvin samuelcolvin force-pushed the int-extraction-performance branch from 38c96d9 to 0e876d9 Compare January 16, 2024 13:51
@davidhewitt
Copy link
Member

Ok great; so now we can do the fast-path on 3.10+ instead. Makes sense.

@davidhewitt davidhewitt added this pull request to the merge queue Jan 16, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 16, 2024
@davidhewitt davidhewitt added this pull request to the merge queue Jan 16, 2024
@samuelcolvin
Copy link
Contributor Author

I'm assuming the segmentation fault with numpy on the merge queue is not something for me to fix?

Merged via the queue into PyO3:main with commit 43504cd Jan 16, 2024
66 of 67 checks passed
@davidhewitt
Copy link
Member

Nope; rerun succeeded. There is some known flakiness on PyPy that I believe to be between them and pytest, we're just an unhappy casualty of it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants