improve performance of successful int extract by ~30% by avoiding calls to `__index__` where redundant #3742

samuelcolvin · 2024-01-14T15:41:49Z

Results from benchmarks:

extract_int_extract_success
                        time:   [5.0137 ns 5.0166 ns 5.0195 ns]
                        change: [-33.466% -32.879% -32.348%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  6 (6.00%) low mild
  6 (6.00%) high mild
  1 (1.00%) high severe

extract_int_extract_fail
                        time:   [190.90 ns 191.15 ns 191.41 ns]
                        change: [+0.4500% +0.9483% +1.3921%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

🚀

src/conversions/std/num.rs

codspeed-hq · 2024-01-14T16:00:19Z

CodSpeed Performance Report

Merging #3742 will degrade performances by 18.05%

Comparing samuelcolvin:int-extraction-performance (0e876d9) with main (7366b1a)

Summary

⚡ 13 improvements
❌ 2 regressions
✅ 63 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| | Benchmark | `main` | `samuelcolvin:int-extraction-performance` | Change |
|---|---|---|---|---|
| ⚡ | `extract_int_downcast_fail` | 266.1 ns | 238.3 ns | +11.66% |
| ❌ | `extract_str_extract_success` | 647.2 ns | 730.6 ns | -11.41% |
| ⚡ | `extract_int_downcast_success` | 902.2 ns | 635.6 ns | +41.96% |
| ⚡ | `extract_int_extract_success` | 893.9 ns | 627.2 ns | +42.52% |
| ⚡ | `tuple_get_item` | 18.7 ms | 14.4 ms | +30.11% |
| ❌ | `not_a_list_via_downcast` | 126.1 ns | 153.9 ns | -18.05% |
| ⚡ | `tuple_get_item_unchecked` | 15.5 ms | 11.1 ms | +38.9% |
| ⚡ | `extract_btreeset` | 87.6 ms | 78.9 ms | +10.99% |
| ⚡ | `list_get_item` | 22.5 ms | 18.1 ms | +23.88% |
| ⚡ | `iter_list` | 37.8 ms | 29.1 ms | +29.76% |
| ⚡ | `iter_dict` | 57.6 ms | 48.9 ms | +17.72% |
| ⚡ | `iter_tuple` | 27.1 ms | 18.4 ms | +47.01% |
| ⚡ | `list_get_item_unchecked` | 19.4 ms | 15.1 ms | +28.77% |
| ⚡ | `extract_hashmap` | 111.3 ms | 94 ms | +18.43% |
| ⚡ | `iter_set` | 60.6 ms | 51.9 ms | +16.7% |

davidhewitt · 2024-01-14T17:12:01Z

Nice! It looks like this code might have originated from discussion in #108. I think that logic is likely extremely dated and we're now right to simplify here.

davidhewitt · 2024-01-14T17:20:04Z

python/cpython#80229 looks relevant, which was in 3.8 so we may need to keep a different path for 3.7.

samuelcolvin · 2024-01-14T18:05:47Z

Logic changed for 3.8 to just call `PyLong_AsLong`; with this, the improvement increased to 37%.

I haven't changed the extraction logic for `BigInt`, since that ultimately calls `_PyLong_NumBits`, which doesn't internally call `PyNumber_Index`, so I think it's right. I guess it's also used much less.
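
For readers less familiar with `__index__`, the idea in the PR title can be sketched in plain Python. This is only an illustration under stated assumptions: the real patch is Rust code calling the CPython C API, and `Meters` and `extract_int` are hypothetical names introduced here, not part of PyO3.

```python
import operator


class Meters:
    """Hypothetical non-int type that opts in to integer conversion."""

    def __init__(self, n):
        self.n = n

    def __index__(self):
        return self.n


def extract_int(obj):
    # Fast path: obj is already an int (or an int subclass), so asking it
    # for __index__ first would be redundant work.
    if isinstance(obj, int):
        return obj
    # Slow path: let the object convert itself. This raises TypeError for
    # objects that do not define __index__ (str, float, ...).
    return operator.index(obj)


print(extract_int(5))
print(extract_int(Meters(7)))
```

The successful-extraction benchmark exercises the first branch, which is where skipping the extra call would pay off.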

src/conversions/std/num.rs

davidhewitt

The current patch looks good, though I think the same thing needs to be added also around line 70 in the int_convert_u64_or_i64 macro?

Also once that's added, please force-push to squash; GitHub merge queue doesn't let us choose to squash-merge.

src/conversions/std/num.rs

davidhewitt · 2024-01-15T11:44:53Z

Uff, unfortunately the test failure looks legitimate: `PyLong_AsUnsignedLong` and `PyLong_AsUnsignedLongLong` do not call `__index__`, so we will have to manually wrap those to call `__index__` again.

https://github.com/python/cpython/blob/41a94c9e7be94760baab1dcb33427d8781bea64a/Objects/longobject.c#L630C68-L630C68
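
A rough Python analogue of "manually wrap those to call `__index__`" might look like the sketch below. It assumes a 64-bit unsigned target; `extract_u64`, `Index`, and `U64_MAX` are hypothetical names for illustration, and `operator.index` stands in for the C-level index call.

```python
import operator

U64_MAX = 2**64 - 1


class Index:
    """Hypothetical object that is not an int but defines __index__."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        return self.value


def extract_u64(obj):
    # Step 1: convert via __index__ explicitly, since the unsigned
    # conversion itself is assumed not to do this for us.
    value = operator.index(obj)
    # Step 2: range-check, standing in for the unsigned conversion.
    if value < 0 or value > U64_MAX:
        raise OverflowError("value does not fit in an unsigned 64-bit integer")
    return value


print(extract_u64(Index(3)))
```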

samuelcolvin · 2024-01-15T12:26:52Z

turns out the benchmarks extract a lot of usizes...

davidhewitt

I guess both usize and u64 are the standard type in the benchmarks, yes 😂

This looks great to me; could you just squash it please? Then let's merge 🚀

samuelcolvin · 2024-01-15T14:35:13Z

squashed

davidhewitt · 2024-01-15T15:42:39Z

The full build suggests that the 3.8 logic isn't quite right. I've added the label now, so if you push further here it'll run without going to the merge queue...

samuelcolvin · 2024-01-16T10:16:11Z

Looks like this will fail, I think the lossy float conversion wasn't fixed until 3.10 in python/cpython#82180.

- add newsfragment
- formatting
- skip slow path on 3.8+
- formatting
- cfg if,else
- formatting again
- dedicated macro, change int_convert_u64_or_i64 too
- add float tests
- force `__index__` call for PyLong_AsUnsignedLongLong
- perform PyLong check for 3.8 too
- perform PyLong check for <3.10

davidhewitt · 2024-01-16T15:40:40Z

Ok great; so now we can do the fast-path on 3.10+ instead. Makes sense.
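
To illustrate why a lossy float conversion matters for integer extraction, here is a small Python sketch contrasting a conversion that goes through `__int__` with one that goes through `__index__`. The function and class names are hypothetical; it shows the semantics only, not the C API behaviour of any particular Python version.

```python
import operator


class OnlyInt:
    """Hypothetical type that defines __int__ but not __index__."""

    def __int__(self):
        return 3


def lossy_extract(obj):
    # int() accepts floats and silently truncates them.
    return int(obj)


def strict_extract(obj):
    # operator.index() only accepts ints and objects defining __index__,
    # so floats are rejected with TypeError.
    return operator.index(obj)


print(lossy_extract(1.9))
try:
    strict_extract(1.9)
except TypeError as exc:
    print("rejected:", exc)
```

An extraction path that behaves like `lossy_extract` would accept `1.9` as an integer, which is presumably what the added float tests guard against.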

samuelcolvin · 2024-01-16T17:36:09Z

I'm assuming the segmentation fault with numpy on the merge queue is not something for me to fix?

davidhewitt · 2024-01-16T19:05:10Z

Nope; rerun succeeded. There is some known flakiness on PyPy that I believe to be between them and pytest; we're just an unhappy casualty of it.

alex reviewed Jan 14, 2024

samuelcolvin commented Jan 14, 2024

samuelcolvin mentioned this pull request Jan 14, 2024

Int extraction pydantic/pydantic-core#1155

Merged

samuelcolvin force-pushed the int-extraction-performance branch from 4a97ccb to 206fb59 Compare January 14, 2024 18:03

adamreichold reviewed Jan 14, 2024

davidhewitt requested changes Jan 15, 2024

davidhewitt reviewed Jan 15, 2024

samuelcolvin force-pushed the int-extraction-performance branch from 6b02537 to d13e735 Compare January 15, 2024 12:02

samuelcolvin changed the title ~~improve performance of successful int extract by ~30%~~ improve performance of successful int extract by ~30% by avoiding calls to __index__ where redundant Jan 15, 2024

davidhewitt approved these changes Jan 15, 2024

samuelcolvin force-pushed the int-extraction-performance branch from d13e735 to 8652664 Compare January 15, 2024 14:32

davidhewitt enabled auto-merge January 15, 2024 14:32

davidhewitt added this pull request to the merge queue Jan 15, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 15, 2024

davidhewitt added the CI-build-full label Jan 15, 2024

samuelcolvin force-pushed the int-extraction-performance branch from 716d472 to 38c96d9 Compare January 16, 2024 13:49

samuelcolvin force-pushed the int-extraction-performance branch from 38c96d9 to 0e876d9 Compare January 16, 2024 13:51

davidhewitt added this pull request to the merge queue Jan 16, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 16, 2024

davidhewitt added this pull request to the merge queue Jan 16, 2024

Merged via the queue into PyO3:main with commit 43504cd Jan 16, 2024
66 of 67 checks passed

samuelcolvin mentioned this pull request Mar 15, 2024

Parsing of float as datetime changes depending on the deprecation warnings handling setting pydantic/pydantic#9018

Closed

1 task

davidhewitt mentioned this pull request May 14, 2024

defer to PyO3 i64 extraction to avoid implicit integer casts pydantic/pydantic-core#1288

Merged

4 tasks
