Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize less-than #264

Merged
merged 1 commit into from
Mar 15, 2022
Merged

Optimize less-than #264

merged 1 commit into from
Mar 15, 2022

Conversation

chfast
Copy link
Owner

@chfast chfast commented Mar 15, 2022

The new implementation is similar to what was there: we select which half to compare and do 128-bit less-than. For small values this is equivalent to do subtraction + borrow check, for big values though we get 25% boost.

@chfast
Copy link
Owner Author

chfast commented Mar 15, 2022

Clang

Comparing o/compare to o/compare-lt
Benchmark                                 Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------
compare<lt_public>/64                  -0.1246         -0.1246             2             2             2             2
compare<lt_public>/128                 -0.1231         -0.1231             2             2             2             2
compare<lt_public>/192                 -0.3112         -0.3112             2             2             2             2
compare<lt_public>/256                 -0.3114         -0.3114             2             2             2             2
Benchmark                        Time             CPU   Iterations
------------------------------------------------------------------
compare<lt_public>/64         2.37 ns         2.37 ns    591687936
compare<lt_public>/128        2.36 ns         2.36 ns    592440064
compare<lt_public>/192        2.36 ns         2.36 ns    591543552
compare<lt_public>/256        2.37 ns         2.37 ns    592008704
compare<lt_wordcmp>/64        2.10 ns         2.10 ns    669979392
compare<lt_wordcmp>/128       1.87 ns         1.87 ns    736609024
compare<lt_wordcmp>/192       4.13 ns         4.13 ns    339005440
compare<lt_wordcmp>/256       3.12 ns         3.12 ns    454742528
compare<lt_halves>/64         2.36 ns         2.36 ns    592504320
compare<lt_halves>/128        2.37 ns         2.37 ns    592627968
compare<lt_halves>/192        2.36 ns         2.36 ns    592624896
compare<lt_halves>/256        2.36 ns         2.36 ns    593662208
compare<lt_llvm>/64           2.03 ns         2.03 ns    688150016
compare<lt_llvm>/128          2.04 ns         2.04 ns    688088576
compare<lt_llvm>/192          2.05 ns         2.05 ns    688033792
compare<lt_llvm>/256          2.04 ns         2.04 ns    688006912

GCC

Comparing o/compare to o/compare-lt
Benchmark                                 Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------
compare<lt_public>/64                  -0.1671         -0.1671             3             2             3             2
compare<lt_public>/128                 -0.1650         -0.1650             3             2             3             2
compare<lt_public>/192                 -0.2578         -0.2578             3             2             3             2
compare<lt_public>/256                 -0.2577         -0.2577             3             2             3             2
Benchmark                        Time             CPU   Iterations
------------------------------------------------------------------
compare<lt_public>/64         2.09 ns         2.09 ns    668606208
compare<lt_public>/128        2.10 ns         2.10 ns    668350720
compare<lt_public>/192        1.87 ns         1.87 ns    750063616
compare<lt_public>/256        1.87 ns         1.87 ns    750004224
compare<lt_sub>/64            3.51 ns         3.51 ns    398873856
compare<lt_sub>/128           3.51 ns         3.51 ns    398906880
compare<lt_sub>/192           3.52 ns         3.52 ns    398810624
compare<lt_sub>/256           3.51 ns         3.51 ns    398874880
compare<lt_wordcmp>/64        2.09 ns         2.09 ns    669849344
compare<lt_wordcmp>/128       5.00 ns         5.00 ns    278123008
compare<lt_wordcmp>/192       5.28 ns         5.28 ns    264916224
compare<lt_wordcmp>/256       5.24 ns         5.24 ns    264640768
compare<lt_halves>/64         2.55 ns         2.55 ns    552285696
compare<lt_halves>/128        2.53 ns         2.53 ns    552007936
compare<lt_halves>/192        2.54 ns         2.54 ns    552038656
compare<lt_halves>/256        2.54 ns         2.54 ns    552642560

The new implementation is similar to what was there: we select which
half to compare and do 128-bit less-than. For small values this is
equivalent to do subtraction + borrow check, for big values though we
get 25% boost.
@codecov-commenter
Copy link

Codecov Report

Merging #264 (f4cff75) into master (ce3f7a1) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master      #264   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           10        10           
  Lines         1911      1912    +1     
=========================================
+ Hits          1911      1912    +1     
Flag Coverage Δ
32bit 100.00% <100.00%> (ø)
gcc 100.00% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
include/intx/intx.hpp 100.00% <100.00%> (ø)

@chfast chfast merged commit 69ed321 into master Mar 15, 2022
@chfast chfast deleted the lt_opt branch March 15, 2022 11:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants