Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add LoongArch SX SIMD extenstion implementation #981

Open
wants to merge 1 commit into
base: dev
Choose a base branch
from

Conversation

lrzlin
Copy link

@lrzlin lrzlin commented Nov 28, 2024

Add LoongArch SX SIMD extension support to xxHash, which is a new RISC ISA just like RISC-V. The code itself is basically a rewritten of the SSE2 one, tested on Loongson 3A6000 processor, here are the results.

Scalar

./xxhsum --benchmark-all
xxhsum 0.8.3 by Yann Collet
compiled as 64-bit unknown little endian with GCC 14.1.1 20240720
Sample of 100 KB...
 1#XXH32                         :     102400 ->    60583 it/s ( 5916.3 MB/s)
 2#XXH32 unaligned               :     102400 ->    58909 it/s ( 5752.8 MB/s)
 3#XXH64                         :     102400 ->   117056 it/s (11431.2 MB/s)
 4#XXH64 unaligned               :     102400 ->   118304 it/s (11553.1 MB/s)
 5#XXH3_64b                      :     102400 ->   142790 it/s (13944.4 MB/s)
 6#XXH3_64b unaligned            :     102400 ->   136549 it/s (13334.9 MB/s)
 7#XXH3_64b w/seed               :     102400 ->   117327 it/s (11457.8 MB/s)
 8#XXH3_64b w/seed unaligned     :     102400 ->   114720 it/s (11203.1 MB/s)
 9#XXH3_64b w/secret             :     102400 ->   144445 it/s (14106.0 MB/s)
10#XXH3_64b w/secret unaligned   :     102400 ->   138076 it/s (13483.9 MB/s)
11#XXH128                        :     102400 ->   114568 it/s (11188.3 MB/s)
12#XXH128 unaligned              :     102400 ->   111621 it/s (10900.5 MB/s)
13#XXH128 w/seed                 :     102400 ->   144003 it/s (14062.8 MB/s)
14#XXH128 w/seed unaligned       :     102400 ->   137411 it/s (13419.1 MB/s)
15#XXH128 w/secret               :     102400 ->   151320 it/s (14777.3 MB/s)
16#XXH128 w/secret unaligned     :     102400 ->   143261 it/s (13990.4 MB/s)
17#XXH32_stream                  :     102400 ->    60783 it/s ( 5935.8 MB/s)
18#XXH32_stream unaligned        :     102400 ->    56481 it/s ( 5515.7 MB/s)
19#XXH64_stream                  :     102400 ->   116072 it/s (11335.1 MB/s)
20#XXH64_stream unaligned        :     102400 ->   105525 it/s (10305.2 MB/s)
21#XXH3_stream                   :     102400 ->   116267 it/s (11354.2 MB/s)
22#XXH3_stream unaligned         :     102400 ->   112994 it/s (11034.6 MB/s)
23#XXH3_stream w/seed            :     102400 ->   115983 it/s (11326.5 MB/s)
24#XXH3_stream w/seed unaligned  :     102400 ->   112852 it/s (11020.7 MB/s)
25#XXH128_stream                 :     102400 ->   116232 it/s (11350.8 MB/s)
26#XXH128_stream unaligned       :     102400 ->   112905 it/s (11025.8 MB/s)
27#XXH128_stream w/seed          :     102400 ->   116081 it/s (11336.1 MB/s)
28#XXH128_stream w/seed unaligne :     102400 ->   112789 it/s (11014.6 MB/s)

LoongArch SX

./xxhsum --benchmark-all
xxhsum 0.8.3 by Yann Collet
compiled as 64-bit loongarch little endian with GCC 14.1.1 20240720
Sample of 100 KB...
 1#XXH32                         :     102400 ->    60569 it/s ( 5915.0 MB/s)
 2#XXH32 unaligned               :     102400 ->    58907 it/s ( 5752.6 MB/s)
 3#XXH64                         :     102400 ->   117053 it/s (11431.0 MB/s)
 4#XXH64 unaligned               :     102400 ->   118289 it/s (11551.6 MB/s)
 5#XXH3_64b                      :     102400 ->   199076 it/s (19441.0 MB/s)
 6#XXH3_64b unaligned            :     102400 ->   189993 it/s (18554.0 MB/s)
 7#XXH3_64b w/seed               :     102400 ->   199722 it/s (19504.1 MB/s)
 8#XXH3_64b w/seed unaligned     :     102400 ->   189672 it/s (18522.7 MB/s)
 9#XXH3_64b w/secret             :     102400 ->   183999 it/s (17968.7 MB/s)
10#XXH3_64b w/secret unaligned   :     102400 ->   176080 it/s (17195.3 MB/s)
11#XXH128                        :     102400 ->   198883 it/s (19422.2 MB/s)
12#XXH128 unaligned              :     102400 ->   189925 it/s (18547.3 MB/s)
13#XXH128 w/seed                 :     102400 ->   198877 it/s (19421.6 MB/s)
14#XXH128 w/seed unaligned       :     102400 ->   188807 it/s (18438.2 MB/s)
15#XXH128 w/secret               :     102400 ->   183770 it/s (17946.3 MB/s)
16#XXH128 w/secret unaligned     :     102400 ->   175811 it/s (17169.0 MB/s)
17#XXH32_stream                  :     102400 ->    60782 it/s ( 5935.7 MB/s)
18#XXH32_stream unaligned        :     102400 ->    56488 it/s ( 5516.4 MB/s)
19#XXH64_stream                  :     102400 ->   116075 it/s (11335.4 MB/s)
20#XXH64_stream unaligned        :     102400 ->   105448 it/s (10297.7 MB/s)
21#XXH3_stream                   :     102400 ->   199438 it/s (19476.4 MB/s)
22#XXH3_stream unaligned         :     102400 ->   190148 it/s (18569.2 MB/s)
23#XXH3_stream w/seed            :     102400 ->   198714 it/s (19405.7 MB/s)
24#XXH3_stream w/seed unaligned  :     102400 ->   189577 it/s (18513.3 MB/s)
25#XXH128_stream                 :     102400 ->   199140 it/s (19447.3 MB/s)
26#XXH128_stream unaligned       :     102400 ->   190063 it/s (18560.8 MB/s)
27#XXH128_stream w/seed          :     102400 ->   198738 it/s (19408.0 MB/s)
28#XXH128_stream w/seed unaligne :     102400 ->   189668 it/s (18522.3 MB/s)

Reference
LoongArch Intrinsics Document: https://jia.je/unofficial-loongarch-intrinsics-guide/

Resolve
loongson-community/discussions#72

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant