Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Customize full accumulating loop for SVE #756

Merged
merged 2 commits into from
Nov 24, 2022
Merged

Conversation

hzhuang1
Copy link
Contributor

With the patch, performance is improved at least 2 times.

@Cyan4973
Copy link
Owner

Cyan4973 commented Nov 10, 2022

note : it seems that Github Actions tests have not even started on this PR ?

edit : wait, it seems this is not directly related to this PR in particular, the absence of Github Actions results predates it, it's also visible in previous PR.

Let's investigate ...

@t-mat
Copy link
Contributor

t-mat commented Nov 10, 2022

note : it seems that Github Actions tests have not even started on this PR ?

Ugh. I've checked the latest GitHub actions log. And it reports there's an error in our ci.yml

https://github.com/Cyan4973/xxHash/actions/runs/3426646597

image

https://github.com/Cyan4973/xxHash/actions/runs/3426646597/workflow#L566

It's been introduced at 058a465 which is basically follows #744 (comment)

edit : Create a new issue for this problem : #757

@hzhuang1
Copy link
Contributor Author

hzhuang1 commented Nov 10, 2022 via email

@t-mat
Copy link
Contributor

t-mat commented Nov 10, 2022

Is there any a quick fix on this?

Commands under - run: | should have some spaces. 👾👾

I mean the following line

      - run: |
        mkdir -p /usr/local/share/.tipi
        # FIX: Hack for github action
        git config --global --add safe.directory /usr/local/share/.tipi
        git config --global --add safe.directory /__w/xxHash/xxHash/

should be fixed as this:

      - run: |
          mkdir -p /usr/local/share/.tipi
          # FIX: Hack for github action
          git config --global --add safe.directory /usr/local/share/.tipi
          git config --global --add safe.directory /__w/xxHash/xxHash/

Since I'm moving to my hometown, I can't edit the code.
Could you create a new PR to fix this ?

@hzhuang1
Copy link
Contributor Author

Is there any a quick fix on this?

Commands under - run: | should have some spaces. 👾👾

I mean the following line

      - run: |
        mkdir -p /usr/local/share/.tipi
        # FIX: Hack for github action
        git config --global --add safe.directory /usr/local/share/.tipi
        git config --global --add safe.directory /__w/xxHash/xxHash/

should be fixed as this:

      - run: |
          mkdir -p /usr/local/share/.tipi
          # FIX: Hack for github action
          git config --global --add safe.directory /usr/local/share/.tipi
          git config --global --add safe.directory /__w/xxHash/xxHash/

Since I'm moving to my hometown, I can't edit the code. Could you create a new PR to fix this ?

Oh, two more spaces are needed. No problem. I'll submit a pull request right now.

@Cyan4973
Copy link
Owner

Great debugging @t-mat !

@hzhuang1 hzhuang1 force-pushed the sve_02 branch 2 times, most recently from 9379c2e to e400b9e Compare November 10, 2022 23:06
XXH3_accumulate() handle the whole accumulating loop and architecture
optimized code is in the mini loop of 512 bytes. But it also causes
accessing memory frequently for the large block data.

Now make XXH3_accumulate() as architecture optimized code.

Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
Signed-off-by: Devin Hussey <easyaspi314@users.noreply.github.com>
With optimized full accumulating loop, the performance is improved
at least 2 times.

The ACC result needn't to save to stack in the full loop. And
instructions of prefetching data for SVE are also used.

Without this patch, the performance result is in below.
 ===  benchmarking 4 hash functions  ===
benchmarking large inputs : from 512 bytes (log9) to 128 MB (log27)
xxh3   ,  1904,  2315,  2468,  2580,  2640,  2670,  2682,  2673,  2677,  2663,  2683,  2688,  2686,  2591,  2241,  2181,  2191,  2048,  2048
XXH32  ,  1326,  1440,  1493,  1523,  1534,  1543,  1547,  1532,  1504,  1507,  1507,  1505,  1506,  1446,  1218,  1150,  1151,  1153,  1135
XXH64  ,  2511,  2795,  2975,  3068,  3120,  3125,  3154,  3128,  3034,  3045,  3052,  3053,  3053,  2842,  2050,  1853,  1848,  1853,  1853
XXH128 ,  1867,  2294,  2465,  2569,  2622,  2662,  2676,  2667,  2677,  2682,  2684,  2677,  2683,  2570,  2093,  2013,  2045,  2046,  2046

With this patch, the performance result is in below.
 ===  benchmarking 4 hash functions  ===
benchmarking large inputs : from 512 bytes (log9) to 128 MB (log27)
xxh3   ,  3681,  6007,  7803,  8954,  9875, 10411, 10703, 10505, 10670, 10794, 10812, 10804, 10205,  9923,  6279,  5927,  5967,  6022,  6062
XXH32  ,  1281,  1434,  1494,  1523,  1534,  1543,  1547,  1535,  1500,  1502,  1502,  1502,  1501,  1443,  1242,  1169,  1193,  1196,  1195
XXH64  ,  2497,  2801,  2961,  3074,  3092,  3136,  3155,  3123,  3031,  3037,  3040,  3037,  3033,  2847,  2102,  1955,  1967,  1974,  1971
XXH128 ,  3419,  5798,  7488,  8854,  9787, 10357, 10673, 10468, 10647, 10748, 10785, 10751, 10805,  9698,  6011,  5677,  5999,  6065,  6074

Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
Signed-off-by: Devin Hussey <easyaspi314@users.noreply.github.com>
@hzhuang1
Copy link
Contributor Author

OK. All checks have passed now.

@hzhuang1
Copy link
Contributor Author

How about this patch set? :)

@hzhuang1
Copy link
Contributor Author

Should I do anything for this pull request? Thanks

@Cyan4973
Copy link
Owner

Sorry @hzhuang1 , I just needed some available time to properly review the code change.

I believe it's good, no modification requested.

@hzhuang1
Copy link
Contributor Author

Thanks a lot.

@Cyan4973 Cyan4973 merged commit 30d6a3e into Cyan4973:dev Nov 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants