Customize full accumulating loop for SVE #756

hzhuang1 · 2022-11-10T01:31:12Z

With this patch, performance improves by at least 2x.

Cyan4973 · 2022-11-10T02:01:53Z

note : it seems that Github Actions tests have not even started on this PR ?

edit : wait, it seems this is not specific to this PR: the absence of GitHub Actions results predates it and is also visible in the previous PR.

Let's investigate ...

t-mat · 2022-11-10T02:11:28Z

> note : it seems that Github Actions tests have not even started on this PR ?

Ugh. I've checked the latest GitHub Actions log, and it reports an error in our ci.yml:

https://github.com/Cyan4973/xxHash/actions/runs/3426646597

https://github.com/Cyan4973/xxHash/actions/runs/3426646597/workflow#L566

It was introduced in 058a465, which basically follows #744 (comment)

edit : Created a new issue for this problem : #757

hzhuang1 · 2022-11-10T02:13:17Z

Oh, that's too bad. I just copied the actions from Yannic. Is there a quick fix for this?

t-mat · 2022-11-10T02:20:26Z

> Is there a quick fix for this?

Commands under - run: | need a couple more spaces of indentation. 👾👾

I mean the following lines:

      - run: |
        mkdir -p /usr/local/share/.tipi
        # FIX: Hack for github action
        git config --global --add safe.directory /usr/local/share/.tipi
        git config --global --add safe.directory /__w/xxHash/xxHash/

should be fixed like this:

      - run: |
          mkdir -p /usr/local/share/.tipi
          # FIX: Hack for github action
          git config --global --add safe.directory /usr/local/share/.tipi
          git config --global --add safe.directory /__w/xxHash/xxHash/

Since I'm moving to my hometown, I can't edit the code.
Could you create a new PR to fix this ?

hzhuang1 · 2022-11-10T02:24:31Z

Oh, two more spaces are needed. No problem. I'll submit a pull request right now.

Cyan4973 · 2022-11-10T05:18:27Z

Great debugging @t-mat !

XXH3_accumulate() handles the whole accumulating loop, and the architecture-optimized code lives only in the mini loop of 512 bytes. For large blocks of data, this causes frequent memory accesses. Now make XXH3_accumulate() itself architecture-optimized code.

Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
Signed-off-by: Devin Hussey <easyaspi314@users.noreply.github.com>
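To illustrate the restructuring this commit message describes, here is a minimal sketch in portable C. It is not the xxHash source and not the SVE code from this PR: the names `accumulate_stripe`, `accumulate_per_stripe`, `accumulate_full`, `STRIPE_LEN`, and `ACC_NB` are hypothetical, the 64-byte stripe size is an assumption, and the key is kept fixed across stripes for simplicity. The first driver calls a per-stripe routine in a loop, so the accumulators go through memory on every call. The second moves the whole loop inside one routine, so the accumulators are loaded once, kept in locals, and stored once.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STRIPE_LEN 64  /* assumed stripe size: 8 lanes x 8 bytes (hypothetical) */
#define ACC_NB 8       /* number of 64-bit accumulator lanes (hypothetical) */

/* Simplified per-stripe step; the mixing formula is illustrative only. */
static void accumulate_stripe(uint64_t acc[ACC_NB], const uint8_t *stripe,
                              const uint64_t key[ACC_NB])
{
    for (size_t i = 0; i < ACC_NB; i++) {
        uint64_t data;
        memcpy(&data, stripe + 8 * i, sizeof data);  /* unaligned-safe load */
        uint64_t mixed = data ^ key[i];
        acc[i ^ 1] += data;
        acc[i] += (mixed & 0xFFFFFFFFu) * (mixed >> 32);
    }
}

/* Variant 1: generic outer loop, specialized code only handles one stripe. */
static void accumulate_per_stripe(uint64_t acc[ACC_NB], const uint8_t *input,
                                  const uint64_t key[ACC_NB], size_t nb_stripes)
{
    for (size_t n = 0; n < nb_stripes; n++)
        accumulate_stripe(acc, input + n * STRIPE_LEN, key);
}

/* Variant 2: the whole loop lives in one routine; accumulators are loaded
 * once into locals, updated for every stripe, and written back once. */
static void accumulate_full(uint64_t acc[ACC_NB], const uint8_t *input,
                            const uint64_t key[ACC_NB], size_t nb_stripes)
{
    uint64_t local[ACC_NB];
    memcpy(local, acc, sizeof local);
    for (size_t n = 0; n < nb_stripes; n++) {
        const uint8_t *stripe = input + n * STRIPE_LEN;
        for (size_t i = 0; i < ACC_NB; i++) {
            uint64_t data;
            memcpy(&data, stripe + 8 * i, sizeof data);
            uint64_t mixed = data ^ key[i];
            local[i ^ 1] += data;
            local[i] += (mixed & 0xFFFFFFFFu) * (mixed >> 32);
        }
    }
    memcpy(acc, local, sizeof local);
}
```

Both variants compute the same accumulator values; only where the loop sits changes. In scalar C a compiler may inline the first variant into the second anyway, so treat this as a picture of the structure rather than a measurement of the gain.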

With an optimized full accumulating loop, performance improves by at least 2x. The ACC result does not need to be saved to the stack inside the full loop, and SVE data-prefetch instructions are also used.

Without this patch, the performance results are as follows:

    === benchmarking 4 hash functions ===
    benchmarking large inputs : from 512 bytes (log9) to 128 MB (log27)
    xxh3   , 1904, 2315, 2468, 2580, 2640, 2670, 2682, 2673, 2677, 2663, 2683, 2688, 2686, 2591, 2241, 2181, 2191, 2048, 2048
    XXH32  , 1326, 1440, 1493, 1523, 1534, 1543, 1547, 1532, 1504, 1507, 1507, 1505, 1506, 1446, 1218, 1150, 1151, 1153, 1135
    XXH64  , 2511, 2795, 2975, 3068, 3120, 3125, 3154, 3128, 3034, 3045, 3052, 3053, 3053, 2842, 2050, 1853, 1848, 1853, 1853
    XXH128 , 1867, 2294, 2465, 2569, 2622, 2662, 2676, 2667, 2677, 2682, 2684, 2677, 2683, 2570, 2093, 2013, 2045, 2046, 2046

With this patch, the performance results are as follows:

    === benchmarking 4 hash functions ===
    benchmarking large inputs : from 512 bytes (log9) to 128 MB (log27)
    xxh3   , 3681, 6007, 7803, 8954, 9875, 10411, 10703, 10505, 10670, 10794, 10812, 10804, 10205, 9923, 6279, 5927, 5967, 6022, 6062
    XXH32  , 1281, 1434, 1494, 1523, 1534, 1543, 1547, 1535, 1500, 1502, 1502, 1502, 1501, 1443, 1242, 1169, 1193, 1196, 1195
    XXH64  , 2497, 2801, 2961, 3074, 3092, 3136, 3155, 3123, 3031, 3037, 3040, 3037, 3033, 2847, 2102, 1955, 1967, 1974, 1971
    XXH128 , 3419, 5798, 7488, 8854, 9787, 10357, 10673, 10468, 10647, 10748, 10785, 10751, 10805, 9698, 6011, 5677, 5999, 6065, 6074

Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
Signed-off-by: Devin Hussey <easyaspi314@users.noreply.github.com>
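As a rough cross-check of the figures in this commit message, the sketch below computes the per-size ratio between the two xxh3 rows. The arrays are copied from the benchmark output above; the helper names `min_speedup` and `max_speedup` and the constant `NB_SIZES` are hypothetical, and the values are treated as plain throughput numbers in whatever unit the benchmark prints.

```c
#include <stddef.h>

#define NB_SIZES 19  /* one column per input size, log9 through log27 */

/* xxh3 row without the patch, as printed in the commit message. */
static const double xxh3_before[NB_SIZES] = {
    1904, 2315, 2468, 2580, 2640, 2670, 2682, 2673, 2677, 2663,
    2683, 2688, 2686, 2591, 2241, 2181, 2191, 2048, 2048
};

/* xxh3 row with the patch, as printed in the commit message. */
static const double xxh3_after[NB_SIZES] = {
    3681, 6007, 7803, 8954, 9875, 10411, 10703, 10505, 10670, 10794,
    10812, 10804, 10205, 9923, 6279, 5927, 5967, 6022, 6062
};

/* Smallest after/before ratio across all input sizes. */
static double min_speedup(const double *before, const double *after, size_t n)
{
    double lowest = after[0] / before[0];
    for (size_t i = 1; i < n; i++) {
        double ratio = after[i] / before[i];
        if (ratio < lowest) lowest = ratio;
    }
    return lowest;
}

/* Largest after/before ratio across all input sizes. */
static double max_speedup(const double *before, const double *after, size_t n)
{
    double highest = after[0] / before[0];
    for (size_t i = 1; i < n; i++) {
        double ratio = after[i] / before[i];
        if (ratio > highest) highest = ratio;
    }
    return highest;
}
```

Calling `min_speedup(xxh3_before, xxh3_after, NB_SIZES)` gives a ratio slightly under 2 (the 512-byte column), and `max_speedup` gives a ratio slightly above 4. The XXH32 and XXH64 rows are essentially unchanged between the two runs, which is expected if only the XXH3 accumulate path was touched.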

hzhuang1 · 2022-11-10T23:33:30Z

OK. All checks have passed now.

hzhuang1 · 2022-11-18T08:35:37Z

How about this patch set? :)

hzhuang1 · 2022-11-23T02:27:37Z

Should I do anything for this pull request? Thanks

Cyan4973 · 2022-11-24T00:28:13Z

Sorry @hzhuang1 , I just needed some available time to properly review the code change.

I believe it's good, no modification requested.

hzhuang1 · 2022-11-24T07:36:11Z

Thanks a lot.

This was referenced Nov 10, 2022

Implement ARM SVE optimization with assembly code #751

Closed

Full acc loop #744

Closed

hzhuang1 force-pushed the sve_02 branch 2 times, most recently from 9379c2e to e400b9e November 10, 2022 23:06

hzhuang1 force-pushed the sve_02 branch from e400b9e to 91788f1 November 10, 2022 23:13

Cyan4973 approved these changes Nov 24, 2022

Cyan4973 merged commit 30d6a3e into Cyan4973:dev Nov 24, 2022

hzhuang1 mentioned this pull request Nov 24, 2022

Change dispatch breakpoint to XXH3_accumulate() #692

Closed
