This repository has been archived by the owner on Dec 22, 2021. It is now read-only.
In #196 there was a discussion on what shuffle patterns get accelerated by hardware on various platforms. Right now those are handled inconsistently by the toolchain, and regardless of what remedy we pick, it is good to know what gets accelerated and what does not.
Some other suggestions for potentially useful patterns, described as lane indices of the resulting 32-bit lanes:
2301 (swapping 32-bit pairs), 1032 (swapping 64-bit pairs), 0321/2103 (rotate right/left), 0123 (reverse). All but the last are currently used within JPEG XL.
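To make the notation concrete, here is a minimal scalar sketch of these patterns. It assumes the digits are written from the highest result lane down to the lowest (so `3210` would be the identity), which is the reading under which `0123` is a reverse; that convention is inferred, not stated in the thread. The helper names `shuffle32` and `to_byte_indices` are hypothetical. The second helper expands a 32-bit lane pattern into the 16 byte indices that a byte-granular shuffle takes.

```python
def shuffle32(src, pattern):
    """Model a 4-lane shuffle of 32-bit lanes.

    src is a list indexed by lane (src[0] is lane 0).
    pattern is a 4-digit string of source lane indices, written from
    result lane 3 down to result lane 0 (an assumed convention).
    """
    if len(src) != 4 or len(pattern) != 4:
        raise ValueError("expected 4 lanes and a 4-digit pattern")
    # pattern[3 - i] is the source lane feeding result lane i.
    return [src[int(pattern[3 - i])] for i in range(4)]


def to_byte_indices(pattern):
    """Expand a 32-bit lane pattern into 16 byte indices, lowest byte first."""
    lanes = [int(pattern[3 - i]) for i in range(4)]
    return [4 * lane + b for lane in lanes for b in range(4)]


src = ["a", "b", "c", "d"]  # lanes 0..3

for name, pattern in [
    ("swap 32-bit pairs", "2301"),
    ("swap 64-bit pairs", "1032"),
    ("rotate right", "0321"),
    ("rotate left", "2103"),
    ("reverse", "0123"),
]:
    print(f"{pattern} ({name}): {shuffle32(src, pattern)}")
```

Under this reading, "rotate right" moves each element one lane toward lane 0, and "rotate left" moves it one lane toward lane 3.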
Thanks @penzn for filing - I'm guessing the purpose of this is currently for documenting known fast shuffles? I'm marking this with a documentation label till #196 is resolved to keep the bulk of the discussion regarding shuffles there.
Tentative list from #196 (comment) and #196 (comment):
@zeux, thank you for your list.