-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding AVX512 path to Base64 encoding/Decoding #92241
Conversation
@BruceForstall @tannergooding @dotnet/avx512-contrib |
src/libraries/System.Private.CoreLib/src/System/Buffers/Text/Base64Decoder.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Buffers/Text/Base64Decoder.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Buffers/Text/Base64Decoder.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Buffers/Text/Base64Encoder.cs
Outdated
Show resolved
Hide resolved
8b593f6
to
d29d4d0
Compare
Do we need a new entry in the third party notices file? |
We should already have them if I am not mistaken (unless avx512 uses a different article) |
I'm not sure about this- The original sse and avx implementations use a similar algorithm and this is the reference provided - https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Buffers/Text/Base64Encoder.cs#L13-L14. This implementation(the VBMI version) uses a modified version based on this(https://github.com/dotnet/runtime/pull/92241/files#diff-db463201901c2d83d2b563871ae11fafee9d5afe94e4d014b77212996b25f770R635) - https://arxiv.org/pdf/1910.05109.pdf There is an AVX512 based implementation in there - https://github.com/aklomp/base64/tree/master/lib/arch/avx512. But it has only the encode and it requires 'multishift' which we do not currently support |
We generally acknowledge significant reuse in the TPN file even if there's a link from the sources. I see runtime/THIRD-PARTY-NOTICES.TXT Line 345 in 17b60e3
but, whatever @EgorBo recommends.. |
I am not an expert in
Do we use any of the code from that repo? I see it has BSD-2 license |
Not directly but the logic(including shuffle constants are similar). But this implementation uses '_mm512_multishift_epi64_epi8' and that's not the one I'm using. This(https://github.com/WojciechMula/base64simd/blob/master/encode/encode.avx512vbmi.cpp) in particular is probably worth mentioning in hindsight(Since I used the non multishift version in my implementation with VBMI) I used the below as resources to understand available implementations and available options(But they are all related). Which brings up the question which of these exactly I should be referencing esp this for understanding : http://0x80.pl/notesen/2016-04-03-avx512-base64.html#id29 https://github.com/WojciechMula/base64simd/tree/master |
Tagging subscribers to this area: @dotnet/area-system-buffers Issue DetailsOverviewThis PR implements an AVX512 code path for Base64 encoding/Decoding. This is based on the work by Wojciech Muła and Daniel Lemire. There is a fast This version uses intrinsics directly and not generic vector libraries due to lack of current support in JIt/Vector libraries to produce optimal code. Some additional support which would be required in order to use generic vector library for implementing this would be
Even the current implementation can be further optimized by adding the Generated code(Will be focusing on the actual encoding/decoding code within the loop only) EncodingVBMI_AVAILABLE VBMI_UNAVAILABLE DecodingVBMI_AVAILABLE VBMI_UNAVAILABLE PerformanceON ICX -BASE_VERSION vs VBMI_UNAVAILABLE BASE_VERSION vs VBMI_AVAILABLE
|
@EgorBo I modified the reference to point to https://github.com/WojciechMula/base64simd/tree/master. Which has the closest versions to the implementation I went with. Will this require me adding anything to notice? |
Have updated THIRD PARTY NOTICE based on conversation with @tannergooding . Removing fallback avx512Bw path meant I had to add only 2 references. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Looks way simpler now
@tannergooding @BruceForstall Any comments on this? I'd be great if we can move this forward this week, |
I'm not the right person to review this. If @tannergooding can't review, maybe @stephentoub can review (or pick an appropriate reviewer). |
|
||
// This algorithm requires AVX512VBMI support. | ||
// Vbmi was first introduced in CannonLake and is avaialable from IceLake on. | ||
// This makes it okay to use Vbmi instructions since Vector512.IsHardwareAccelerated returns True only from IceLake on. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment isn't quite accurate.
Vector512.IsHardwareAccelerated
can be made to return true
for Skylake-X
and up to before IceLake
via an environment variable. This is why the caller has the check for Vector512.IsHardwareAccelerated && Avx512Vbmi.IsSupported
and why this function has [CompExactlyDependsOn(typeof(Avx512Vbmi))]
We're fine with it not being usable pre IceLake since those often incur heavier downclocking and its unnecessary complexity for a non-default scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed line 667
src/libraries/System.Private.CoreLib/src/System/Buffers/Text/Base64Decoder.cs
Show resolved
Hide resolved
str = Avx512Vbmi.PermuteVar64x8(multiAdd2.AsByte(), vbmiPackedLanesControl).AsSByte(); | ||
|
||
AssertWrite<Vector512<sbyte>>(dest, destStart, destLength); | ||
Vector512.Store(str.AsByte(), dest); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Most other places in the JIT we do str.Store(dest)
since its an extension method and can be accessed using instance syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's likely conflicting because dest
is a byte*
while str
is a Vector512<sbyte>
and so it can't resolve
str.Store((sbyte*)dest)
should fix it, or str.AsByte().Store(dest)
. The former is less IL, most notably.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah..I messed up and was using str.AsSbyte().Store(dest)
It's fixed now. Thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just a request to cleanup a couple minor things.
Signed-off-by: Deepak Rajendrakumaran <deepak.rajendrakumaran@intel.com>
src/libraries/System.Private.CoreLib/src/System/Buffers/Text/Base64Decoder.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Buffers/Text/Base64Decoder.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Buffers/Text/Base64Encoder.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Buffers/Text/Base64Encoder.cs
Show resolved
Hide resolved
4b2f7d1
to
fc05beb
Compare
I've committed the clean-up changes. Please let me know if you want me to make any other changes |
* Adding AVX512 path to Base64 encoding/Decoding * Addressing review Comments. Signed-off-by: Deepak Rajendrakumaran <deepak.rajendrakumaran@intel.com> * Removing fallback path. * Updating Third Party Notice. * Addressing review comments --------- Signed-off-by: Deepak Rajendrakumaran <deepak.rajendrakumaran@intel.com>
Overview
This PR implements an AVX512 code path for Base64 encoding/Decoding. This is based on the work by Wojciech Muła and Daniel Lemire. There is a fast
AVX512VBMI
path and the fallback usesAVX512F/BW
- I'll refer to these as VBMI_AVAILABLE and VBMI_UNAVAILABLE here on. For performance purposes, this will be compared to an AVX2 implementation which will be referred to as BASE_VersionReference for the algorithm:
This version uses intrinsics directly and not generic vector libraries due to lack of current support in JIt/Vector libraries to produce optimal code. Some additional support which would be required in order to use generic vector library for implementing this would be
ShuffleUnsafe
for Vector512Vector512.Shuffle()
to lower to intrinsics instead of going to fallback for more cases.Vector512
surface area to incorporate more high level functionsEven the current implementation can be further optimized by adding the
multishift()
implementation. This is a further optimizationGenerated code
(Will be focusing on the actual encoding/decoding code within the loop only)
Encoding
VBMI_AVAILABLE
VBMI_UNAVAILABLE
BASE_VERSION
Decoding
VBMI_AVAILABLE
VBMI_UNAVAILABLE
BASE_VERSION
Performance
ON ICX -
BASE_VERSION vs VBMI_UNAVAILABLE
BASE_VERSION vs VBMI_AVAILABLE