Improve AddImpl and SubtractImpl with Avx512 #41

Merged
benaadams merged 1 commit into master from avx512 on Dec 26, 2024

benaadams (Member)

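Removes three instructions from the generated code by using the fuller instruction set available with AVX-512. The per-lane carry mask is now produced by a single unsigned compare into a mask register (`vpcmpuq` followed by `vpmovm2q`), replacing a chain of `vpternlogq` bitwise operations.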
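A rough illustration of why one unsigned compare is enough: adding two 64-bit lanes overflows exactly when the wrapped sum is smaller than the first operand. `vpcmpuq k1, ymm1, ymm0, 1` (predicate 1 = unsigned less-than, with `ymm1` holding the sum and `ymm0` the first operand) tests this for all four lanes at once. `vpmovm2q` and `vmovmskpd` then turn `k1` into a 4-bit integer mask.

The Python sketch below emulates four 64-bit lanes and checks that compare against a bitwise carry formula of the kind a `vpternlogq` chain can evaluate. `carry_mask_compare` and `carry_mask_bitwise` are hypothetical names. The bitwise formula is a standard one and is not necessarily the exact expression the JIT emitted before this change.

```python
MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit lanes


def carry_mask_compare(a, b):
    """Bit i is set when lane i of a + b wraps, tested as (sum < a) unsigned.

    Scalar analogue of vpcmpuq k1, sum, a, 1 followed by vpmovm2q/vmovmskpd.
    """
    mask = 0
    for i, (x, y) in enumerate(zip(a, b)):
        s = (x + y) & MASK64  # wrapping lane-wise add, like vpaddq
        if s < x:
            mask |= 1 << i
    return mask


def carry_mask_bitwise(a, b):
    """Same mask from a bitwise formula, keeping only bit 63 of each lane.

    c = (a & b) | ((a | b) & ~sum); bit 63 of c is the carry out of the lane.
    """
    mask = 0
    for i, (x, y) in enumerate(zip(a, b)):
        s = (x + y) & MASK64
        c = (x & y) | ((x | y) & (~s & MASK64))
        mask |= (c >> 63) << i  # like vmovmskpd: take the top bit per lane
    return mask


if __name__ == "__main__":
    a = [MASK64, 1, 1 << 63, 5]
    b = [1, 2, 1 << 63, MASK64 - 5]
    print(carry_mask_compare(a, b), carry_mask_bitwise(a, b))  # 5 5
```

Both functions produce the same mask. The compare version needs no knowledge of the top bit, which is what lets the AVX-512 path drop the bitwise sequence.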

; Method Nethermind.Int256.UInt256:AddImpl(byref,byref,byref):ubyte (FullOpts)
G_M000_IG01:                ;; offset=0x0000

G_M000_IG02:                ;; offset=0x0000
       vmovups  ymm0, ymmword ptr [rcx]
       vpaddq   ymm1, ymm0, ymmword ptr [rdx]
-      vpternlogq ymm3, ymm3, ymm2, 85
-      vmovaps  ymm4, ymm0
-      vpternlogq ymm4, ymm3, ymm1, -56
-      vpternlogq ymm0, ymm4, ymm1, -20
+      vpcmpuq  k1, ymm1, ymm0, 1
+      vpmovm2q ymm0, k1
       vmovmskpd rax, ymm0
       vpcmpeqd ymm0, ymm0, ymm0
       vpcmpeqq ymm0, ymm0, ymm2
       vmovmskpd rcx, ymm0

       vpcmpuq  k1, ymm1, ymm0, 1
       vpmovm2q ymm0, k1
       vmovmskpd rax, ymm0
       vpcmpeqd ymm0, ymm0, ymm0
       vpcmpeqq ymm0, ymm0, ymm1
       vmovmskpd rcx, ymm0

       lea      eax, [rcx+2*rax]
       xor      ecx, eax
       and      ecx, 15
       movsxd   rcx, ecx
       shl      rcx, 5
       mov      rdx, 0x2272EAEC928
       vmovups  ymm0, ymmword ptr [rcx+rdx]
       vpaddq   ymm0, ymm2, ymm0
       vmovups  ymmword ptr [r8], ymm0
       test     al, 16
       setne    al
       movzx    rax, al

G_M000_IG03:                ;; offset=0x0065
       vzeroupper 
       ret      
; Total bytes of code: 105
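The scalar tail of the method resolves carries between the four 64-bit limbs without a loop. Reading the listing, `rax` holds the carry mask g and `rcx` holds the mask p of lanes whose sum is all ones. From there:

- `lea eax, [rcx+2*rax]` forms x = p + 2g.
- `xor ecx, eax` and `and ecx, 15` select the lanes that need +1.
- `test al, 16` reads the carry out of the top limb.

The Python sketch below reproduces that bookkeeping and checks it against Python's arbitrary-precision integers. It assumes that the constant table at `0x2272EAEC928` holds, for each 4-bit pattern, a 32-byte vector with 1 in the selected lanes. `add256`, `to_limbs` and `from_limbs` are hypothetical helpers for illustration.

```python
MASK64 = (1 << 64) - 1  # one 64-bit limb
LANES = 4               # 256-bit value as four limbs, least significant first


def to_limbs(n):
    return [(n >> (64 * i)) & MASK64 for i in range(LANES)]


def from_limbs(limbs):
    return sum(v << (64 * i) for i, v in enumerate(limbs))


def add256(a, b):
    """Add two 256-bit values given as limbs; return (limbs, carry_out)."""
    sums = [(x + y) & MASK64 for x, y in zip(a, b)]  # vpaddq: lane-wise, wrapping
    g = 0  # lanes that overflowed (generate a carry)
    p = 0  # lanes that are all ones (propagate an incoming carry)
    for i in range(LANES):
        if sums[i] < a[i]:
            g |= 1 << i
        if sums[i] == MASK64:
            p |= 1 << i
    x = p + 2 * g           # lea eax, [rcx+2*rax]
    inc = (x ^ p) & 0b1111  # xor ecx, eax / and ecx, 15
    # Assumed table lookup: add 1 to every lane whose bit is set in inc.
    out = [(s + ((inc >> i) & 1)) & MASK64 for i, s in enumerate(sums)]
    carry = (x >> 4) & 1    # test al, 16
    return out, carry


if __name__ == "__main__":
    out, carry = add256(to_limbs(2**192 - 1), to_limbs(1))
    print(from_limbs(out) == 2**192, carry)  # True 0
```

A lane cannot both overflow and end up all ones, so g and p never overlap. That is why an ordinary integer addition of p + 2g ripples each carry through any run of all-ones lanes and leaves the final carry in bit 4.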

rubo (Contributor) left a comment:


Have you tested on supported hardware?

benaadams (Member, Author)

> Have you tested on supported hardware?

Yes, it is supported on AMD Ryzen Zen 4 and later.

benaadams merged commit 7a21ba3 into master on Dec 26, 2024
4 checks passed
benaadams deleted the avx512 branch on December 26, 2024, 20:35