Peakflops for zen3/4/5 style architectures by issuing 2xFMA and 2xADD simultaneously. #659
+118
−0
Introduction
Is this of interest to you? These two benchmarks are intended to reach the peak FLOP rate on architectures like zen3/4/5 by issuing FMA and ADD instructions at the same time. Example outputs from a zen4 7970X:
Existing
peakflops_avx_fma
Proposed
peakflops_avx_fma_add
Proposed
peakflops_avx512_fma_add
(Expected to perform roughly the same as the avx2 version on zen4, since zen4 executes 512-bit instructions on 256-bit datapaths.)
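For reviewers who want to sanity-check the expected gain, here is a rough back-of-the-envelope model of the per-core, per-cycle rate. It is only a sketch: the pipe counts (2 FMA, 2 ADD) come from the PR title, while the function name `flops_per_cycle`, its parameters, and the assumption that every pipe accepts one full-width instruction per cycle are illustrative and not part of the benchmark code. An instruction wider than the datapath is assumed to occupy its pipe for proportionally more cycles.

```python
def flops_per_cycle(fma_pipes, add_pipes, vector_bits,
                    element_bits=64, datapath_bits=None):
    """Theoretical FLOPs per cycle per core (hypothetical helper, idealized model)."""
    if datapath_bits is None:
        datapath_bits = vector_bits  # assume a full-width datapath by default
    lanes = vector_bits // element_bits
    # Assumption: a vector wider than the datapath takes several cycles per pipe.
    cycles_per_instr = max(1, vector_bits // datapath_bits)
    fma_flops = fma_pipes * lanes * 2  # one FMA = multiply + add per lane
    add_flops = add_pipes * lanes      # one ADD = one operation per lane
    return (fma_flops + add_flops) / cycles_per_instr


# Double precision, 256-bit vectors, FMA pipes only (existing kernel style).
fma_only = flops_per_cycle(fma_pipes=2, add_pipes=0, vector_bits=256)
# Same, but also keeping the two ADD pipes busy (proposed kernel style).
fma_add = flops_per_cycle(fma_pipes=2, add_pipes=2, vector_bits=256)
# 512-bit vectors executed on 256-bit datapaths.
avx512_on_256 = flops_per_cycle(fma_pipes=2, add_pipes=2, vector_bits=512,
                                datapath_bits=256)

print(fma_only, fma_add, avx512_on_256)  # 16.0 24.0 24.0
```

Under these assumptions the combined FMA+ADD mix tops out at 1.5x the FMA-only rate, and the 512-bit variant on a 256-bit datapath lands at the same per-cycle figure as the 256-bit one. Measured numbers will differ with clock speed, core count, and scheduling.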