-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dedicated ranges::minmax
vectorization that does not unnecessarily track element pointer
#4384
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Co-authored-by: A. Jiang <de34@live.cn>
StephanTLavavej
requested changes
Feb 11, 2024
This is a bug fix, not only a change to address a pedantic comment!
…_Val1` and `_Val2`. Regex renamed: `\[\]\((__m128[id]?) _First, \1 _Second\) \{ return (\w+)\(_First, _Second\); \}` to: `[]($1 _Val1, $1 _Val2) { return $2(_Val1, _Val2); }`
Remove comments from `_mm_min_epi16` and `_mm_max_epi16` - they're SSE2, not SSE4.1.
StephanTLavavej
approved these changes
Feb 12, 2024
Thanks! I resolved the remaining feedback comments, and decided to simply require SSE 4.2 for the new optimization, consistent with the previous optimization. This is a lot easier to reason about.
|
I'm speculatively mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed. |
CaseyCarter
approved these changes
Feb 16, 2024
Thanks for maximizing performance! 😹 🚀 😻 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Towards #2803
New header
Can't have
extern "C"
functions that use templates, so the new header accommodates non-templateminmax
structure variations.Sign handing
Unlike
_element
counterpart, the integer signedness affects function selection rather than passed as runtime parameter.For
minmax_element
, computing element pointer requires index, socmpgt
+blendv
pair is needed to capture it.cmpgt
is only available in signed form, so unsigned types are supported by adjustment of the input values.For
minmax
values, computing only vertical min/max without the index for most of types doesn't need thecmpgt
+blendv
pair, as there are signed and unsignedmin
/max
operation. The only exception is 64-bit integer support: there are nomin
/max
ops, so still usingblendv
. For consistency, enforced also by the template use, even for 64-bit integers the sign is still passed as function selectionBenchmark
_val
are new results, we indirect for value thereI put output of the benchmark before and after in a single table
Some rows show order of magnitude improvement.
Some show difference within results variation.
I interpret these results as the change being pure improvement, and the row that don't show it are memory-bound rather than CPU bound, and this can change on a different machine or by reducing input data amount.