Add SIMD version of grayscale #2214
mm_zero = _mm_setzero_si128();
mm_alpha_mask = _mm_cvtsi32_si128(amask);
mm_rgb_mask = _mm_cvtsi32_si128(rgbmask);
mm_two_five_fives = _mm_set_epi64x(0x00FF00FF00FF00FF, 0x00FF00FF00FF00FF);
This could be _mm_set1_epi64x(0x00FF00FF00FF00FF), since the same value goes into both 64-bit lanes.
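A quick check of the suggestion, as a sketch assuming an x86/x86-64 target with SSE2: both constructions below build the same 128-bit constant. The helper names are illustrative only and not part of the PR.

```c
#include <emmintrin.h> /* SSE2 intrinsics (x86/x86-64) */

/* Form used in the diff: the same 64-bit value passed for the high and low lanes. */
static __m128i mask_via_set(void)
{
    return _mm_set_epi64x(0x00FF00FF00FF00FF, 0x00FF00FF00FF00FF);
}

/* Form suggested in review: one argument, broadcast to both 64-bit lanes. */
static __m128i mask_via_set1(void)
{
    return _mm_set1_epi64x(0x00FF00FF00FF00FF);
}

/* Returns 1 if every byte of the two vectors matches. */
static int masks_equal(void)
{
    __m128i eq = _mm_cmpeq_epi8(mask_via_set(), mask_via_set1());
    return _mm_movemask_epi8(eq) == 0xFFFF;
}
```

The single-argument form states the intent (a broadcast) directly and avoids repeating the constant.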
while (pixel_batch_counter--) {
    mm_src = _mm_cvtsi32_si128(*srcp);
    /*mm_src = 0x000000000000000000000000AARRGGBB*/
    mm_alpha = _mm_subs_epu8(mm_src, mm_rgb_mask);
I've never been super clear about what saturation means in the intrinsics, so this is a very interesting example.
https://en.wikipedia.org/wiki/Saturation_arithmetic explains that it is arithmetic clamped to a fixed range instead of wrapping around.
So the RGB bytes could hold anything, but subtracting 255 from each of them with unsigned saturation always clamps the result to zero, while the alpha byte (where the mask is 0) passes through unchanged. Very clever stuff. I wonder whether masking in other SIMD code could be converted to this for a small efficiency gain.
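To make the saturation trick concrete, here is a small sketch assuming the 0xAARRGGBB layout from the diff comment and an RGB mask of 0x00FFFFFF (0xFF in each color byte, 0x00 in the alpha byte). It runs the real SSE2 intrinsic on one pixel and, next to it, a scalar model of what _mm_subs_epu8 does in each byte lane. The function names are hypothetical.

```c
#include <emmintrin.h> /* SSE2 intrinsics (x86/x86-64) */
#include <stdint.h>

/* Scalar model of one byte lane of _mm_subs_epu8:
 * unsigned subtraction that clamps at 0 instead of wrapping. */
static uint8_t subs_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : 0);
}

/* One pixel through the actual intrinsic, mirroring the loop in the diff. */
static uint32_t alpha_only_sse2(uint32_t pixel, uint32_t rgbmask)
{
    __m128i src = _mm_cvtsi32_si128((int)pixel);
    __m128i mask = _mm_cvtsi32_si128((int)rgbmask);
    return (uint32_t)_mm_cvtsi128_si32(_mm_subs_epu8(src, mask));
}

/* Same operation, byte by byte, using the scalar model. */
static uint32_t alpha_only_scalar(uint32_t pixel, uint32_t rgbmask)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint8_t p = (uint8_t)(pixel >> shift);
        uint8_t m = (uint8_t)(rgbmask >> shift);
        out |= (uint32_t)subs_u8(p, m) << shift;
    }
    return out;
}
```

One caveat for reusing the idea elsewhere: it only behaves as a mask when every mask byte is 0xFF or 0x00. In that case the result equals pixel & ~rgbmask, which SSE2 can also express directly with _mm_andnot_si128, so whether it is actually faster would need measuring.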
mm_dst = _mm_adds_epu8(
    _mm_adds_epu8(
        _mm_shufflelo_epi16(mm_dst, _MM_SHUFFLE(0, 0, 0, 0)),
Maybe we should provide our own version of _MM_SHUFFLE so we're not dependent on this implementation detail of sse2neon.
It took me a while to work out whether this was defined by us, by the intrinsics header, or by sse2neon.
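A project-local macro could look like the sketch below. SIMD_SHUFFLE is a hypothetical name; it packs four 2-bit lane selectors into the 8-bit immediate the shuffle intrinsics expect, using the same argument order as _MM_SHUFFLE (on x86 that macro comes from xmmintrin.h, which emmintrin.h includes). This assumes an SSE2 target for the demonstration.

```c
#include <emmintrin.h> /* SSE2 intrinsics; also pulls in xmmintrin.h */

/* Hypothetical local replacement for _MM_SHUFFLE.
 * d selects the source for the highest lane, a for the lowest. */
#define SIMD_SHUFFLE(d, c, b, a) \
    ((((d) & 3) << 6) | (((c) & 3) << 4) | (((b) & 3) << 2) | ((a) & 3))

/* Copy 16-bit lane 0 into lanes 0..3, as the
 * _mm_shufflelo_epi16(..., _MM_SHUFFLE(0, 0, 0, 0)) call in the diff does.
 * Lanes 4..7 are left unchanged by _mm_shufflelo_epi16. */
static __m128i broadcast_low_u16(__m128i v)
{
    return _mm_shufflelo_epi16(v, SIMD_SHUFFLE(0, 0, 0, 0));
}
```

Because the macro expands to an integer constant expression, it can still be used where the intrinsics require a compile-time immediate.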
Closing this, as splitting it up into multiple PRs did not seem to work very well.
Splitting #2042 into two parts. This PR contains only the grayscale implementation, excluding the setup; #2042 should be resolved first.
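When reviewing a SIMD version it can help to have a plain scalar reference to compare against pixel by pixel. The sketch below is not taken from the PR. It assumes the 0xAARRGGBB layout from the diff comment and the common BT.601 integer weights 77/150/29 (summing to 256), because the weights the PR actually uses are not shown in this excerpt. Alpha is copied through unchanged.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference grayscale for one 0xAARRGGBB pixel.
 * ASSUMPTION: BT.601-style integer weights 77/256, 150/256, 29/256. */
static uint32_t grayscale_pixel(uint32_t argb)
{
    uint32_t a = (argb >> 24) & 0xFF;
    uint32_t r = (argb >> 16) & 0xFF;
    uint32_t g = (argb >> 8) & 0xFF;
    uint32_t b = argb & 0xFF;
    /* Weights sum to 256, so the shifted result stays within 0..255. */
    uint32_t y = (77 * r + 150 * g + 29 * b) >> 8;
    return (a << 24) | (y << 16) | (y << 8) | y;
}

/* Apply the reference in place to a buffer of pixels. */
static void grayscale_buffer(uint32_t *pixels, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        pixels[i] = grayscale_pixel(pixels[i]);
    }
}
```

Running the SIMD path and this reference over the same buffer and comparing the outputs would catch lane-ordering or saturation mistakes, provided both use the same weights and rounding.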