Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SIMD version of grayscale #2214

Closed
wants to merge 27 commits into from
Closed

Add SIMD version of grayscale #2214

wants to merge 27 commits into from

Conversation

MyreMylar
Copy link
Member

@MyreMylar MyreMylar commented May 31, 2023

Splitting #2042 into two parts. This will just be the grayscale implementation excluding the setup #2042 should be resolved first.

@MyreMylar MyreMylar requested a review from a team as a code owner May 31, 2023 11:00
@yunline yunline added the Performance Related to the speed or resource usage of the project label Jun 4, 2023
@yunline yunline added transform pygame.transform SIMD labels Jun 4, 2023
mm_zero = _mm_setzero_si128();
mm_alpha_mask = _mm_cvtsi32_si128(amask);
mm_rgb_mask = _mm_cvtsi32_si128(rgbmask);
mm_two_five_fives = _mm_set_epi64x(0x00FF00FF00FF00FF, 0x00FF00FF00FF00FF);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be _mm_set1_epi64x

while (pixel_batch_counter--) {
mm_src = _mm_cvtsi32_si128(*srcp);
/*mm_src = 0x000000000000000000000000AARRGGBB*/
mm_alpha = _mm_subs_epu8(mm_src, mm_rgb_mask);
Copy link
Member

@Starbuck5 Starbuck5 Jun 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never been super clear about what saturation means in the intrinsics, so this is a very interesting example.

https://en.wikipedia.org/wiki/Saturation_arithmetic tells me it is arithmetic clamped to a range.

So rgb could be anything, but you subtract 255 with saturation--> meaning it will always end as zero. Very clever stuff. I wonder if there's masking in other SIMD stuff that could be converted to this for a small efficiency gain.


mm_dst = _mm_adds_epu8(
_mm_adds_epu8(
_mm_shufflelo_epi16(mm_dst, _MM_SHUFFLE(0, 0, 0, 0)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should provide our own version of _MM_SHUFFLE so we're not dependent on this implementation detail of sse2neon.

It took me a bit to parse whether this was something by us, by the intrinsics header, or by sse2neon.

@MyreMylar
Copy link
Member Author

Closing this as splitting it up into multiple PRs did not seem to work very well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Related to the speed or resource usage of the project SIMD transform pygame.transform
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants