Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ARM NEON intrinsics to unpack_yuy2 #13270

Merged
merged 3 commits into from
Sep 17, 2024

Conversation

fateshelled
Copy link

@fateshelled fateshelled commented Aug 16, 2024

Changes:

  • adding unpack_yuy2_neon function
    • unpack yuy2 to y8, y16, rgb8, rgba8, bgr8 and bgra8 format.
    • tested on Ubuntu 22.04, OrangePi5 (RK3588s, 8GB of RAM) and RealSense D435.

@sysrsbuild
Copy link
Collaborator

Can one of the admins verify this patch?

@fateshelled fateshelled changed the title Add NEON intrinsics to unpack_yuy2 Add ARM NEON intrinsics to unpack_yuy2 Aug 19, 2024
@fateshelled
Copy link
Author

I explained too little, my apologies.
I added the optimization code for ARM CPUs because the CPU load was too large when using ARM CPUs.
Please review if you like.

@Nir-Az
Copy link
Collaborator

Nir-Az commented Aug 28, 2024

Hi @fateshelled ,
Thanks for the PR, it will take some time but we will get to reviewing it.
Maybe you can undo format changes to make the PR more readable?
I see rs.cpp have format changes added..

@fateshelled
Copy link
Author

Hi @Nir-Az ,
Thanks for the reply.
I fixed format changes. Please review it.

@Nir-Az
Copy link
Collaborator

Nir-Az commented Sep 3, 2024

It will take us some time to review and validate this PR on dedicated HW.
I would suggest to split bug fixes and new features as we may merge 1 faster than the other :)

@fateshelled
Copy link
Author

Thanks for the reply.
I have changed the request to a pull request for new features only.

{
// Load 16 pixels
const uint8x8x4_t yuyv = vld4_u8(reinterpret_cast<const uint8_t *>(&src[i * 2]));
// yuyv.val[0] = y0, yuyv.val[1] = u, yuyv.val[2] = y1, yuyv.val[3] = v
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't necessary, so I fixed it.


void unpack_yuy2_neon_y8(uint8_t * const d[], const uint8_t * s, int n)
{
unpack_yuy2_neon<RS2_FORMAT_Y8>(d, s, n);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not calling the templated function at the first place instead of adding functions that call it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used the code image-avx.cpp as a reference, but is this wrong?

unpack_yuy2<RS2_FORMAT_Y8>(d, s, n);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not wrong.. just redundant.|
But since our code is a reference let's keep as is

@Nir-Az Nir-Az merged commit 5a8da0a into IntelRealSense:development Sep 17, 2024
17 of 19 checks passed
@Nir-Az
Copy link
Collaborator

Nir-Az commented Sep 17, 2024

@fateshelled appreciate your contribution, we are always happy to integrate community pull requests :)
Since this feature looks safe for regression and would benefit you and other users, I merged it based on your testing.

Thank you :)

@fateshelled
Copy link
Author

Thank you for merging the PR.
I am very happy to contribute.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants