Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance optimization for Align post processing block #2040

Merged
merged 3 commits into from
Jul 16, 2018

Conversation

aangerma
Copy link
Contributor

@aangerma aangerma commented Jul 11, 2018

This pull-request improves latency and CPU utilization of align processing block.
The solution combines SSE4 optimization for Intel architecture with changes to the algorithm to reduce overhead.

Following-up on: #1189, #1105

OS Microsoft Windows 10 Enterprise

Processor Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz, 2301 Mhz, 2 Core(s), 4 Logical Processor(s)

Align depth to color

    Depth 640x480 Depth 1280x720
Color 640x480 NAIVE  (microseconds) 11217.4 26023.6
SSE    (microseconds) 4346.96 4360.56
Speedup   x 2.58 x 5.96
Color 1280x720 NAIVE  (microseconds) 14868.2 44311.9
SSE     (microseconds) 10497.1 8327.42
Speedup   x 1.41 x 5.32

Align color to depth

    Color 640x480 Color 1280x720
Depth 640x480 NAIVE    (microseconds) 14904.4 14642.8
SSE       (microseconds) 3454.34 3863.21
Speedup   x 4.31 x 3.79
Depth 1280x720 NAIVE  (microseconds) 24630.5 22478.8
SSE     (microseconds) 3218.72 3292.07
Speedup   x 7.65 x 6.82

Ubuntu 16.04

Intel® Core™ i7-6700 CPU @ 3.40GHz × 8

Align depth to color

    Depth 640x480 Depth 1280x720
Color 640x480 NAIVE (microseconds) 13006.9 19729.7
SSE   (microseconds) 1941.03 1120.02
Speedup   x 6.70 x 17.61
Color 1280x720 NAIVE (microseconds) 8325.66 21869.3
SSE    (microseconds) 4256.76 3838.1
Speedup   x 1.95 x 5.69

Align color to depth

    Color 640x480 Color 1280x720
Depth 640x480 NAIVE (microseconds) 14049.8 10048.8
SSE    (microseconds) 1475.04 1518.32
Speedup   x 9.52 x 6.61
Depth 1280x720 NAIVE (microseconds) 22304.3 25391.8
SSE    (microseconds) 1430.76 1526.98
Speedup   x 15.58 x 16.62

Copy link
Collaborator

@ev-mp ev-mp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good

@@ -310,13 +683,33 @@ namespace librealsense

byte* other_aligned_to_depth = const_cast<byte*>(aligned_frame.frame->get_frame_data());
memset(other_aligned_to_depth, 0, depth_intrinsics.height * depth_intrinsics.width * aligned_bytes_per_pixel);
#ifdef __SSSE3__
auto uid = other_frame->get_stream()->get_unique_id();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it used ?

auto data = (int16_t*)depth_frame->get_frame_data();

#ifdef __SSSE3__
auto uid = other_frame->get_stream()->get_unique_id();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here

@@ -363,4 +773,5 @@ namespace librealsense
auto callback = new internal_frame_processor_callback<decltype(cb)>(cb);
processing_block::set_processing_callback(std::shared_ptr<rs2_frame_processor_callback>(callback));
}
}

}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation + add empty line at the end of file

{
}

void image_transform::pre_compute_x_y_map(float offset)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add comments what the pre-compute comprise of

bottom_right = _pixel_bottom_right_int;
}

switch (bpp)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The switch case seem to be redundant

template<>
inline void image_transform::distorte_x_y<RS2_DISTORTION_MODIFIED_BROWN_CONRADY>(const __m128& x, const __m128& y, __m128* distorted_x, __m128* distorted_y, const rs2_intrinsics& to)
{
__m128 c[5];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add comments for the hard-coded values - [5] == distortion coefficients,
for maintainability

const rs2_intrinsics& to,
const rs2_extrinsics& from_to_other)
{
//mask for shuffle
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add pseudo-code to describe the flow for maintainability

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants