Performance optimization for Align post processing block #2040
Looking good
src/proc/align.cpp
Outdated
@@ -310,13 +683,33 @@ namespace librealsense

byte* other_aligned_to_depth = const_cast<byte*>(aligned_frame.frame->get_frame_data());
memset(other_aligned_to_depth, 0, depth_intrinsics.height * depth_intrinsics.width * aligned_bytes_per_pixel);
#ifdef __SSSE3__
auto uid = other_frame->get_stream()->get_unique_id();
Is `uid` used?
src/proc/align.cpp
Outdated
auto data = (int16_t*)depth_frame->get_frame_data();

#ifdef __SSSE3__
auto uid = other_frame->get_stream()->get_unique_id();
Also here
@@ -363,4 +773,5 @@ namespace librealsense
auto callback = new internal_frame_processor_callback<decltype(cb)>(cb);
processing_block::set_processing_callback(std::shared_ptr<rs2_frame_processor_callback>(callback));
}
}

}
Fix the indentation and add an empty line at the end of the file.
src/proc/align.cpp
Outdated
{
}

void image_transform::pre_compute_x_y_map(float offset)
Please add comments describing what the pre-compute consists of.
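To illustrate the kind of comment being asked for, here is a minimal sketch of what such a pre-compute step could look like. It assumes the map caches, per pixel, the normalized image-plane coordinates of a distortion-free pinhole camera, so that per-frame deprojection reduces to a multiply by depth. `PinholeIntrinsics`, `XYMap` and `precompute_xy_map` are hypothetical names, and the meaning given to `offset` (a sub-pixel shift such as -0.5 or +0.5) is an assumption, not taken from the PR.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the pinhole part of rs2_intrinsics.
struct PinholeIntrinsics {
    int width, height;  // image size in pixels
    float ppx, ppy;     // principal point
    float fx, fy;       // focal lengths in pixels
};

// One normalized x and y value per pixel, stored row-major.
struct XYMap {
    std::vector<float> x, y;
};

// Pre-compute (u + offset - ppx) / fx and (v + offset - ppy) / fy for every
// pixel. These values depend only on the intrinsics, so they can be computed
// once and reused for every frame. Assumption: `offset` shifts the sample
// point inside the pixel (e.g. -0.5 / +0.5 for the pixel corners).
inline XYMap precompute_xy_map(const PinholeIntrinsics& in, float offset)
{
    XYMap map;
    const std::size_t n = static_cast<std::size_t>(in.width) * in.height;
    map.x.resize(n);
    map.y.resize(n);
    for (int v = 0; v < in.height; ++v) {
        for (int u = 0; u < in.width; ++u) {
            const std::size_t i = static_cast<std::size_t>(v) * in.width + u;
            map.x[i] = (u + offset - in.ppx) / in.fx;
            map.y[i] = (v + offset - in.ppy) / in.fy;
        }
    }
    return map;
}
```

With such a map, the 3D point for pixel `i` at depth `z` would be `(map.x[i] * z, map.y[i] * z, z)`, which removes two subtractions and two divisions per pixel from the per-frame loop.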
bottom_right = _pixel_bottom_right_int;
}

switch (bpp)
The switch case seems to be redundant.
src/proc/align.cpp
Outdated
template<>
inline void image_transform::distorte_x_y<RS2_DISTORTION_MODIFIED_BROWN_CONRADY>(const __m128& x, const __m128& y, __m128* distorted_x, __m128* distorted_y, const rs2_intrinsics& to)
{
__m128 c[5];
Please add comments for the hard-coded values (e.g. [5] == the number of distortion coefficients), for maintainability.
const rs2_intrinsics& to,
const rs2_extrinsics& from_to_other)
{
//mask for shuffle
Please add pseudo-code describing the flow, for maintainability.
This pull request improves the latency and CPU utilization of the align processing block. The solution combines SSE4 optimization for Intel architecture with changes to the algorithm to reduce overhead.
Follow-up on: #1189, #1105
OS: Microsoft Windows 10 Enterprise
Processor: Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz, 2301 MHz, 2 Core(s), 4 Logical Processor(s)
Align depth to color
Align color to depth
OS: Ubuntu 16.04
Processor: Intel® Core™ i7-6700 CPU @ 3.40GHz × 8
Align depth to color
Align color to depth