Performance optimization for Align post processing block #2040

aangerma · 2018-07-11T16:43:40Z

This pull request improves the latency and CPU utilization of the align processing block.
The solution combines SSE4 optimization for Intel architecture with changes to the algorithm to reduce overhead.
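As a rough illustration of the SIMD-with-fallback pattern (not the code from this PR), the sketch below normalizes pixel columns four at a time when the compiler defines `__SSSE3__`, the same guard that appears in the quoted hunks, and falls back to a scalar loop otherwise. The helper name `normalize_columns` and its parameters are hypothetical.

```cpp
#include <cstddef>
#ifdef __SSSE3__
#include <tmmintrin.h> // SSSE3 header; on GCC/Clang it also pulls in the SSE/SSE2 intrinsics
#endif

// Hypothetical helper: out[i] = (x[i] - ppx) / fx for n floats.
// The SIMD path handles four floats per iteration; the scalar loop
// handles the remainder, or everything when SSSE3 is unavailable.
void normalize_columns(const float* x, float* out, std::size_t n, float ppx, float fx)
{
    std::size_t i = 0;
#ifdef __SSSE3__
    const __m128 v_ppx = _mm_set1_ps(ppx); // broadcast ppx to all four lanes
    const __m128 v_fx = _mm_set1_ps(fx);   // broadcast fx to all four lanes
    for (; i + 4 <= n; i += 4)
    {
        const __m128 v = _mm_loadu_ps(x + i); // unaligned load of four floats
        _mm_storeu_ps(out + i, _mm_div_ps(_mm_sub_ps(v, v_ppx), v_fx));
    }
#endif
    for (; i < n; ++i)
        out[i] = (x[i] - ppx) / fx;
}
```

Keeping the scalar loop outside the guard means targets without the instruction set still build and produce the same values, which is the kind of portability issue the later "Fixes compilation error on android." commit suggests.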

Following up on: #1189, #1105

OS Microsoft Windows 10 Enterprise

Processor Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz, 2301 Mhz, 2 Core(s), 4 Logical Processor(s)

Align depth to color

Color stream     Measurement            Depth 640x480   Depth 1280x720
Color 640x480    NAIVE (microseconds)   11217.4         26023.6
                 SSE (microseconds)     4346.96         4360.56
                 Speedup                x 2.58          x 5.96
Color 1280x720   NAIVE (microseconds)   14868.2         44311.9
                 SSE (microseconds)     10497.1         8327.42
                 Speedup                x 1.41          x 5.32

Align color to depth

Depth stream     Measurement            Color 640x480   Color 1280x720
Depth 640x480    NAIVE (microseconds)   14904.4         14642.8
                 SSE (microseconds)     3454.34         3863.21
                 Speedup                x 4.31          x 3.79
Depth 1280x720   NAIVE (microseconds)   24630.5         22478.8
                 SSE (microseconds)     3218.72         3292.07
                 Speedup                x 7.65          x 6.82

Ubuntu 16.04

Intel® Core™ i7-6700 CPU @ 3.40GHz × 8

Align depth to color

Color stream     Measurement            Depth 640x480   Depth 1280x720
Color 640x480    NAIVE (microseconds)   13006.9         19729.7
                 SSE (microseconds)     1941.03         1120.02
                 Speedup                x 6.70          x 17.61
Color 1280x720   NAIVE (microseconds)   8325.66         21869.3
                 SSE (microseconds)     4256.76         3838.1
                 Speedup                x 1.95          x 5.69

Align color to depth

Depth stream     Measurement            Color 640x480   Color 1280x720
Depth 640x480    NAIVE (microseconds)   14049.8         10048.8
                 SSE (microseconds)     1475.04         1518.32
                 Speedup                x 9.52          x 6.61
Depth 1280x720   NAIVE (microseconds)   22304.3         25391.8
                 SSE (microseconds)     1430.76         1526.98
                 Speedup                x 15.58         x 16.62

ev-mp

Looking good

ev-mp · 2018-07-16T08:34:15Z

src/proc/align.cpp

@@ -310,13 +683,33 @@ namespace librealsense

                byte* other_aligned_to_depth = const_cast<byte*>(aligned_frame.frame->get_frame_data());
                memset(other_aligned_to_depth, 0, depth_intrinsics.height * depth_intrinsics.width * aligned_bytes_per_pixel);
+#ifdef __SSSE3__
+                auto uid = other_frame->get_stream()->get_unique_id();


Is uid used?

ev-mp · 2018-07-16T08:34:45Z

src/proc/align.cpp

+                auto data = (int16_t*)depth_frame->get_frame_data();
+
+#ifdef __SSSE3__
+                auto uid = other_frame->get_stream()->get_unique_id();


ev-mp · 2018-07-16T08:35:45Z

src/proc/align.cpp

@@ -363,4 +773,5 @@ namespace librealsense
        auto callback = new internal_frame_processor_callback<decltype(cb)>(cb);
        processing_block::set_processing_callback(std::shared_ptr<rs2_frame_processor_callback>(callback));
    }
-}
+
+        }


Fix the indentation and add an empty line at the end of the file.

ev-mp · 2018-07-16T08:51:18Z

src/proc/align.cpp

+    {
+    }
+
+    void image_transform::pre_compute_x_y_map(float offset)


Please add comments describing what the pre-computation comprises.
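The body of `pre_compute_x_y_map` is not shown in this thread. A minimal sketch of what such a pre-computation might do, assuming a pinhole model without distortion and assuming `offset` is a sub-pixel shift (for example 0.5 to address a pixel corner), is given below. The structs `pinhole_intrinsics` and `xy_map` are hypothetical simplifications, not librealsense types.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for the pinhole part of rs2_intrinsics.
struct pinhole_intrinsics
{
    int width, height;
    float ppx, ppy; // principal point, in pixels
    float fx, fy;   // focal lengths, in pixels
};

// Normalized image-plane coordinates for every pixel, stored row-major.
struct xy_map
{
    std::vector<float> x;
    std::vector<float> y;
};

// Computes ((col + offset) - ppx) / fx and ((row + offset) - ppy) / fy once,
// so the per-frame work reduces to multiplying by the measured depth.
xy_map pre_compute_x_y_map(const pinhole_intrinsics& in, float offset)
{
    const std::size_t count = static_cast<std::size_t>(in.width) * static_cast<std::size_t>(in.height);
    xy_map map;
    map.x.resize(count);
    map.y.resize(count);
    for (int row = 0; row < in.height; ++row)
    {
        for (int col = 0; col < in.width; ++col)
        {
            const std::size_t idx = static_cast<std::size_t>(row) * in.width + col;
            map.x[idx] = (col + offset - in.ppx) / in.fx;
            map.y[idx] = (row + offset - in.ppy) / in.fy;
        }
    }
    return map;
}
```

With such a map, the 3D point for pixel `idx` at depth `z` would be `(map.x[idx] * z, map.y[idx] * z, z)`, and the map only needs rebuilding when the intrinsics change.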

ev-mp · 2018-07-16T09:00:54Z

src/proc/align.cpp

+            bottom_right = _pixel_bottom_right_int;
+        }
+
+        switch (bpp)


The switch statement seems to be redundant.

ev-mp · 2018-07-16T09:02:55Z

src/proc/align.cpp

+    template<>
+    inline void image_transform::distorte_x_y<RS2_DISTORTION_MODIFIED_BROWN_CONRADY>(const __m128& x, const __m128& y, __m128* distorted_x, __m128* distorted_y, const rs2_intrinsics& to)
+    {
+        __m128 c[5];


Please add comments for the hard-coded values, for maintainability (e.g. [5] == the number of distortion coefficients).

ev-mp · 2018-07-16T09:03:52Z

src/proc/align.cpp

+        const rs2_intrinsics& to,
+        const rs2_extrinsics& from_to_other)
+    {
+        //mask for shuffle


Please add pseudo-code to describe the flow for maintainability
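The flow the reviewer asks to have documented is not spelled out in this thread. A scalar sketch of the usual per-pixel alignment steps (deproject, transform, project, bounds-check) is given below, assuming a pinhole model with no distortion and a column-major rotation as in `rs2_extrinsics`. All type and function names here are hypothetical simplifications.

```cpp
#include <array>
#include <cmath>

struct intrinsics { int width, height; float ppx, ppy, fx, fy; };

struct extrinsics
{
    std::array<float, 9> rotation;    // column-major 3x3 rotation
    std::array<float, 3> translation; // in meters
};

struct pixel { int x, y; bool valid; };

// Maps one depth pixel (px, py) with depth in meters to the matching
// pixel in the other stream, or returns valid == false.
pixel map_depth_pixel_to_other(int px, int py, float depth_m,
                               const intrinsics& from, const intrinsics& to,
                               const extrinsics& from_to_other)
{
    if (depth_m <= 0.0f)
        return {0, 0, false}; // no depth measurement

    // 1. Deproject the pixel into a 3D point in the depth camera frame.
    const float X = (px - from.ppx) / from.fx * depth_m;
    const float Y = (py - from.ppy) / from.fy * depth_m;
    const float Z = depth_m;

    // 2. Transform the point into the other camera frame.
    const auto& r = from_to_other.rotation;
    const auto& t = from_to_other.translation;
    const float tx = r[0] * X + r[3] * Y + r[6] * Z + t[0];
    const float ty = r[1] * X + r[4] * Y + r[7] * Z + t[1];
    const float tz = r[2] * X + r[5] * Y + r[8] * Z + t[2];
    if (tz <= 0.0f)
        return {0, 0, false}; // behind the other camera

    // 3. Project onto the other image plane and round to the nearest pixel.
    const int ox = static_cast<int>(std::lround(tx / tz * to.fx + to.ppx));
    const int oy = static_cast<int>(std::lround(ty / tz * to.fy + to.ppy));

    // 4. Reject points that fall outside the other image.
    if (ox < 0 || oy < 0 || ox >= to.width || oy >= to.height)
        return {0, 0, false};
    return {ox, oy, true};
}
```

An SSE implementation would presumably run the same steps on four pixels per iteration, with shuffles to rearrange lanes, which may be what the "mask for shuffle" comment in the hunk refers to.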

aangerma force-pushed the development branch 3 times, most recently from 85bd4a4 to 4fe7f05 · July 12, 2018 14:29

dorodnic closed this Jul 12, 2018

dorodnic reopened this Jul 12, 2018

dorodnic closed this Jul 15, 2018

dorodnic reopened this Jul 15, 2018

aangerma force-pushed the development branch from 4fe7f05 to a9ecc1b · July 16, 2018 06:42

aangerma added 2 commits July 16, 2018 10:19

Align improve performance.

029b0d9

Fixes compilation error on android.

59d6af9

ev-mp approved these changes Jul 16, 2018

aangerma force-pushed the development branch from a9ecc1b to 30a76ea · July 16, 2018 08:47

ev-mp reviewed Jul 16, 2018

Fixes following code review.

7abb777

aangerma force-pushed the development branch from 30a76ea to 7abb777 · July 16, 2018 09:50

dorodnic merged commit 05d95f8 into IntelRealSense:development Jul 16, 2018

jpapon mentioned this pull request Jul 19, 2018

High CPU Utilization #1105

Closed

dorodnic mentioned this pull request Aug 29, 2018

Low fps when aligning depth to colored frame #2321

Closed

dorodnic mentioned this pull request Dec 17, 2018

Grid artifact related to SSE instructions #2909

Closed
