Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rviz_default_plugins runs out of memory on CI packaging jobs #539

Closed
sloretz opened this issue May 5, 2020 · 4 comments · Fixed by #540
Closed

rviz_default_plugins runs out of memory on CI packaging jobs #539

sloretz opened this issue May 5, 2020 · 4 comments · Fixed by #540

Comments

@sloretz
Copy link
Contributor

sloretz commented May 5, 2020

Copying from PR #532 which did not fix issue:


Opening issue because I need to ask for help. The ros2 linux packaging job is failing to build rviz_defaullt_plugins with an error that looks a bit like the agent ran out of memory.

07:37:10 [ 22%] Building CXX object CMakeFiles/rviz_default_plugins.dir/src/rviz_default_plugins/displays/map/swatch.cpp.o
07:37:10 [ 23%] Building CXX object CMakeFiles/rviz_default_plugins.dir/src/rviz_default_plugins/displays/marker/markers/arrow_marker.cpp.o
07:37:10 [ 24%] Building CXX object CMakeFiles/rviz_default_plugins.dir/src/rviz_default_plugins/displays/marker/markers/line_list_marker.cpp.o
07:37:10 c++: fatal error: Killed signal terminated program cc1plus
07:37:10 compilation terminated.
07:37:10 make[2]: *** [CMakeFiles/rviz_default_plugins.dir/build.make:63: CMakeFiles/rviz_default_plugins.dir/rviz_default_plugins_autogen/mocs_compilation.cpp.o] Error 1
07:37:10 make[2]: *** Waiting for unfinished jobs....
07:37:10 make[1]: *** [CMakeFiles/Makefile2:79: CMakeFiles/rviz_default_plugins.dir/all] Error 2
07:37:10 make: *** [Makefile:141: all] Error 2

This PR tried compiling separate object libraries and then linking them into the big rviz_default_plugins library would reduce the peak memory usage. It didn't.

Here's a script I wrote to watch the CPU and memory usage. It outputs csv to stdout that can be graphed.

import re
import sys
import time

import psutil


def get_cumulative_usage(regex):
    sum_cpu = 0.0
    sum_mem = 0.0
    for proc in psutil.process_iter(['name', 'cpu_percent', 'memory_percent']):
        try:
            if re.match(regex, proc.info['name']):
                sum_cpu += proc.info['cpu_percent']
                sum_mem += proc.info['memory_percent']
        except (psutil.NoSuchProcess, psutil.AccessDenied, psutil.ZombieProcess):
            pass
    return sum_cpu, sum_mem


if __name__ == '__main__':
    if 2 != len(sys.argv):
        sys.stderr.write(f'Usage: {sys.argv[0]} <regular_expression>\n')
        sys.exit(1)

    regex = re.compile(sys.argv[1])
    start = time.monotonic()
    now = start

    prev_cpu = None
    prev_mem = None

    while True:
        cum_mem, cum_cpu = get_cumulative_usage(regex)
        if cum_mem != prev_mem or cum_cpu != prev_cpu:
            print(f'{now - start}, {cum_mem}, {cum_cpu}')

        prev_mem = cum_mem
        prev_cpu = cum_cpu
        time.sleep(0.1)
        now = time.monotonic()

I used it to watch the cumulative CPU and memory usage of all ld or cc1plus processes on my machine while doing a clean build rviz_default_plugins.

python3 proc_usage.py "ld|cc1plus" > ~/rviz_mem_usage.csv

Here's the memory usage of building just rviz_default_plugins on the ros2 branch. (vertical axis % of system memory used, horizontal axis seconds)
Screenshot from 2020-04-28 13-57-04

Here's the memory usage using this branch
Screenshot from 2020-04-28 13-57-12

It does not appear to have reduced the peak. The only other idea I have is to break up the rviz_default_plugins library into smaller shared libraries.

@clalancette
Copy link
Contributor

I'm taking a look in between other meetings. One thing I've found so far is that the mocs_compilation.cpp that is generated by Qt takes ~2.5GB to compile. Most of the other files (at least so far) are much smaller, between 500-1000MB. It could just be death by a thousand cuts, and when we spin up enough simultaneous compilers, we run out of memory. But I'm still collecting data, I'll summarize it more after I get it all.

@clalancette
Copy link
Contributor

clalancette commented May 5, 2020

All right, I collected some data on what's going on here. Essentially what I did was to make a bash script with the following contents:

#!/bin/bash

/usr/bin/time -v g++ $@

I then pointed the colcon at that "cc" while building just the rviz_default_plugins package:

MAKEFLAGS="VERBOSE=1 -j1" colcon build --event-handlers console_direct+ --packages-select rviz_default_plugins --cmake-args -DCMAKE_CXX_COMPILER="/home/ubuntu/time-cc"

That gave me the output from the time command for each file as it completed. Among the data printed is Maximum RSS size; here's one example:

[9.973s]        Command being timed: "g++ -D_BUILD_DIR_PATH="/home/ubuntu/ros2_ws/build/rviz_default_plugins" -D_SRC_DIR_PATH="/home/ubuntu/ros2_ws/src/ros2/rviz/rviz_default_plugins" -I/home/ubuntu/ros2_ws/build/rviz_default_plugins/gmock -I/home/ubuntu/ros2_ws/install/gmock_vendor/src/gmock_vendor -I/home/ubuntu/ros2_ws/build/rviz_default_plugins/gmock/gmock_autogen/include -I/home/ubuntu/ros2_ws/install/gmock_vendor/src/gmock_vendor/include -I/home/ubuntu/ros2_ws/install/gmock_vendor/src/gmock_vendor/. -I/home/ubuntu/ros2_ws/install/gtest_vendor/src/gtest_vendor/include -I/home/ubuntu/ros2_ws/install/gtest_vendor/src/gtest_vendor -Wno-missing-field-initializers -Wall -Wextra -Wpedantic -std=gnu++14 -o CMakeFiles/gmock.dir/gmock_autogen/mocs_compilation.cpp.o -c /home/ubuntu/ros2_ws/build/rviz_default_plugins/gmock/gmock_autogen/mocs_compilation.cpp"
[9.973s]        User time (seconds): 0.00
[9.974s]        System time (seconds): 0.01
[9.974s]        Percent of CPU this job got: 95%
[9.974s]        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.02
[9.974s]        Average shared text size (kbytes): 0
[9.975s]        Average unshared data size (kbytes): 0
[9.975s]        Average stack size (kbytes): 0
[9.975s]        Average total size (kbytes): 0
[9.975s]        Maximum resident set size (kbytes): 18256
[9.975s]        Average resident set size (kbytes): 0
[9.975s]        Major (requiring I/O) page faults: 0
[9.975s]        Minor (reclaiming a frame) page faults: 2534
[9.976s]        Voluntary context switches: 7
[9.976s]        Involuntary context switches: 1
[9.976s]        Swaps: 0
[9.976s]        File system inputs: 0
[9.976s]        File system outputs: 16
[9.976s]        Socket messages sent: 0
[9.976s]        Socket messages received: 0
[9.976s]        Signals delivered: 0
[9.976s]        Page size (bytes): 4096
[9.976s]        Exit status: 0

I then wrote a python script to comb through that data, and sort by the maximum amount of memory being used by a process. I've elided the script here as it is a bit too long, but I can push it somewhere if people are interested.

The data that I got out of it looks like this (I'm only showing the top 30; there are a lot more):

rviz_default_plugins/rviz_default_plugins_autogen/mocs_compilation.cpp: 2586380 kbytes
displays/pointcloud/point_cloud_scalar_display_test.cpp: 935192 kbytes
rviz_default_plugins/displays/display_test_fixture.cpp: 913220 kbytes
rviz_default_plugins/displays/display_test_fixture.cpp: 913216 kbytes
rviz_default_plugins/displays/display_test_fixture.cpp: 913212 kbytes
rviz_default_plugins/displays/display_test_fixture.cpp: 913204 kbytes
rviz_default_plugins/displays/display_test_fixture.cpp: 913196 kbytes
rviz_default_plugins/displays/display_test_fixture.cpp: 913184 kbytes
rviz_default_plugins/displays/display_test_fixture.cpp: 913172 kbytes
rviz_default_plugins/displays/display_test_fixture.cpp: 913148 kbytes
rviz_default_plugins/displays/display_test_fixture.cpp: 913120 kbytes
rviz_default_plugins/displays/display_test_fixture.cpp: 913116 kbytes
rviz_default_plugins/displays/display_test_fixture.cpp: 913116 kbytes
rviz_default_plugins/displays/display_test_fixture.cpp: 913112 kbytes
rviz_default_plugins/displays/display_test_fixture.cpp: 913104 kbytes
rviz_default_plugins/displays/display_test_fixture.cpp: 913072 kbytes
rviz_default_plugins/displays/display_test_fixture.cpp: 912996 kbytes
rviz_default_plugins/displays/display_test_fixture.cpp: 912928 kbytes
rviz_default_plugins/displays/display_test_fixture.cpp: 912908 kbytes
rviz_default_plugins/displays/display_test_fixture.cpp: 912892 kbytes
rviz_default_plugins/displays/display_test_fixture.cpp: 912880 kbytes
rviz_default_plugins/robot/robot_test.cpp: 903188 kbytes
displays/camera/camera_display_visual_test.cpp: 863060 kbytes
rviz_default_plugins/transformation/transformer_guard_test.cpp: 858844 kbytes
displays/map/map_display_visual_test.cpp: 841916 kbytes
displays/marker/marker_display_visual_test.cpp: 838276 kbytes
displays/marker_array/marker_array_display_visual_test.cpp: 837208 kbytes
displays/laser_scan/laser_scan_display.cpp: 828172 kbytes
displays/pointcloud/point_cloud2_display_visual_test.cpp: 817312 kbytes
displays/pointcloud/point_cloud_display_visual_test.cpp: 816316 kbytes

A few things pop out at me:

  1. The mocs_compilation takes 2.5GB. This is the worst offender, so if we were going to look at splitting something up, I would go after that. That being said, it is not that bad.
  2. We are compiling the display_test_fixture.cpp file close to 20 times (I think it is once per test).
  3. Other than the mocs_compilation, there doesn't seem to be one single process that is using too much of the memory. I'm guessing we just spin up more processes than can be handled at once, and it causes this out-of-memory scenario.

A few quick ideas on how to improve things:

  1. Don't compile display_test_fixture.cpp as many times. Maybe we can just compile that once and use it as a library?
  2. Try to split the mocs_compilation stuff up somehow.
  3. Compile rviz_default_plugins separately from the rest of the packages.
  4. Somehow force rviz_default_plugins to only compile with e.g. -j 2 to improve memory consumption.
  5. Switch to an AWS m* worker so we get more memory.

I'll start looking at some of these options tomorrow.

@allenh1
Copy link
Contributor

allenh1 commented May 5, 2020

@clalancette / @sloretz I have been pretty annoyed building this package on my desktop, actually. It takes way longer than just about anything else on my system.

The behavior I've noticed is rather different, though. I've got a 32 core threadripper (which gives 64 threads for compilation). This package takes longer than anything else in the stack, and seems to hang for quite a while, using only 1 or 2 threads. I have no idea why that is, but was planning to look into it so I'll mention it here while we're on the subject (this single package takes me about twice as long as all the other ROS 2 packages -- pretty odd).

Don't compile display_test_fixture.cpp as many times. Maybe we can just compile that once and use it as a library?

Yes, that should help. That probably doesn't relate to my strange lack of cpu-usage, though.

@clalancette
Copy link
Contributor

All right, I've identified 3 different concrete things that will help here:

  1. Compile display_test_fixture.cpp just once, then reuse it as a library.
  2. Disable automoc stuff on tests.
  3. Split up mocs_compilation.cpp into one file per moc.

With the combination of all of those, the build is about 10% slower, but the maximum memory usage looks more like:

Compile displays/pointcloud/point_cloud_scalar_display_test.cpp: 934924 kbytes
Compile rviz_default_plugins/displays/display_test_fixture.cpp: 912784 kbytes
Compile rviz_default_plugins/robot/robot_test.cpp: 903168 kbytes
Compile displays/camera/camera_display_visual_test.cpp: 862760 kbytes
Compile rviz_default_plugins/transformation/transformer_guard_test.cpp: 858644 kbytes
Compile displays/map/map_display_visual_test.cpp: 841588 kbytes
Compile displays/marker/marker_display_visual_test.cpp: 838208 kbytes
Compile displays/marker_array/marker_array_display_visual_test.cpp: 836944 kbytes
Compile displays/laser_scan/laser_scan_display.cpp: 828848 kbytes
Compile displays/pointcloud/point_cloud2_display_visual_test.cpp: 817132 kbytes
Compile displays/pointcloud/point_cloud_display_visual_test.cpp: 816300 kbytes
Compile displays/odometry/odometry_display_visual_test.cpp: 814980 kbytes
Compile displays/grid_cells/grid_cells_display_visual_test.cpp: 814660 kbytes
Compile displays/pose_array/pose_array_display_visual_test.cpp: 813756 kbytes
Compile displays/path/path_display_visual_test.cpp: 813736 kbytes
Compile displays/image/image_display_visual_test.cpp: 813496 kbytes
Compile displays/laser_scan/laser_scan_display_visual_test.cpp: 813408 kbytes
Compile displays/odometry/odometry_display_test.cpp: 813300 kbytes
Compile displays/pose_covariance/pose_with_covariance_display_visual_test.cpp: 812036 kbytes
Compile displays/range/range_display_visual_test.cpp: 811740 kbytes
Compile displays/pose/pose_display_visual_test.cpp: 811700 kbytes
Compile displays/point/point_stamped_display_visual_test.cpp: 811404 kbytes
Compile displays/illuminance/illuminance_display_visual_test.cpp: 810560 kbytes
Compile displays/relative_humidity/relative_humidity_display_visual_test.cpp: 810400 kbytes
Compile displays/temperature/temperature_display_visual_test.cpp: 810260 kbytes
Compile displays/fluid_pressure/fluid_pressure_display_visual_test.cpp: 810196 kbytes
Compile displays/wrench/wrench_stamped_display_visual_test.cpp: 809900 kbytes
Compile displays/pointcloud/point_cloud2_display_test.cpp: 802292 kbytes
Compile displays/tf/tf_display_visual_test.cpp: 798164 kbytes
Compile displays/axes/axes_display_visual_test.cpp: 798048 kbytes

PR upcoming with all of those changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants