Skip to content

Model Predictive Path Integral Controller (MPPI)

Brian Goldfain edited this page May 25, 2017 · 3 revisions

This tutorial assumes you have CUDA and CNPY installed according to the README instructions, compiled the AutoRally software on your machine including, and are comfortable running the state estimator in simulation or on a physical robot.

Introduction

The model predictive path integral (MPPI) controller is a novel approach for autonomous vehicle control based on stochastic sampling of trajectories. The method is derivative free, and can handle complex non-linear dynamics and cost functions, which makes it ideal for performing aggressive maneuvers with the AutoRally vehicle. For a high level overview of the algorithm, and our latest results see the YouTube videos:

https://youtu.be/1AR2-OHCxsQ

https://youtu.be/f2at-cqaJMM

For a detailed overview see the related papers. If you use this code in your research please cite one of the related papers.

Related Papers:

Information Theoretic MPC for Model-Based Reinforcement Learning. Williams, G., Wagener, N., Goldfain, B., Drews, P., Rehg, J. M., Boots, B., & Theodorou, E. A. IEEE International Conference on Robotics and Automation (2017)

Aggressive Driving with Model Predictive Path Integral Control. Williams, G., Drews, P., Goldfain, B., Rehg, J. M., & Theodorou, E. A. IEEE International Conference on Robotics and Automation (2016)

Quickstart

You can run the controller in stand alone mode with the command:

roslaunch autorally_control path_integral_nn.launch

The _nn suffix denotes that we're launching the controller using a neural network dynamics model to predict and propagate the dynamics. You should see a window pop up with an overhead view of a track. The vehicle's position is the center of the image, and this should follow along the centerline of the track.

If you receive a invalid device function error, then you may need to edit the path integral CMakeLists.txt. Find the following line in the CMakeLists:

-arch=sm_35;

and replace the 35 with your GPU's compute capability. You can find the compute capability of your GPU here: https://developer.nvidia.com/cuda-gpus. For example if your compute capability is 5.0, replace the 35 with 50.

Running in Simulation with Gazebo

Running the controller with Gazebo is just like running it in stand-alone mode, except now we'll receive updates from the state estimator instead of just propagating the dynamics forward in time. First, launch the Gazebo simulation of the track:

roslaunch autorally_gazebo autorallyTrackGazeboSim.launch

Next launch the state estimator before moving the simulated robot.

roslaunch autorally_core stateEstimator.launch

Included in the repository is a map defined for the simulation track where the origin is the spawn point of the vehicle in the simulation. Therefore, it's important that you do not move the vehicle before starting the state estimator. Once the state estimator is running, move the vehicle around manually and check that that state estimator is converged. After the state estimator is converged, launch the controller:

roslaunch autorally_control path_integral_nn.launch

Be sure to check that the controller is running fast enough, run:

rostopic hz /mppi_controller/chassisCommand

and make sure that the controller is running at (or extremely close to) 40 hz. Running the controller in the real world is exactly the same, except that you'll need to make a track cost map to fit your specific track.

Parameters

The path integral launch file has the following parameters:

Configuration/Debug Parameters

pose_estimate: [string] pose estimate topic to subscribe to

path_topic: [string] topic name for the planned path by mppi

debug_mode: [bool] whether to run in debug mode

MPPI Parameters

hz: [int] control publishing frequency per second

num_timesteps: [int] number of timesteps to look ahead

x_pos: [double] initial x position

y_pos: [double] initial y position

heading: [double] initial heading

model_path: [string] absolute path to dynamics model parameters

gamma: [double] mppi temperature parameter

num_iters: [int] number of optimization iterations per timestep

init_steering: [double] steering value to initialize new steering controls to

init_throttle: [double] throttle value to initialize new throttle controls to

steering_std: [double] standard deviation in the steering control channel

throttle_std: [double] standard deviation in the throttle control channel

max_throttle: [double] maximum throttle command

min_throttle: [double] minimum throttle command

max_steering: [double] maximum steering command

min_steering: [double] minimum steering comand

Cost Function Parameters

desired_speed: [double] target speed for the controller

speed_coefficient: [double] weight on speed cost

track_coefficient: [double] weight on track cost

max_slip_angle: [double] maximum allowable slip angle before killing trajectory

slip_penalty: [double] weight on square of the slip angle

track_slop: [double] threshold below which the track cost gets set to zero

crash_coeff: [double] penalty for crashing

steering_coeff: [double] control cost for steering

throttle_coeff: [double] control cost for throttle

map_path: [string] Absolute path of the map file

Dynamic Reconfigure Variables

Some of these variables can safely be changed on the fly using dynamic reconfigure. These are: max_throttle, desired_speed, speed_coefficient, track_coefficient, max_slip_angle, slip_penalty, crash_coefficient, track_slop, steering_coeff, throttle_coeff.

Once you have the MPPI controller running, try launching rqt_reconfigure and changing some of the parameters to get different behavior. For instance, increasing the target speed will make the vehicle go faster. The dynamics of the simulator once the vehicle starts sliding are significantly less forgiving than the vehicle dynamics in the real world, but the vehicle should still be relatively stable for a target speed of 9 m/s. (On the real platform it is stable up to an 11 m/s target).

Cost Map

The cost map is what defines the boundaries of the track for the MPPI controller. A value of 0 defines the center of the track, and a value of 1 indicates off the track. If the track cost is larger than 1 then the MPPI controller receives an impulse cost (the crash coefficient term), this is designed to enforce a type of hard boundary constraint so that the vehicle won't violate the track boundaries. The cost map is a simple text file with the following format:

first entry: minimum x value second entry: maximum x value third entry: minimum y value fourth entry: maximum y value fifth entry: resolution of the grid. The total width of the grid is (maximum x - minimum x)*resolution and similarly for the height. all remaining entries: the grid values in row-major order

A pre-computed map is included in the repository for the Gazebo simulation track.

Dynamics Model

We use a neural network to represent the dynamics. The neural network that we use is trained on the real-world vehicle data, and then fine tuned to the Gazebo simulation data. The neural network parameters are saved in a numpy archive (.npz) file with the following format:

dynamics_W1: input to first hidden layer matrix

dynamics_b1: first hidden layer bias

...

...

dynamics_Wn: hidden to hidden matrix for the nth layer

dynamics_bn: nth hidden layer bias

...

The size of the neural network is a template parameter in the C++/CUDA code. You can try training your own neural network using one of many popular neural network libraries (we use Lasagne: https://github.com/Lasagne/Lasagne), and then saving the parameters in a numpy archive with the above format. You just need to change the template parameters to match your network, and point the model path in the path integral launch file to the correct location. Note that the model parameters need to be saved as double precision floating point numbers in order for them to be read in correctly.