The goal of the OPTIMA project is to take advantage of FPGA-based High Performance Computing (HPC) systems to optimize academic and industrial software and applications. It also aims to provide guidelines that ease future development of FPGA applications.
In the context of Cyberbotics, the goal is to show that running a Webots robot simulation whose controllers use deep learning is much faster on FPGA-based systems than on a CPU or GPU. This repository summarizes the work performed on the Jumax machine to adapt a deep-learning robot simulation for FPGA.
When cloning this repository, don't forget to initialize the submodules:

```shell
git clone https://github.com/cyberbotics/optima.git
git submodule update --init --recursive
```
To install Webots on Jumax, please refer to the following two pages: get access to Jumax and compile Webots on Jumax.
This page explains how to start the simulator: start Webots.
To understand how DFEs (running the FPGAs) on Jumax work, please head to the following page: basics of DFE applications.
To code your own applications on Jumax, these two pages explain the required workflow: start MaxIDE and compile with MaxCompiler.
The first contribution is a framework for creating simple neural networks: multilayer perceptrons. The details of its implementation in C++ are given here: Creation of a Deep Learning Framework in C++.
The source code of the implementation is located in the `MLP-train-framework` directory. You can compile it using the following commands:

```shell
cd MLP-train-framework
mkdir build
g++ -O3 -Imnist src/MLP-train-framework.cpp -o build/MLP-train-framework
```

Then you can run the framework using `./build/MLP-train-framework`. You can choose any network structure you want and it will train your network on the MNIST dataset.
The `MLP-forward` directory contains the files for the comparison of the execution time of forward propagation between CPU and FPGA.
Implementations are explained here: MLP Forward Propagation on CPU: Tests and Results & MLP Forward Propagation on DFE: Structure. Final results can be found here: MLP Forward Propagation on DFE: Results and Optimization.
The respective source codes are located in `MLP-forward/CPU` and `MLP-forward/DFE`. Information on how to run the DFE projects can be found in Run DFE Applications with MaxIDE. The CPU code can be compiled with the following commands:

```shell
cd MLP-forward/CPU
mkdir build
g++ -O3 -fopenmp -Imnist src/fixed-forward-test-CPU.cpp -o build/fixed_forward_test-CPU
```

Note that the source code is optimized for the Jumax machine, in particular regarding the multi-threading optimization.
The `CNN-autonomous-car` directory contains the Webots project of the self-driving car using a convolutional neural network. Its description can be found here.
The scripts to compute the trajectory planning and to train the CNN are located in `CNN-autonomous-car/scripts`. The trained model parameters are stored in `offsetModel.pt`.
The training and testing worlds are both located in `CNN-autonomous-car/worlds`.
The different controller source codes are located in `CNN-autonomous-car/controllers`. The available controllers are the following:
- training_track_driving: this controller follows the training track using the trajectory planning data.
- CNN_autonomous_car_cpu_float: this controller contains the naive version of the neural network, with floating point representation, to drive the car.
- CNN_autonomous_car_cpu_fixed: this controller is the optimized CPU version of the neural network, with fixed point representation and multithreading, to drive the car.
- CNN_autonomous_car_fpga: this controller uses the cnn_dfe library to run the neural network on a DFE. It has three different loop optimization modes; the mode must be passed through the controllerArgs field in `CNN-autonomous-car/worlds/autonomous_car_test.wbt`.
- CNN_autonomous_car_fpga_optimized: this controller uses the cnn_dfe library to run the neural network on a DFE. It implements only one step loop of CNN_autonomous_car_fpga and is the final, most optimized FPGA version of the controller, described in the Deliverable 2 results.
The DFE optimization of the neural network is compiled into a shared library: `CNN-autonomous-car/libraries/cnn_dfe`. The corresponding DFE kernels and manager are in `CNN-autonomous-car/libraries/cnn_dfe/src`.
You can select the controller to drive the car by editing the controller field in `CNN-autonomous-car/worlds/autonomous_car_test.wbt` and then run Webots on Jumax.
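For reference, in a Webots world file the controller is set on the robot node via the `controller` field, with optional arguments in `controllerArgs`. A hypothetical fragment (the actual node type and surrounding fields in `autonomous_car_test.wbt` may differ) could look like this:

```
Robot {
  controller "CNN_autonomous_car_fpga"
  controllerArgs [ "1" ]
}
```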
A GPU is now available on jumaxbuild1. It is recommended to run Webots on jumaxbuild1 and to run the controller as an extern controller on jumax-cpu. Detailed instructions can be found on the following page: How to run the most optimized car simulation.