In my previous post on self-driving cars and the related Udacity Nanodegree project, I wrote about using a deep neural network for traffic sign classification. This time around, I will be using a deep neural network to mimic human driving behavior (in this particular case, driving a car in a simulator).
The general idea here is to gather training data by driving the car in the simulator, then train the deep neural network with that data, and in the end let the car be driven by the trained model. You can find the simulator's git repo here.
Udacity has already offered some pre-recorded laps, but I decided to play around with the simulator myself, so I've recorded 5 laps in each direction of the track. Both directions are needed in order to avoid biasing the deep neural net towards turning to the left side of the road.
The recording resulted in 36,534 captured images. The images contain data captured from three cameras on the car: left, center and right.
For training purposes I've used only the center camera, which proved to be enough for very decent end results. To make the model more general, though, it is advisable to use all three cameras so that the car can better handle recovering back to the center of the track.
When using multiple cameras, it is important to remember that the steering angle needs to be adjusted by a constant for both the left and right cameras: the left camera sees the road as if the car were positioned further to the left, so its angle is corrected towards the right, and vice versa.
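Here's a minimal sketch of that adjustment, assuming a hypothetical correction constant of 0.2 (the exact value is something to tune empirically):

```python
# Hypothetical correction constant -- a value to tune empirically.
CORRECTION = 0.2

def side_camera_samples(center_img, left_img, right_img, steering):
    """Return (image, steering angle) pairs for all three cameras.

    The left camera sees the road as if the car were further to the
    left, so we steer a bit more to the right (positive correction);
    the right camera gets the opposite adjustment.
    """
    return [
        (center_img, steering),
        (left_img, steering + CORRECTION),
        (right_img, steering - CORRECTION),
    ]
```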
The training data also includes a CSV file with time-stamped image captures together with the steering angle, throttle, brake and speed.
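Reading the log could look roughly like this, assuming the simulator's column order of center, left, right, steering, throttle, brake, speed, with no header row:

```python
import csv

samples = []
with open('driving_log.csv') as f:
    for line in csv.reader(f):
        # Assumed column order: center, left, right,
        # steering, throttle, brake, speed.
        center_path = line[0]
        steering = float(line[3])
        samples.append((center_path, steering))
```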
Since the training data contains laps in both track directions, the distribution of steering angles is not biased:
80% of the data is used for training and 20% for validation. All the training data can be found here.
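The split itself is a one-liner with scikit-learn, continuing from the `samples` list sketched above:

```python
from sklearn.model_selection import train_test_split

# Hold out 20% of the samples for validation.
train_samples, validation_samples = train_test_split(
    samples, test_size=0.2, shuffle=True)
```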
In order to increase the size of the training data set even more, I've flipped every image horizontally (mirroring left and right) and negated the corresponding steering angle.
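As a small sketch, the flip has to be paired with negating the steering angle, otherwise the mirrored images would teach the model the wrong correction:

```python
import numpy as np

def flip(image, steering):
    """Mirror the image left-to-right and negate the steering angle."""
    return np.fliplr(image), -steering
```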
Preprocessing of the images is done in the model itself, since it's cheaper on the GPU than on the CPU. I've cropped 50 pixels from the top and 20 pixels from the bottom of every image in order to remove unnecessary noise (the front part of the car, sky, trees, etc.).
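In Keras this front end can be expressed with a `Lambda` layer for normalization and a `Cropping2D` layer; here's a sketch using tf.keras layer names (the original project may have used standalone Keras, where the calls look the same):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Lambda, Cropping2D

model = Sequential()
# Normalize pixel values from [0, 255] to [-0.5, 0.5] on the GPU.
model.add(Lambda(lambda x: x / 255.0 - 0.5, input_shape=(160, 320, 3)))
# Crop 50 rows from the top (sky, trees) and 20 from the bottom (hood).
model.add(Cropping2D(cropping=((50, 20), (0, 0))))
```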
I've implemented the deep neural network architecture from NVIDIA's paper End to End Learning for Self-Driving Cars in Keras, running TensorFlow as the backend. It's an excellent paper, and I recommend going through it if you want to understand a real-world application of this deep neural network.
Here's NVIDIA's short video of a car being driven in real life by a model based on the architecture above:
The deep neural network itself takes input images of 160 x 320 x 3 pixels. At the very beginning it contains the normalization and cropping layers. These are followed by 3 convolutional layers with a 2x2 stride and 5x5 kernels, and then 2 more convolutional layers with no stride and 3x3 kernels. The convolutional layers are connected to three fully connected layers leading to a single output control value, the steering angle. Non-linearity is introduced between the layers with the ReLU activation function, and the Adam optimizer is used, so the learning rate does not have to be tuned by hand.
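Here's a sketch of the whole network in Keras; the filter counts (24, 36, 48, 64, 64) and fully connected sizes (100, 50, 10) follow the NVIDIA paper, so treat it as an approximation of the model rather than an exact copy of the code in the repo:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Lambda, Cropping2D, Conv2D,
                                     Flatten, Dense)

model = Sequential([
    # Normalization and cropping, as described above.
    Lambda(lambda x: x / 255.0 - 0.5, input_shape=(160, 320, 3)),
    Cropping2D(cropping=((50, 20), (0, 0))),
    # Three 5x5 convolutions with a 2x2 stride.
    Conv2D(24, 5, strides=2, activation='relu'),
    Conv2D(36, 5, strides=2, activation='relu'),
    Conv2D(48, 5, strides=2, activation='relu'),
    # Two 3x3 convolutions with no stride.
    Conv2D(64, 3, activation='relu'),
    Conv2D(64, 3, activation='relu'),
    # Fully connected layers narrowing down to one steering value.
    Flatten(),
    Dense(100, activation='relu'),
    Dense(50, activation='relu'),
    Dense(10, activation='relu'),
    Dense(1),
])

# Mean squared error against the recorded steering angle; Adam picks
# the learning rate adaptively, so it needs no manual tuning.
model.compile(optimizer='adam', loss='mse')
```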
Dropout layers were not necessary, since I did not notice any significant overfitting during training.
Training was done on an Amazon EC2 GPU instance. After the training was done, the car was able to complete the track by itself:
All the code can be found on my GitHub profile. This has definitely been the most interesting project on this course so far. My plan is to apply the learnings from this project to a real-life remote-controlled car and to build a small racing track in my apartment where I can train it to drive itself.