The goal is to predict the direction of travel (in camera frame) from the provided dashcam video, expressed as yaw and pitch angles (fortunately, no roll).
comma.ai's repo provides 10 videos. Each video is 1 minute long at 20 fps.
5 videos are labelled with a 2D array giving the direction of travel at every frame as pitch and yaw angles in radians.
The other 5 videos are unlabeled; the task is to generate labels for them.
The example labels were generated with a neural network and confirmed with a SLAM algorithm.
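For reference, loading a label file looks roughly like this. A minimal sketch, assuming the labels ship as plain-text files with one "pitch yaw" pair per line (the path is hypothetical):

```python
import numpy as np

# Assumed layout: one "pitch yaw" pair (radians) per frame, so a
# 1 min @ 20 fps clip yields a (1200, 2) array. Path is hypothetical.
labels = np.loadtxt("labeled/0.txt")
pitch, yaw = labels[:, 0], labels[:, 1]
print(labels.shape)  # expect (1200, 2)
```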
You can estimate the focal length to be 910 pixels.
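With a pinhole model, that focal length is enough to convert a point in the image into a travel direction. A minimal sketch, assuming the principal point sits at the image center; the sign conventions here are my assumption, not necessarily comma's:

```python
import numpy as np

FOCAL_LENGTH = 910.0  # pixels, as given

def angles_from_point(u, v, cx, cy, f=FOCAL_LENGTH):
    """Yaw/pitch (radians) of the direction of travel, given the pixel
    (u, v) where that direction pierces the image plane (the focus of
    expansion) and the principal point (cx, cy)."""
    yaw = np.arctan2(u - cx, f)       # assumed: positive yaw = right of center
    pitch = np.arctan2(-(v - cy), f)  # assumed: positive pitch = above center
    return pitch, yaw
```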
Extending the goals: making myself a functional device that can make my car drive itself.
What are the things I have to consider when writing the code? (still thinking)
So it's a lot of classes; I have to think of some clever way to get that down.
Submissions are evaluated by mean squared error against comma's ground-truth labels. Errors for frames where the car speed is less than 4 m/s are ignored; those frames are also labelled as NaN in the example labels.
comma.ai's repo includes an eval script that produces an error score (lower is better). You can use it to test your solution against the labelled examples; the same script is used to evaluate submissions.
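The masking logic boils down to something like this sketch; the real script may differ in details (e.g. it may normalize the score against a zero-prediction baseline):

```python
import numpy as np

def masked_mse(gt, pred):
    """MSE over frames with defined ground truth (speed >= 4 m/s).
    gt, pred: (N, 2) arrays of [pitch, yaw] in radians; slow frames
    are NaN in gt and excluded from the score."""
    valid = ~np.isnan(gt).any(axis=1)
    return float(np.mean((gt[valid] - pred[valid]) ** 2))
```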
I am thinking of using some kind of optical-flow model; rather than doing image stabilization or the like, I'll make it yield the yaw and pitch of the moving vehicle directly.
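The classical version of that idea: if the car is mostly translating, every flow vector lies on a line through the focus of expansion, so the FOE can be recovered by least squares and fed into `angles_from_point` above. A sketch of that baseline (it ignores rotation, which is exactly what a learned model should handle better):

```python
import numpy as np

def foe_from_flow(points, flows):
    """Least-squares focus of expansion from sparse optical flow.
    points: (N, 2) pixel positions; flows: (N, 2) flow vectors.
    Minimizes the perpendicular distance from the FOE to each flow line."""
    n = np.stack([-flows[:, 1], flows[:, 0]], axis=1)   # line normals
    n /= np.linalg.norm(n, axis=1, keepdims=True) + 1e-9
    A = n.T @ n                                         # sum of n n^T
    b = n.T @ (n * points).sum(axis=1)                  # sum of (n . p) n
    return np.linalg.solve(A, b)                        # FOE (u, v) in pixels
```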
Adding details about the architecture soon!
I'm gonna keep it short, sweet, and to the point.
The architecture used was taken from this research paper.
It's ConvNets again!! Predicting something like optical flow is not easy, and you surely cannot do it with a single input image.
A straightforward step is to create two separate yet identical processing streams for the two adjacent frames and to combine them at a later stage (after 3 convs in this case). To combine the outputs of the two ConvNet streams, the research paper uses a "correlation layer", but I don't think it makes a lot of difference. After a bunch of conv layers, the result goes through a refinement stage: the output of the architecture above is the input to the refinement layer. A sketch of the idea is below.
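A minimal PyTorch sketch of that layout, assuming shared weights between the two streams and a naive correlation layer; the paper's version is more elaborate, and plain channel concatenation works as a cheap stand-in:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
        nn.LeakyReLU(0.1, inplace=True),
    )

def correlation(f1, f2, max_disp=4):
    """Naive correlation layer: channel-wise dot product of f1 with f2
    shifted by every displacement up to max_disp in x and y."""
    b, c, h, w = f1.shape
    pad = F.pad(f2, [max_disp] * 4)
    out = [(f1 * pad[:, :, dy:dy + h, dx:dx + w]).mean(dim=1)
           for dy in range(2 * max_disp + 1)
           for dx in range(2 * max_disp + 1)]
    return torch.stack(out, dim=1)  # (B, (2*max_disp+1)**2, H/8, W/8)

class TwoStreamNet(nn.Module):
    """Two identical (shared-weight) conv streams for adjacent frames,
    merged after 3 convs; a transposed conv stands in for refinement."""
    def __init__(self):
        super().__init__()
        self.stream = nn.Sequential(conv(3, 64), conv(64, 128), conv(128, 256))
        self.refine = nn.ConvTranspose2d(81, 128, kernel_size=4, stride=2, padding=1)

    def forward(self, frame1, frame2):
        f1, f2 = self.stream(frame1), self.stream(frame2)
        merged = correlation(f1, f2)  # or torch.cat([f1, f2], dim=1)
        return F.leaky_relu(self.refine(merged), 0.1)
```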
- Labelled dataset [by comma]
- Unlabeled test dataset [by comma]
- Eval script [by comma]
- Models and training script
- Setup
- Pretrained weights
- What the user sees (software)
- What the user sees (webpage)
- Segmentation
- Visualizing the data
- MaskFlownet
- FlowNetCorr (not as good as I thought)
- GMA
- Training the model (on Azure because I have a .edu email :) ) PS: Azure is useless!
- SLAM
- Write utility functions (done for FlowNet, working on GMA)
- Build and deploy with Qt5 in the pedal repo.
- Update README
- Segment the comma10k dataset
- PilotNet
- Implement the research papers from George and do some visualization on them; use future images for the predicted lines (yaw and pitch)
- Depth_net
I have to deploy it, retrain it on new data, and keep doing that!
For now I'm not doing it in real time; with time I will make this thing work with CARLA.
I'm too lazy to complete the code. If anyone wants to complete it for me, go on!!
- You can monitor the training process with TensorBoard:
tensorboard --port=PORT --logdir=pretrained
- The pretrained model is a little too heavy for GitHub, so it's uploaded to Google Drive:
https://drive.google.com/file/d/1kxpD8DmL-CQIB02zxah_-BIoM6spcBJF/view?usp=sharing
- The training script for FlowNetCorr is here
  - python train_flownetcorr --help (for all the arguments and folder locations)
  - the training loop is in the 'train' function.
  - the validation loop is in the 'validation' function.
  - there are relevant comments before every piece of code, so it is not that tough to identify and change stuff.
  - it uses an MSE loss: the errors are squared per element and then averaged over the batch (see the sketch below).
- The FlowNetCorr model is here
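For clarity, the loss amounts to this (squared errors first, then the mean over the batch):

```python
import torch

def mse_loss(pred, target):
    # Square the per-element errors first, then average over the batch.
    return ((pred - target) ** 2).mean()
```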
Adding more soon, be patient!