This thesis expands on the work of Randlov and Alstrom, using reinforcement learning for bicycle self-stabilization with robotic steering. The research involved training a Deep Deterministic Policy Gradient (DDPG) agent in virtual environments, followed by simulations to assess the results. Hardware testing was then conducted on the Smart Bicycle platform at Arizona State University's RISE Lab to evaluate its self-balancing performance.
Achievement: A balancing time of 12-15 seconds was achieved on the Smart Bicycle hardware system. Further improvements to model training and hardware testing were also developed and still need to be tested on the hardware system.