This repository contains the code and paper for our (Gerrit Bartels, Thorsten Krause, and Jacob Dudek) project on Deep Reinforcement Learning. We investigated the benefits of Transfer Learning for Soft Q-Networks and Double Deep Q-Networks, as well as an alleged relationship between overfitting and negative transfer. We propose and execute an experimental setup, provide a ready-to-use implementation, and identify major challenges that future research can build upon.
As a test bed we used the popular NES game "Super Mario Bros.". The game consists of 32 levels in which the player steers Mario through a course of obstacles, choosing from 256 distinct controller actions. We relied on a ready-to-use environment implementation that can be found here.
For transfer learning we chose level 1-1 (left) as the source and level 1-2 (right) as the target domain. Below are exemplary scenes from both levels.
The agents received the game state as a normalized, rescaled 84x84 grayscale image and drew from a restricted action space of five actions: (1) idle, (2) move right, (3) jump right, (4) move right and throw a fireball, (5) jump right and throw a fireball. As consecutive frames are highly correlated, we accelerated training by repeating each action over four frames and passing the corresponding states as a stacked 4x84x84 image.
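For illustration, here is a minimal sketch of how such a preprocessing pipeline could be set up, assuming the `gym-super-mario-bros` environment together with standard `gym` wrappers (the exact wrapper choices and names in our code may differ):

```python
import gym
from gym.wrappers import FrameStack, GrayScaleObservation, ResizeObservation
from nes_py.wrappers import JoypadSpace
import gym_super_mario_bros
from gym_super_mario_bros.actions import RIGHT_ONLY  # the five actions described above


class SkipFrame(gym.Wrapper):
    """Repeat the chosen action over `skip` consecutive frames and sum the rewards."""

    def __init__(self, env, skip=4):
        super().__init__(env)
        self._skip = skip

    def step(self, action):
        total_reward, done, info = 0.0, False, {}
        for _ in range(self._skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info


def make_env(level_id="SuperMarioBros-1-1-v0"):
    """Build the preprocessed environment; use 'SuperMarioBros-1-2-v0' for the target level."""
    env = gym_super_mario_bros.make(level_id)
    env = JoypadSpace(env, RIGHT_ONLY)       # restrict to idle / right / jump right / right (+ jump) + fireball
    env = SkipFrame(env, skip=4)             # repeat each action over four frames
    env = GrayScaleObservation(env)          # RGB frame -> grayscale
    env = ResizeObservation(env, shape=84)   # rescale to 84x84
    env = FrameStack(env, num_stack=4)       # stack into a 4x84x84 observation
    return env                               # frames are normalized to [0, 1] when fed to the network
```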
The following figure visualizes the CNN backbone architecture employed in both our DDQN and SoftQN.
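As a rough sketch of such a backbone (the filter sizes below follow the standard Nature-DQN layout and are an assumption, not necessarily our exact hyperparameters), the network maps the stacked 4x84x84 input to one Q-value per action:

```python
import torch.nn as nn


class QNetwork(nn.Module):
    """Convolutional feature extractor over the stacked 4x84x84 input,
    followed by a fully connected head that outputs one Q-value per action."""

    def __init__(self, in_channels=4, n_actions=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),
            nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512),   # 84x84 input shrinks to 7x7 after the three convolutions
            nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, x):
        # x: (batch, 4, 84, 84) normalized grayscale frames
        return self.head(self.features(x))
```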
DDQN:
eval_video_DDQN_1-1.mp4
SoftQN:
eval_video_SOFTQ_1-1.mp4
Note that the label "untrained" refers to models trained from scratch on level 1-2 without any knowledge transfer, whereas the "transfer" models were initialized with weights learned on level 1-1 (see the sketch below the videos).
DDQN:
eval_video_DDQN_1-2_untrained.mp4
eval_video_DDQN_1-2_transfer_all_wr35.mp4
SoftQN:
eval_video_SOFTQ_1-2_untrained.mp4
eval_video_SOFTQ_1-2_transfer_all_wr50.mp4
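The "transfer" agents shown above start from weights learned on level 1-1 rather than from a random initialization. A minimal, hypothetical sketch of such a full-weight transfer in PyTorch (the checkpoint path is made up for illustration and does not refer to an actual file in this repository):

```python
import torch

# Hypothetical checkpoint from training on the source level 1-1.
source_state = torch.load("checkpoints/ddqn_level_1-1.pt")

# "transfer_all": copy every layer into the target model before fine-tuning on level 1-2.
transfer_model = QNetwork(in_channels=4, n_actions=5)
transfer_model.load_state_dict(source_state)

# "untrained" baseline: same architecture, randomly initialized, trained on 1-2 from scratch.
untrained_model = QNetwork(in_channels=4, n_actions=5)
```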
Click here for a video presentation of our project, given by Thorsten Krause.