This project is a PyTorch implementation of several variants of the Deep Q-Learning (DQN) model. It is based on the material provided by Udacity's Deep Reinforcement Learning Nanodegree. The objective is to use one of the Unity ML-Agents environments to demonstrate how different DQN implementations can be coded, trained and evaluated.
The code structure builds from the Nature DQN and incrementally implements three modifications, in order: Double Q-Learning, Dueling Networks and Prioritized Experience Replay. The papers for each of these implementations are listed in the references at the end of this document.
Although the code can be used on any operating system, the compiled versions of the Unity ML-Agents environment used here are only available for macOS (with graphics) and Linux (headless, for faster training). To download the Linux version with graphics or the Windows versions, please use the links below (provided by Udacity's Nanodegree):
- It is recommended to use Miniconda to manage Python environments. To install the dependencies, the first step is to create and activate an environment:
conda create --name dqn-pytorch python=3.6
source activate dqn-pytorch
- The packages needed to run the code can be obtained by cloning and installing Udacity's Nanodegree repository (which is also a lot of fun for anyone wanting to explore more reinforcement learning projects):
git clone https://github.com/udacity/deep-reinforcement-learning.git
cd deep-reinforcement-learning/python
pip install .
- To use Jupyter notebooks or JupyterLab properly, it is important to create an IPython kernel for the environment:
python -m ipykernel install --user --name dqn-pytorch --display-name "dqn-pytorch"
- Before running code in a notebook, change the kernel to match the dqn-pytorch environment using the Kernel drop-down menu.
The environment consists of a robot surrounded by a boxed enclosure filled with yellow and blue bananas. At each time step, it has four actions at its disposal:
- 0 - walk forward
- 1 - walk backward
- 2 - turn left
- 3 - turn right
The state space has 37 dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction. A reward of +1 is provided for collecting a yellow banana, and a reward of -1 is provided for collecting a blue banana.
The environment is considered solved when the average score over the last 100 episodes is above 13.0.
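As a quick illustration of this criterion, the helper below (a minimal sketch using NumPy; the function name is hypothetical and not part of the library) checks whether a list of episode scores satisfies the solved condition:

import numpy as np

def is_solved(scores, window=100, threshold=13.0):
    # Solved when the mean of the last `window` episode scores exceeds the threshold
    return len(scores) >= window and np.mean(scores[-window:]) > threshold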
To get started with the code, the first step is to load the Unity ML-Agents environment. It is important to note that the path must be adjusted to the location of the environment file on your system. The environment is organized around brains that represent each controllable agent; in the banana environment, it suffices to use the first brain. The initial code is:
from unityagents import UnityEnvironment
env = UnityEnvironment(file_name="environments/Banana.app")
brain_name = env.brain_names[0]
brain = env.brains[brain_name]
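To double-check the state and action sizes used in the next step, you can reset the environment once and inspect the brain. This is a small sketch assuming the standard unityagents API (vector_observations and vector_action_space_size); attribute names may differ in other versions:

env_info = env.reset(train_mode=True)[brain_name]   # reset and grab the first observation
state = env_info.vector_observations[0]             # 37-dimensional state vector
action_size = brain.vector_action_space_size        # 4 discrete actions
print('State size:', len(state), '| Action size:', action_size)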
The next step is to load one of the implemented agents and its corresponding training class. For the banana environment, the state size must be 37 and the action size 4. The training setup must include the number of episodes and the values that govern the evolution of the epsilon and beta parameters. An example with the values used for the trained Prioritized Replay model is:
from dqn import PriorAgent, PTraining
agent = PriorAgent(state_size=37, action_size=4, seed=0)
training_setup = PTraining(n_episodes=2000, eps_start=1, eps_end=0.01, eps_decay=0.995, beta_start=0.4, beta_inc=1.002)
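The epsilon and beta schedules implied by these parameters can be previewed on their own. The snippet below is only a sketch based on the parameter names, assuming a multiplicative update per episode, which may not match the library's exact rule:

eps, beta = 1.0, 0.4
for episode in range(2000):
    eps = max(0.01, eps * 0.995)    # epsilon decays towards eps_end, reducing exploration over time
    beta = min(1.0, beta * 1.002)   # beta grows towards 1, increasing the importance-sampling correction
print(eps, beta)                    # roughly 0.01 and 1.0 well before episode 2000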
To train the agent and get the scores during training, use the train function of the training class (a simplified sketch of what such a loop does follows the parameter list below).
scores = training_setup.train(agent, env, brain_name, track_every=2, plot=True, weights='final.pth', success_thresh=13.0)
The train function receives as inputs:
- the agent
- the environment
- the brain name
- track_every - the interval between tracking updates during training
- plot - whether the tracking is visual (with an evolution plot) or only informative (with prints)
- success_thresh - the threshold for the moving average of the last 100 runs; when it is reached, training stops and the weights are saved in the models folder
- weights - the name of the file where the model weights will be saved
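For reference, here is a simplified sketch of what such a training loop typically does. It is not the library's exact code: the agent methods act and step, the attribute qnetwork_local and the unityagents calls are assumptions based on the common DQN agent interface.

from collections import deque
import numpy as np
import torch

def train_sketch(agent, env, brain_name, n_episodes=2000, eps_start=1.0, eps_end=0.01,
                 eps_decay=0.995, success_thresh=13.0, weights='final.pth'):
    scores, window, eps = [], deque(maxlen=100), eps_start
    for episode in range(1, n_episodes + 1):
        env_info = env.reset(train_mode=True)[brain_name]
        state, score, done = env_info.vector_observations[0], 0.0, False
        while not done:
            action = agent.act(state, eps)                       # epsilon-greedy action (assumed method)
            env_info = env.step(action)[brain_name]
            next_state = env_info.vector_observations[0]
            reward, done = env_info.rewards[0], env_info.local_done[0]
            agent.step(state, action, reward, next_state, done)  # store transition and learn (assumed method)
            state, score = next_state, score + reward
        scores.append(score)
        window.append(score)
        eps = max(eps_end, eps * eps_decay)
        if len(window) == 100 and np.mean(window) >= success_thresh:
            torch.save(agent.qnetwork_local.state_dict(), weights)  # assumed attribute name
            break
    return scores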
Once the scores are obtained, you can save the training with a name and description using the Benchmarks class. To do so, just follow the code below.
from dqn import Benchmarks
benchs = Benchmarks()
benchs.save_score('Final Prioritized Replay', scores, 'Prioritized replay implementation, with dueling model and Double DQN, trained for 2000 episodes')
All available saved trainings are listed in the Benchmarks section. To see a trained model play, just load the weights for the agent with the load_weights function and use the play function of the training class.
agent = PriorAgent(state_size=37, action_size=4, seed=0)
agent.load_weights('final.pth')
scores = PTraining().play(agent, env, brain_name)
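Under the hood, play presumably runs a single greedy episode. Below is a hedged sketch of such a rollout, assuming the agent exposes an act method whose second argument is the exploration rate (0 disables exploration):

env_info = env.reset(train_mode=False)[brain_name]  # graphics mode, no training
state, score, done = env_info.vector_observations[0], 0.0, False
while not done:
    action = agent.act(state, 0.0)                  # greedy action, no exploration (assumed signature)
    env_info = env.step(action)[brain_name]
    state = env_info.vector_observations[0]
    score += env_info.rewards[0]
    done = env_info.local_done[0]
print('Episode score:', score)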
Below is a comparison, using the Prioritized Replay model, of an untrained agent and an agent trained for 2000 episodes. Note how the trained model is able to seek out yellow bananas while avoiding blue ones.
Untrained Model | Trained Model |
The folder system in the code is structured as:
- benchmarks - Training scores and description of each model already trained
- dqn - Main library, with different implementations of the DQN model
- models - Saved weights of the trained models
- images - Saved images of results
- Navigation.ipynb - Jupyter Notebook with code samples
The DQN library is organized in classes as follows:
- Model Modules - Modules to train and use each one of the implementations
- Benchmarks - Class to load and display the saved training scores
Each model module is organized as:
- Agent - The agent implementation, responsible for interacting with and learning from the environment
- Model - Neural network implementation in PyTorch of the DQN architecture (a dueling variant is sketched after this list)
- Training - Convenience class to handle training and tracking of the agent
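As an illustration of the Model component, a minimal dueling Q-network in PyTorch could look like the sketch below. Layer sizes and names are illustrative and not necessarily the repository's exact architecture:

import torch.nn as nn

class DuelingQNetwork(nn.Module):
    # Dueling architecture: a shared trunk followed by separate value and advantage streams
    def __init__(self, state_size=37, action_size=4, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_size, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)                 # state value V(s)
        self.advantage = nn.Linear(hidden, action_size)   # advantages A(s, a)

    def forward(self, state):
        x = self.trunk(state)
        v, a = self.value(x), self.advantage(x)
        # Subtracting the mean advantage keeps the V/A decomposition identifiable
        return v + a - a.mean(dim=1, keepdim=True)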
For a description of the implementation of the most complex variant, see the Report file.
Training with the other implemented models is very similar to the instructions in the Training and Playing sections. The available models, corresponding classes and code examples are listed below.
- Nature DQN - The original DQN proposed in [1]
from dqn.nature import DQNAgent, NatureTraining
agent = DQNAgent(state_size=37, action_size=4, seed=0)
training_setup = NatureTraining(n_episodes=2000, eps_start=1, eps_end=0.01, eps_decay=0.995)
scores = training_setup.train(agent, env, brain_name, track_every=10, plot=True, weights='dqn.pth', success_thresh=13.0)
- Double DQN - DQN with a modification to implement double Q-learning [2]; the modified target computation is sketched after this list
from dqn.double import DDQNAgent, DoubleTraining
agent = DDQNAgent(state_size=37, action_size=4, seed=0)
training_setup = DoubleTraining(n_episodes=2000, eps_start=1, eps_end=0.01, eps_decay=0.995)
scores = training_setup.train(agent, env, brain_name, track_every=10, plot=True, weights='ddqn.pth', success_thresh=13.0)
- Dueling DQN - DQN with modifications to implement double Q-learning and a dueling network architecture [3]
from dqn.dueling import DDDQNAgent, DuelTraining
agent = DDDQNAgent(state_size=37, action_size=4, seed=0)
training_setup = DuelTraining(n_episodes=2000, eps_start=1, eps_end=0.01, eps_decay=0.995)
scores = training_setup.train(agent, env, brain_name, track_every=10, plot=True, weights='dddqn.pth', success_thresh=13.0)
- Prioritized Replay - DQN with modifications to implement double Q-learning, a dueling network architecture and prioritized experience replay [4]
from dqn.prioritized import PriorAgent, PTraining
agent = PriorAgent(state_size=37, action_size=4, seed=0)
training_setup = PTraining(n_episodes=2000, eps_start=1, eps_end=0.01, eps_decay=0.995, beta_start=0.4, beta_inc=1.002)
scores = training_setup.train(agent, env, brain_name, track_every=10, plot=True, weights='priordqn.pth', success_thresh=13.0)
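The main difference between these variants is how the learning target is computed. As a hedged illustration (not the repository's actual code; network and tensor names are placeholders), the Double DQN target lets the online network choose the next action while the target network evaluates it:

import torch

def double_dqn_targets(qnet_local, qnet_target, rewards, next_states, dones, gamma=0.99):
    # Online network selects the greedy next action; target network evaluates that action
    with torch.no_grad():
        best_actions = qnet_local(next_states).argmax(dim=1, keepdim=True)
        next_q = qnet_target(next_states).gather(1, best_actions)
    # Standard TD target; (1 - dones) zeroes the bootstrap term at episode end
    return rewards + gamma * next_q * (1 - dones)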
The four implemented models have trained versions saved in the models folder. They are named as follows:
- Nature DQN [1] -> dqn.pth
- Double DQN [2] -> ddqn.pth
- Dueling Double DQN [3] -> dddqn.pth
- Prioritized Replay DQN [4] -> priordqn.pth
- Prioritized Replay DQN trained for 2000 episodes -> final.pth
- Untrained Prioritized Replay DQN -> untrained.pth
Also, the scores for every training along with a description of the model used are saved in the benchmarks folder. The available scores are:
- DQN -> Nature DQN training
- DDQN -> Double Q learning DQN training
- DDDQN -> Dueling Network with Double Q learning DQN training
- Prioritized Replay -> Prioritized Replay (with double q learning and dueling architecture)
- Final Prioritized Replay -> Prioritized Replay architecture trained for 2000 episodes
- random -> Performance of a random agent
To load a specific training, just use the load function from the Benchmarks class, passing the name of the saved scores. To plot the scores, use the plot_bench function:
scores = benchs.load('DQN')
bench_dict = benchs.plot_bench(scores, title='Example of Loading Score', mean=100, opacity=0.5)
The plot function receives the scores vector, the title of the plot, the number of runs to use in the moving-mean calculation (or None to not display the mean) and the opacity to use when plotting the scores.
To see a comparison of all the trainings, you can load a dictionary of { 'model name': [scores vector] } with the load_benchmarks function. To plot the dictionary, use the plot_benchs function:
bench_dict = benchs.load_benchmarks()
benchs.plot_benchs(bench_dict, title='Models Comparison', mean=100, opacity=0.1)
For further details of the reinforcement learning agent, the Prioritized Replay model architecture is described in detail in the Report file.
[1] Deep Q Learning Nature Paper
[2] Deep Reinforcement Learning with Double Q-learning
[3] Dueling Network Architectures for Deep Reinforcement Learning
[4] Prioritized Experience Replay