What: A CNN that learned to play a video game, Atari's Breakout.
Results of training on a dusty GTX 1080 for 10 hours / overnight.
The model is a Convolutional Neural Network. Each raw Breakout frame, an array of shape (210, 160, 3), is preprocessed into an (84, 84) grayscale, down-sized frame, and the network ends in a linear output layer of size 4, one output per action:
- no-op
- fire
- move left
- move right
This is reduced to 3 actions (no-op, move left, move right) because fire in Breakout is basically a no-op.
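The frame preprocessing described above could be sketched as follows. This is a minimal illustration using only NumPy, not the repository's code: the function name `preprocess_frame`, the luminance weights, and the nearest-neighbour downscale are all assumptions, and the actual project may use a different resize method.

```python
import numpy as np

def preprocess_frame(frame, height=84, width=84):
    """Turn an RGB frame of shape (210, 160, 3) into a (height, width) grayscale frame."""
    # Luminance-weighted grayscale (assumed weights, not taken from the repo).
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = frame.astype(np.float32) @ weights  # shape (210, 160)

    # Nearest-neighbour downscale: pick evenly spaced source rows and columns.
    rows = np.arange(height) * gray.shape[0] // height
    cols = np.arange(width) * gray.shape[1] // width
    small = gray[rows[:, None], cols[None, :]]

    return small.astype(np.uint8)

# A random stand-in for a Breakout frame.
frame = np.random.randint(0, 256, size=(210, 160, 3), dtype=np.uint8)
processed = preprocess_frame(frame)
print(processed.shape, processed.dtype)
```

Storing frames as uint8 rather than float keeps the replay memory roughly four times smaller, which matters with a memory of a million entries.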
The model uses the Adam optimizer with a logcosh, mean squared error, or huber loss function.
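To make the difference between the three loss options concrete, here is a small NumPy sketch of each, applied to a vector of prediction errors. The function names and the Huber threshold `delta=1.0` are assumptions for illustration; the project itself presumably relies on its deep learning framework's built-in losses.

```python
import numpy as np

def mse_loss(error):
    """Mean squared error: grows quadratically, so large errors dominate."""
    return np.mean(error ** 2)

def huber_loss(error, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond it (delta is an assumed default)."""
    abs_err = np.abs(error)
    quadratic = 0.5 * error ** 2
    linear = delta * (abs_err - 0.5 * delta)
    return np.mean(np.where(abs_err <= delta, quadratic, linear))

def logcosh_loss(error):
    """log(cosh(error)), computed in a numerically stable form."""
    abs_err = np.abs(error)
    # log(cosh(x)) = |x| + log(1 + exp(-2|x|)) - log(2)
    return np.mean(abs_err + np.log1p(np.exp(-2.0 * abs_err)) - np.log(2.0))

errors = np.array([0.1, 1.0, 10.0])
for name, fn in [("mse", mse_loss), ("huber", huber_loss), ("logcosh", logcosh_loss)]:
    print(name, fn(errors))
```

Huber and logcosh both behave like squared error near zero and like absolute error for large values, which limits the size of the gradient when a Q-value estimate is far off.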
Python 3.8: create the environment using conda or asdf, then run
pip install -r requirements.txt
If you are training on a GPU, you will also need a GPU-enabled tensorflow package. The earlier instruction was pip install tensorflow-gpu==1.7.0, but requirements.txt / numpy was recently updated and I still need to determine which tensorflow package to use. I don't know. I just don't.
- breakout.py: The main breakout game loop. Integrates with all of the components.
- DQNAgent.py: The Deep Q Network Agent for learning the breakout game.
- ReplayMemory.py: The Remembering and Replaying for the DQNAgent to learn.
- hyperparameters.py: All of the hyperparameters.
- discrete_frames.py: Discrete frames into the model and memory. More memory footprint, more backpropagation steps.
- sliding_frames.py: Sliding frames into the model and memory. Less memory footprint, fewer backpropagation steps.
- utils.py: List of utility functions used by numerous components.
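The memory difference between the discrete and sliding framing modes can be illustrated with a rough sketch. This is one possible reading, not the contents of discrete_frames.py or sliding_frames.py: it assumes a state is a stack of 4 frames, that discrete framing stores the state and next state as two separate stacks, and that sliding framing stores 5 frames once and slices both states out of them. `STACK`, `discrete_transition`, and `sliding_transition` are hypothetical names.

```python
import numpy as np

STACK = 4  # frames per state (assumed, matching the usual DQN setup)

def discrete_transition(frames):
    """Store state and next state as two independent stacks (2 * STACK frames)."""
    state = np.stack(frames[0:STACK])
    next_state = np.stack(frames[1:STACK + 1])
    return state, next_state

def sliding_transition(frames):
    """Store STACK + 1 frames once; both states are overlapping views of the window."""
    window = np.stack(frames[0:STACK + 1])
    return window, window[:-1], window[1:]

# Five dummy 84x84 frames, each filled with its own index.
frames = [np.full((84, 84), i, dtype=np.uint8) for i in range(STACK + 1)]

d_state, d_next = discrete_transition(frames)
window, s_state, s_next = sliding_transition(frames)

discrete_bytes = d_state.nbytes + d_next.nbytes  # 8 frames stored
sliding_bytes = window.nbytes                    # 5 frames stored
print(discrete_bytes, sliding_bytes)
```

Under these assumptions both modes feed the model identical states, and the sliding layout simply avoids storing the three shared frames twice.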
'GAME' : 'BreakoutDeterministic-v4', # Name of which game to use
# v1-4 Deterministic or Not
'DISCRETE_FRAMING' : True, # 2 discrete sets of frames stored in memory
'LOAD_WEIGHTS' : '', # Loads weights into the model if so desired
# leave '' if starting from a new model
'RENDER_ENV' : False, # shows the screen of the game as it learns
# massively slows the training down when True
# default: False
'HEIGHT' : 84, # height in pixels
'WIDTH' : 84, # and width in pixels that the game window will get downscaled to
# defaults: 84, 84
'FRAME_SKIP_SIZE' : 4, # how many frames we skip, and how many frames in a row
# the chosen action is repeated for.
# default: 4
'MAX_EPISODES' : 12000, # max number of episodes; an episode runs from a full set
# of lives until the last life is lost or the round is won
# default: 12,000
'MAX_FRAMES' : 50000000, # max number of frames allowed to pass before stopping
# default: 50,000,000 (how many google used)
'SAVE_MODEL' : 500, # how many episodes should we go through until we save the model?
# default: whenever you want to save the model
'TARGET_UPDATE' : 10000, # update the target network whenever epochs mod this value is 0
# default: 10000
'WATCH_Q' : False, # watch the Q function and see what decision it picks
# cool to watch
# default: False
'LEARNING_RATE' : 0.00025, # learning rate of the Adam optimizer
# default: 0.00025
'INIT_EXPLORATION' : 1.0, # exploration rate, start at 100%
'EXPLORATION' : 1000000, # how many frames we decay till
'MIN_EXPLORATION' : 0.1, # ending exploration rate
# defaults: 1.0, 1,000,000, 0.1
'OPTIMIZER' : 'Adam', # optimizer used
# default: RMSprop or Adam
'MIN_SQUARED_GRADIENT' : 0.01, # epsilon rate
# default: 0.01
'GRADIENT_MOMENTUM' : 0.95, # momentum into the gradient used
# default: 0.95
'LOSS' : 'huber', # can be 'logcosh' for logarithm of hyperbolic cosine
# or 'mse' for mean squared error
# or 'huber' for huber loss
# default: logcosh, mse, or huber
'NO-OP_MAX' : 30, # how many times no-op can be called at the beginning
# of a single episode; avoids always starting from the same
# state and increases the variance of similar states
# default: 30 (don't set this too high or we may lose before acting!)
'SHOW_FIT' : 0, # shows the fit of the model and its work, set to 0 for off
# default: 0 for off
'REPLAY_START' : 50000, # when to start using replay to update the model
# default: 50000 frames
'MEMORY_SIZE' : 1000000, # size of the memory bank
# default: 1,000,000
'GAMMA' : 0.99, # integration of rewards, discount factor,
# preference for present rewards as opposed to future rewards
# default: 0.99
# 4 * 8 = 32 batch
'REPLAY_ITERATIONS' : 4, # how many iterations of replay
# default: 4
'BATCH_SIZE' : 8 # batch size used to learn
# default: 8
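The three exploration settings above (INIT_EXPLORATION, EXPLORATION, MIN_EXPLORATION) describe an epsilon-greedy schedule. A minimal sketch of how they could fit together is shown here, assuming the decay is linear in the number of frames, as in the DQN Nature paper; the repository may decay differently, and `epsilon_at` and `choose_action` are hypothetical names.

```python
import random

def epsilon_at(frame, init=1.0, final=0.1, decay_frames=1_000_000):
    """Exploration rate after `frame` frames, decayed linearly from init to final."""
    fraction = min(frame / decay_frames, 1.0)  # clamp once the decay period is over
    return init + fraction * (final - init)

def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else the best Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

for frame in (0, 500_000, 1_000_000, 2_000_000):
    print(frame, round(epsilon_at(frame), 3))

# With 3 actions (no-op, move left, move right) and epsilon at its minimum.
print(choose_action([0.1, 0.9, 0.3], epsilon_at(2_000_000)))
```

With the defaults listed, the agent acts fully at random at the start and still takes a random action 10% of the time after the first million frames.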
To start the breakout game with the DQN Agent, run python3 breakout.py
To change how the DQN Agent learns, modify hyperparameters.py
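As one example of how those settings shape learning, GAMMA and the target network enter the standard DQN training target, y = r + GAMMA * max over a' of Q_target(s', a'). The sketch below computes that target for a small batch with NumPy. It follows the textbook formulation from the DQN Nature paper; whether DQNAgent.py computes it in exactly this way is an assumption, and `dqn_targets` is a hypothetical name.

```python
import numpy as np

def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """Bellman targets for a batch of transitions.

    rewards:       shape (batch,)
    next_q_target: shape (batch, n_actions), target-network Q-values for next states
    dones:         shape (batch,), 1.0 where the episode ended, else 0.0
    """
    max_next_q = next_q_target.max(axis=1)
    # Terminal transitions get no bootstrapped future value.
    return rewards + gamma * max_next_q * (1.0 - dones)

rewards = np.array([1.0, 0.0])
next_q = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
dones = np.array([0.0, 1.0])

targets = dqn_targets(rewards, next_q, dones)
print(targets)
```

A GAMMA closer to 1 weights future rewards more heavily, and a larger TARGET_UPDATE keeps `next_q_target` fixed for longer, which tends to stabilise training at the cost of slower updates.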
To start the demo, run python3 DQN_Testing.py
Alternatively, there is a Jupyter notebook, DQN_Testing.ipynb, which renders every 6 frames.
- http://docs.python-guide.org/en/latest/starting/installation/
- https://www.makeuseof.com/tag/install-pip-for-python/
- https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf
- dennybritz/reinforcement-learning#30
- https://github.com/tokb23/dqn/blob/master/dqn.py
- https://github.com/jcwleo/Reinforcement_Learning/blob/master/Breakout/Breakout_DQN_class.py
- https://medium.com/mlreview/speeding-up-dqn-on-pytorch-solving-pong-in-30-minutes-81a1bd2dff55
- https://becominghuman.ai/beat-atari-with-deep-reinforcement-learning-part-2-dqn-improvements-d3563f665a2c
- https://github.com/keras-rl/keras-rl