Runaway is a Reinforcement Learning environment developed in pygame. The object of the game is to avoid being captured by the evil minion chasing after you for as long as possible. The player wins the game if the minion is avoided for 1000 timesteps (roughly 20 seconds); otherwise the game is lost. Additionally, as part of this project I demonstrate how tabular Q-learning can be used to learn an optimal policy for playing the game.
A short video presentation discussing how I trained an agent to play the game using Q-learning can be found here:
The state of the environment is represented as a vector of 4 numbers:
- the x position of the player
- the y position of the player
- the x position of the minion
- the y position of the minion
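Because the Q-learning agent is tabular, each state needs to be reduced to a discrete, hashable key before it can index a Q-table. The snippet below is a minimal sketch of one way to do that; the cell size and the `state_key` helper are illustrative assumptions, not the environment's actual implementation.

```python
# Sketch: turning the 4-number state into a hashable key for a tabular Q-table.
# CELL_SIZE and state_key are illustrative assumptions, not the environment's API.
CELL_SIZE = 20  # assumed coarse discretization of the board

def state_key(player_x, player_y, minion_x, minion_y):
    """Map the raw (x, y) positions onto coarse grid cells so the state space stays small."""
    return (player_x // CELL_SIZE, player_y // CELL_SIZE,
            minion_x // CELL_SIZE, minion_y // CELL_SIZE)

# Example: index into a dict-backed Q-table
# q_table[state_key(120, 40, 300, 260)] -> list of 5 action values
```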
The player avoids the minion chasing after it by moving around the 2D game board. The player can take the following actions:
- Move up (up keyboard arrow)
- Move down (down keyboard arrow)
- Move left (left keyboard arrow)
- Move right (right keyboard arrow)
- Stay put (no action)
If the player moves into a wall, the player remains in the same position.
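As a rough illustration of these movement rules, the sketch below maps each action to a position delta and keeps the player in place when a move would leave the board. The step size, board dimensions, and function names are assumptions made for the example, not the environment's real constants.

```python
# Sketch of the player movement rules described above.
# STEP, BOARD_W, BOARD_H, and the action names are illustrative assumptions.
STEP = 5
BOARD_W, BOARD_H = 800, 600

ACTION_DELTAS = {
    "up":    (0, -STEP),   # pygame's y-axis points down, so "up" decreases y
    "down":  (0,  STEP),
    "left":  (-STEP, 0),
    "right": ( STEP, 0),
    "stay":  (0, 0),
}

def move_player(x, y, action):
    """Apply an action; if it would leave the board, keep the player where it is."""
    dx, dy = ACTION_DELTAS[action]
    new_x, new_y = x + dx, y + dy
    if 0 <= new_x <= BOARD_W and 0 <= new_y <= BOARD_H:
        return new_x, new_y
    return x, y  # moving into a wall leaves the position unchanged
```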
Rewards are given as follows:
- +1 reward is given for each timestep the player survives
- +1000 reward is given if the player wins (survives for 1000 timesteps without being captured)
- -100 reward is given if the player is captured, i.e. the player loses
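A minimal sketch of this reward scheme (the function name and signature are assumptions; the real environment computes rewards internally):

```python
# Sketch of the reward scheme described above; names are illustrative only.
def step_reward(captured, won):
    """+1 per timestep survived, +1000 on a win, -100 on capture."""
    if captured:
        return -100
    if won:
        return 1000
    return 1
```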
The minion chasing the player follows a policy that always takes the move that minimizes the distance between itself and the player.
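Roughly, the minion's chase behaviour could be sketched like this; the helper names, step size, and use of Euclidean distance are assumptions, since the actual policy lives in the environment code:

```python
import math

# Sketch of a greedy chase policy: pick whichever move brings the minion
# closest to the player. Names and step size are illustrative assumptions.
MINION_DELTAS = {"up": (0, -5), "down": (0, 5), "left": (-5, 0), "right": (5, 0), "stay": (0, 0)}

def minion_action(minion_x, minion_y, player_x, player_y):
    """Return the action that minimizes the distance between minion and player."""
    def dist_after(action):
        dx, dy = MINION_DELTAS[action]
        return math.hypot(minion_x + dx - player_x, minion_y + dy - player_y)
    return min(MINION_DELTAS, key=dist_after)
```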
Built into the environment is a trainable Q agent that can learn an optimal policy to play the game. Pretrained models can be found in models/
I trained the agent using a decaying epsilon-greedy policy: epsilon started at 1 and was decayed by a factor of 0.995 after every episode until it reached a minimum of 0.001, where it stayed constant for the rest of training. I trained the agent for a total of 5000 episodes. Below is a plot showing the 50-episode moving average of the rewards obtained during training, an animation of the agent following the learned policy, and an animation of the agent following a completely random policy for comparison:
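For reference, the decaying epsilon-greedy schedule and the tabular Q-learning update can be sketched as follows; the learning rate, discount factor, and interfaces shown are illustrative assumptions and do not necessarily match the internals of train.py:

```python
import random
from collections import defaultdict

# Sketch of the training pieces described above; ALPHA, GAMMA, and the helper
# names are illustrative assumptions, not the actual code in train.py.
ALPHA, GAMMA = 0.1, 0.99
EPSILON, EPSILON_DECAY, MIN_EPSILON = 1.0, 0.995, 0.001
N_ACTIONS = 5  # up, down, left, right, stay

q_table = defaultdict(lambda: [0.0] * N_ACTIONS)

def choose_action(state, epsilon):
    """Epsilon-greedy action selection over the tabular Q-values."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    values = q_table[state]
    return values.index(max(values))

def q_update(state, action, reward, next_state):
    """One-step tabular Q-learning update."""
    best_next = max(q_table[next_state])
    q_table[state][action] += ALPHA * (reward + GAMMA * best_next - q_table[state][action])

def decay_epsilon(epsilon):
    """Multiply epsilon by 0.995 each episode until it hits the 0.001 floor."""
    return max(MIN_EPSILON, epsilon * EPSILON_DECAY)
```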
The agent is considered converged when the rewards obtained consistently reach above 1250. From the plot, the agent converges after about 2000 episodes.
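If you want to recompute the 50-episode moving average yourself, something like the following works; the reward array here is only a placeholder, and in practice it would be loaded from the rewards CSV written during training:

```python
import numpy as np

# Sketch: 50-episode moving average of per-episode rewards, as in the plot above.
# `episode_rewards` is a stand-in; load the real values from the file that
# train.py writes to rewards/ (its exact format is assumed here, not confirmed).
episode_rewards = np.zeros(5000)  # placeholder for the per-episode reward totals
window = 50
moving_avg = np.convolve(episode_rewards, np.ones(window) / window, mode="valid")
```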
To get started, run the following to clone the repo, create a virtual env, install the dependencies, and start up a human-controlled game:
git clone https://github.com/loftiskg/runaway.git
cd runaway
python -m venv env
source env/bin/activate
pip install -r requirements.txt
python play.py
To train a Q-learning agent to play the game, run the following:
# this will save the Q values in the models directory with the name Q_model_e5000_epsilon1.0_decay.pkl
python train.py --episodes 5000 --epsilon 1 --epsilon_decay .995 --min_epsilon .001 --save_model --save_plot --save_rewards --suffix "decay" --print_every 100
This will recreate the results shown above.
To render the trained agent run the following command:
python play.py --agent q --agent_model_path models/Q_model_e5000_epsilon1.0_decay.pkl
- `src/` contains all the code that defines the environment and agents
- `models/` contains pretrained models; models trained with the command-line tool are also saved here
- `plots/` contains training reward plots saved after training with the command-line tool
- `rewards/` contains CSVs of the rewards at each episode of training when using the command-line tool
- `images/` contains images used in the markdown files
- `notebook.ipynb` contains a demo of training an agent in a notebook
- `play.py` is a command-line tool that starts up a human- or trained-agent-controlled game. Run `python play.py --help` at the command line to learn more about how the CLI works
- `train.py` is a command-line tool that trains an agent using Q-learning. Run `python train.py --help` at the command line to learn more about how the CLI works
Feel free to reach out to me on LinkedIn