Smart_Snake

This is a Reinforcement Learning project in which the agent (the snake) learns to play the snake game [1]. The game board is 12×12; the snake moves inside the inner 10×10 area and eats the food. Eating food increases the snake's length. The snake must learn to reach the food without running into the screen border or its own body.

The learning algorithm is DQN.

Average test score: 20
Best achieved score: 49

Preview    Algorithm    Network    State    Hyperparameters    Results    References    Useful Resources

Preview

[Gameplay GIFs: two runs scoring 49 and 48, and three runs scoring 46, 46, and 43.]

Algorithm

DQN pseudocode (from the DQN Nature paper: https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf)
Q = Q_θ = action-value function = policy network
Q̂ = Q_θ⁻ = target function = target network

Note:

The implementation differs from the above algorithm in a few ways (a condensed sketch of the resulting training loop follows this list):

  1. Training (computing the loss and updating the weights) is skipped for the first 2000 steps [2], because the replay memory does not yet contain enough samples [2].
  2. The target network is updated every C episodes (not every C steps) [3].
  3. I assumed that s_t = x_t, i.e. the state is the current frame only, with no frame history.
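
A condensed sketch of the training loop implied by these notes, under stated assumptions: the environment interface (env.reset() / env.step()), the select_action helper (sketched in the Hyperparameters section), and the argument defaults are illustrative, not the repository's actual code. The two noted deviations from the Nature-paper pseudocode are marked in the comments.

```python
# Condensed DQN training loop reflecting the notes above (a sketch, not the repo's code).
# Assumed: env.reset() returns the 8-feature state, env.step(a) returns (state, reward, done),
# and select_action() is an epsilon-greedy helper (see the Hyperparameters section).
import random
from collections import deque

import torch
import torch.nn as nn

def train(env, policy_net, target_net, optimizer,
          episodes=30000, batch_size=128, gamma=0.99,
          memory_size=50000, warmup_steps=2000, C=10):
    memory = deque(maxlen=memory_size)      # replay memory D with capacity N
    loss_fn = nn.MSELoss()
    target_net.load_state_dict(policy_net.state_dict())
    total_steps = 0

    for episode in range(episodes):
        state = env.reset()                 # Phi(s_t): 8-dimensional feature vector
        done = False
        while not done:
            action = select_action(policy_net, state, total_steps)
            next_state, reward, done = env.step(action)
            memory.append((state, action, reward, next_state, done))
            state = next_state
            total_steps += 1

            # Difference 1: no loss/update during the first 2000 steps,
            # because the replay memory does not yet hold enough samples.
            if total_steps < warmup_steps:
                continue

            batch = random.sample(memory, batch_size)
            s  = torch.tensor([b[0] for b in batch], dtype=torch.float32)
            a  = torch.tensor([b[1] for b in batch], dtype=torch.int64)
            r  = torch.tensor([b[2] for b in batch], dtype=torch.float32)
            s2 = torch.tensor([b[3] for b in batch], dtype=torch.float32)
            d  = torch.tensor([b[4] for b in batch], dtype=torch.float32)

            # y_j = r_j for terminal transitions, r_j + gamma * max_a' Q_hat(s2, a') otherwise
            q = policy_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
            with torch.no_grad():
                y = r + gamma * target_net(s2).max(dim=1).values * (1.0 - d)

            loss = loss_fn(q, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # Difference 2: sync the target network every C episodes (not every C steps).
        if episode % C == 0:
            target_net.load_state_dict(policy_net.state_dict())
```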

Network

Input data:

  (batch_size, 8)

Layers:

  FC(1024) → ReLU → FC(1024) → ReLU → FC(512) → ReLU → FC(4)
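
A minimal PyTorch sketch of this layer stack (the class name and defaults are mine, not necessarily the repository's code):

```python
# Fully connected Q-network: 8 input features -> Q-values for the 4 actions.
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, n_features=8, n_actions=4):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_features, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, n_actions),   # one Q-value per action (Left, Right, Up, Down)
        )

    def forward(self, x):
        # x: (batch_size, 8) feature batch -> (batch_size, 4) Q-values
        return self.layers(x)
```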

State

s_t :

  The frame of the game after t transitions. It is converted to a 12×12 NumPy array.

Example:

  [A game frame and its corresponding 12×12 NumPy array, shown as images in the original README.]

Φ(s_t) :

  Following Assignment 4 of the Artificial Intelligence course (CS 440/ECE 448) at the University of Illinois at Urbana-Champaign, 8 features are extracted from the frame (a hedged sketch of this feature extraction appears at the end of this section):

  [adjoining_wall_x, adjoining_wall_y, food_dir_x, food_dir_y, adjoining_body_top, adjoining_body_bottom, adjoining_body_left, adjoining_body_right] [4]

      [The defining equation for each feature is shown as an image in the original README: the adjoining_wall features encode whether a wall is adjacent to the head in the x/y direction, the food_dir features encode the food's direction relative to the head, and the adjoining_body features encode whether a body segment occupies the cell above, below, left of, or right of the head.]

Φ(s_t) for the previous example:

   [Shown as an image in the original README.]
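
A hedged sketch of this feature extraction. The exact value conventions come from the assignment's equations (shown as images in the original README); the encodings below (0/1/2 for wall position and food direction, 0/1 for body adjacency), the cell labels, and the function signature are assumptions for illustration only.

```python
# Hypothetical Phi(s_t): map the 12x12 board, head position, and food position to 8 features.
import numpy as np

def phi(board: np.ndarray, head: tuple[int, int], food: tuple[int, int]) -> np.ndarray:
    hx, hy = head                     # head coordinates; board is indexed as board[y, x]
    fx, fy = food
    wall, body = 1, 2                 # assumed cell labels inside the 12x12 array

    adjoining_wall_x = 1 if board[hy, hx - 1] == wall else 2 if board[hy, hx + 1] == wall else 0
    adjoining_wall_y = 1 if board[hy - 1, hx] == wall else 2 if board[hy + 1, hx] == wall else 0
    food_dir_x = 1 if fx < hx else 2 if fx > hx else 0    # food left / right / same column
    food_dir_y = 1 if fy < hy else 2 if fy > hy else 0    # food above / below / same row
    adjoining_body_top    = 1 if board[hy - 1, hx] == body else 0
    adjoining_body_bottom = 1 if board[hy + 1, hx] == body else 0
    adjoining_body_left   = 1 if board[hy, hx - 1] == body else 0
    adjoining_body_right  = 1 if board[hy, hx + 1] == body else 0

    return np.array([adjoining_wall_x, adjoining_wall_y, food_dir_x, food_dir_y,
                     adjoining_body_top, adjoining_body_bottom,
                     adjoining_body_left, adjoining_body_right], dtype=np.float32)
```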

Hyperparameters

Some of these values were adopted from this paper and this site.

  • C (target network update period, in episodes): 10
  • γ (discount factor): 0.99
  • Batch size: 128
  • Actions: (Left, Right, Up, Down) ~ (0, 1, 2, 3)
  • Rewards: (Reward_Food, Reward_Lose, Reward_Move) ~ (100, -100, -0.1)
  • N (replay memory size): 50000
  • M (number of episodes): 30000
  • Learning rate: 0.001
  • Optimizer: RMSprop
  • Loss: MSELoss
  • Epsilon greedy: ε decreases linearly from 1 (ε_max) to 0.0001 (ε_min) in steps of 0.00001 (∆ε). In other words, after about 100,000 steps ε stays at 0.0001 for the rest of training [2] (see the sketch below).
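
A small sketch of the linear ε schedule and ε-greedy action selection described above (the function names and the state-tensor handling are mine, not the repository's):

```python
# Linear epsilon decay and epsilon-greedy action selection (illustrative helpers).
import random
import torch

def epsilon(step, eps_max=1.0, eps_min=0.0001, delta=0.00001):
    # Decrease linearly by delta per step; after ~100,000 steps epsilon stays at eps_min.
    return max(eps_min, eps_max - delta * step)

def select_action(policy_net, state, step, n_actions=4):
    # With probability epsilon pick a random action, otherwise the greedy one.
    if random.random() < epsilon(step):
        return random.randrange(n_actions)          # (Left, Right, Up, Down) ~ (0, 1, 2, 3)
    with torch.no_grad():
        q = policy_net(torch.tensor(state, dtype=torch.float32).unsqueeze(0))
        return int(q.argmax(dim=1).item())
```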

Results

Plots:

   [Training and test score plots, shown as images in the original README.]

Notes:

   Training finished in ~209 minutes on a Tesla V100-SXM2-16GB (using Google Colab Pro).
   Test result:  Mean(scores): 20.257  |  Std(scores): 6.50

References

[1] Wikipedia - Snake (video game genre)

[2] https://www.diva-portal.org/smash/get/diva2:1342302/FULLTEXT01.pdf

[3] PyTorch - Reinforcement Learning (DQN) Tutorial

[4] CS 440/ECE 448 Spring 2019, Assignment 4: Reinforcement Learning and Deep Learning (University of Illinois Urbana-Champaign)

Useful Resources

Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.

AI learns to play SNAKE using Reinforcement Learning (Square Robots)

How to automate Snake using Reinforcement Learning (DeKay Arts)

https://github.com/YuriyGuts/snake-ai-reinforcement/

https://github.com/benjamin-dupuis/DQN-snake
