Visualizing_RL

Visualizing how different agents perceive their environment in the game Snake: Algorithms in Reinforcement Learning

Authors

Snake

The Snake game environment is a visualization tool for evaluating different RL algorithms on the game Snake. Each algorithm must learn to guide the snake to the food without hitting a wall or eating itself (self-loop). It does this by using image processing to build an input vector of values that determine the snake's best next step, and it receives a reward each time the snake eats the food. The decision vector has three values, one for each possible step (forward, left, right); how these values are computed differs between the algorithms, using state or action values.
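
As a rough sketch of that loop (function and variable names here are illustrative, not the repository's API), picking the snake's next step from the three estimated values looks like this:

    ACTIONS = ["forward", "left", "right"]  # one value per relative move

    def choose_action(action_values):
        """Pick the relative move with the highest estimated value."""
        best = max(range(len(ACTIONS)), key=lambda i: action_values[i])
        return ACTIONS[best]

    # Example: values produced by one of the algorithms for the current state
    print(choose_action([0.1, 0.7, 0.2]))  # -> "left"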

Project setup

  1. Clone Repository

    git clone https://github.com/YaadR/Visualizing_RL.git
    cd Visualizing_RL
  2. Create Virtual Environment

    python3 -m venv venv
  3. Activate Environment

    # Linux/MacOS
    source venv/bin/activate
    # Windows
    venv\Scripts\activate
  4. Install requirements

    pip install -r requirements.txt
  5. Run Project

    # Linux/MacOS
    python3 Ver1/main.py
    # Windows
    python Ver1\main.py
    

Visualization solutions:

  1. Heatmap
  2. Certainty Arrows
  3. Neural network weights visualization (where a neural network is used)
  4. Certainty Bar (entropy-based; see the sketch after this list)
  5. State Activation Layer
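
The entropy-based Certainty Bar (item 4) can be derived directly from the action distribution; a minimal sketch, assuming a normalized 3-way distribution (the function name is hypothetical):

    import math

    def certainty(probs):
        """1 - H(p)/H_max: 1 for a deterministic choice, 0 for uniform."""
        h = -sum(p * math.log(p) for p in probs if p > 0)
        return 1.0 - h / math.log(len(probs))

    print(certainty([1/3, 1/3, 1/3]))     # 0.0  -> no preference
    print(certainty([0.98, 0.01, 0.01]))  # ~0.9 -> strongly prefers one move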

Reinforcement algorithms & concepts

Agent State Value

  • Value-based: state value
  • Model-based
  • Off-policy
  • Online

RL Algorithm: $$V(s_t)' = V(s_t) + \alpha \left[ R_{t+1} + (1 - s_{t \to \text{terminal}})\, \gamma\, V(s_{t+1}) - V(s_t) \right]$$
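
In code, this is the tabular TD(0) update; a minimal sketch (names are illustrative, not the repository's API):

    def td0_update(V, s, s_next, reward, done, alpha=0.1, gamma=0.9):
        """V(s) <- V(s) + alpha * [R + gamma * V(s') - V(s)].

        (1 - done) zeroes the bootstrap term at terminal states, matching
        the (1 - s_terminal) indicator in the formula above.
        """
        target = reward + (1 - done) * gamma * V[s_next]
        V[s] += alpha * (target - V[s])

    V = {"s0": 0.0, "s1": 0.5}
    td0_update(V, "s0", "s1", reward=1.0, done=0)
    print(V["s0"])  # 0.145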

Agent Action Value

  • Value-based: action value
  • Model-free
  • Off-policy
  • Online

RL Algorithm: $$Q(s_t, a_t)' = R_{t+1} + (1 - s_{t \to \text{terminal}})\, \gamma \max_{a} Q(s_{t+1}, a)$$
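
The same target in tabular form (in practice a network would be trained toward it, but the idea is identical; names are illustrative):

    def q_target(Q, s_next, reward, done, gamma=0.9):
        """Q-learning target: R + gamma * max_a Q(s', a), zeroed at terminals."""
        return reward + (1 - done) * gamma * max(Q[s_next])

    # Q[s] holds the three action values (forward, left, right)
    Q = {"s1": [0.2, 0.8, 0.1]}
    print(q_target(Q, "s1", reward=0.0, done=0))  # 0.9 * 0.8 = 0.72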

Agent Policy

  • Policy-based
  • Model-free
  • On-policy
  • Online

RL Algorithm: Critic: $$A_{\text{critic}}(s_t) = R_t + (1 - s_{t \to \text{terminal}})\, \gamma\, V(s_{t+1}) - V(s_t)$$

Actor: $$\theta_{\text{actor}}' = \theta_{\text{actor}} + \alpha\, \nabla_{\theta_{\text{actor}}} \log \pi_{\theta_{\text{actor}}}(a_t \mid s_t)\, A_{\text{critic}}(s_t)$$
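
A minimal PyTorch sketch of one actor update (the layer sizes and the advantage value are illustrative assumptions; the actor loss is -log pi(a|s) * A, whose gradient is the expression above):

    import torch

    policy = torch.nn.Linear(8, 3)  # logits over (forward, left, right); input size illustrative
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

    state = torch.randn(8)          # example state vector
    dist = torch.distributions.Categorical(logits=policy(state))
    action = dist.sample()

    advantage = torch.tensor(0.5)   # A_critic(s_t), computed as above
    actor_loss = -dist.log_prob(action) * advantage

    optimizer.zero_grad()
    actor_loss.backward()
    optimizer.step()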

Training - Stability, Mean & STD - 20 Rounds

Algorithms compete in an Arena


Additional Notes

Please follow the project setup steps carefully and ensure that all dependencies are correctly installed before running the project.

Acknowledgments

This project was inspired by Patrick Loeber's work in Teach AI To Play Snake - Reinforcement Learning Tutorial With PyTorch And Pygame.

License

This project is licensed under the MIT License. Feel free to use, modify, and distribute it according to the terms of the license.
