Experiment1.py - Investigating how the choice of policy affects RL agent effectiveness
- Simple RL agent implementation with a state embedder and reward computation
- Tested 3 policies: Greedy Q Policy, Epsilon Greedy Q Policy, Boltzmann Q Policy
Experiment2.py - Policy hyperparameter tuning
Experiment3.py - Round robin to determine the optimal agent, based on the player each agent was trained against
Experiment4.py - Round Robin with Minimax Algorithm
Experiment5.py - (in progress) PPO experiment
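
For reference, the three action-selection policies compared in Experiment1.py can be sketched as below. This is a minimal illustration in plain Python, not the code used in the experiments: the function names, the `eps` and `tau` parameters, and their default values are assumptions made for this example.

```python
import math
import random


def greedy_policy(q_values):
    """Pick the action with the highest Q-value (ties go to the lowest index)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy_policy(q_values, eps=0.1, rng=random):
    """With probability eps pick a uniformly random action, otherwise act greedily.

    eps=0.1 is an illustrative default, not a tuned value.
    """
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return greedy_policy(q_values)


def boltzmann_probabilities(q_values, tau=1.0):
    """Softmax over Q-values at temperature tau (hypothetical parameter name)."""
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    exps = [math.exp((q - q_max) / tau) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def boltzmann_policy(q_values, tau=1.0, rng=random):
    """Sample an action in proportion to its Boltzmann (softmax) probability."""
    probs = boltzmann_probabilities(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

A lower `tau` makes the Boltzmann policy behave more like the greedy one, while a higher `tau` moves it toward uniform random selection. Setting `eps` to 0 reduces the epsilon-greedy policy to the greedy one.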
Final results and report from experiments can be found here
Pokémon Showdown environment forked from https://github.com/smogon/pokemon-showdown