RL

About

This repo is for practicing implementations of reinforcement learning algorithms.

Set up the environment

conda create -n rl python=3.9
conda activate rl

Install the pinned build dependencies first

pip install setuptools==65.5.0 "wheel<0.40.0"

Then install the project requirements

pip install -r requirements.txt
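
The exact contents of requirements.txt are not shown here; assuming it pulls in gym and numpy (which the tabular environments below rely on), a quick import check is one way to confirm the setup. The gym id "CliffWalking-v0" is the standard registry name and is used here only for illustration, not necessarily what main.py uses internally:

```python
# Quick sanity check for the fresh environment. Assumes requirements.txt
# installs gym and numpy; adjust if the pinned packages differ.
import gym
import numpy as np

env = gym.make("CliffWalking-v0")  # standard gym id for the cliff-walking task
print("observation space:", env.observation_space)
print("action space:", env.action_space)
print("numpy version:", np.__version__)
env.close()
```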

<Q-learning & Sarsa>


Command:

python main.py [--parameters]

Example 1 (Training):

python main.py --env "CliffWalking" --agent "Sarsa" --episode 500 --render

Example 2 (Testing):

python main.py --env "CliffWalking" --agent "Sarsa" --test "./qtable_CliffWalking_Sarsa.npy"

Parameters:

  • env : "FrozenLake", "CliffWalking", "GridWorld"
  • agent : "Q-Learning", "Sarsa", "SarsaLambda"
  • episode : Number of episodes to train
  • lr : Learning rate
  • gamma : Discount factor for future rewards
  • lambda : Decay rate for eligibility traces (currently only used by the SarsaLambda agent)
  • epsilon : Probability of taking a random action, so the agent keeps exploring instead of always picking the same action (see the sketch after this list)
  • slippery : Enable slippery (stochastic) transitions; only for the FrozenLake and GridWorld environments, default = False
  • render : Show a rendering window if True, default = False
  • test : Run in test mode on a saved Q-table (pass the file path), default = None
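
For reference, a minimal sketch of the tabular updates these agents perform, written against a generic gym-style setting. The names (q_table, lr, gamma, epsilon, rng) mirror the CLI parameters above but are illustrative only; they are not taken from this repository's source:

```python
# Sketch of epsilon-greedy action selection and the Q-learning vs. Sarsa updates.
import numpy as np

def epsilon_greedy(q_table, state, epsilon, n_actions, rng):
    # With probability epsilon take a random action, otherwise the greedy one.
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(q_table[state]))

def q_learning_update(q_table, s, a, r, s_next, lr, gamma):
    # Off-policy: bootstrap from the best next action, regardless of the one taken.
    target = r + gamma * np.max(q_table[s_next])
    q_table[s, a] += lr * (target - q_table[s, a])

def sarsa_update(q_table, s, a, r, s_next, a_next, lr, gamma):
    # On-policy: bootstrap from the action the policy actually takes next.
    target = r + gamma * q_table[s_next, a_next]
    q_table[s, a] += lr * (target - q_table[s, a])
```

In Sarsa(λ), an eligibility-trace table is additionally kept and decayed by gamma * lambda after each step; that decay is what the lambda parameter above tunes.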

<Deep Q Network (DQN)>


Command:

python main.py [--parameters]

Example 1 (Training with default settings):

python main.py

Example 2 (Training with customized settings):

python main.py --episodes 500 --batch_size 64 --replace_iter 5 --use_pretrained --render

Example 3 (Testing):

python main.py --test "./dqn.pth" --render
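
The .pth extension suggests a PyTorch checkpoint (again an assumption about the save format); if it is a state dict written with torch.save, its layer names and shapes can be listed without importing the model class:

```python
# Peek at the saved checkpoint. Assumes ./dqn.pth holds a state_dict saved with
# torch.save; if the whole model object was saved instead, the structure differs.
import torch

checkpoint = torch.load("./dqn.pth", map_location="cpu")
if isinstance(checkpoint, dict):
    for name, value in checkpoint.items():
        print(name, getattr(value, "shape", type(value)))
else:
    print(type(checkpoint))
```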

Parameters:

  • env : 'CartPole-v0', 'CartPole-v1'
  • replay : Capacity of the experience replay buffer
  • episodes : Number of episodes to train
  • batch_size : Batch size sampled from the replay buffer at each learning step
  • lr : Learning rate
  • epsilon : Probability of taking a random action, so the agent keeps exploring the environment
  • epsilon_decay : Decay rate applied to epsilon (once every 20 episodes)
  • epsilon_min : Minimum value epsilon can decay to
  • gamma : Discount factor for estimating future value
  • replace_iter : Update the target network once every n episodes
  • use_pretrained : Load pretrained weights, default = False
  • render : Show a rendering window if True, default = False
  • test : Run in test mode on a saved policy file (pass the file path), default = None
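
For orientation, here is a minimal sketch of how these parameters typically interact in a DQN training loop: replay sampling, periodic epsilon decay, and target-network synchronization. This is a generic illustration, not this repository's implementation; policy_net, target_net, optimizer, and the transition layout are all assumed:

```python
# Generic DQN bookkeeping sketch: replay sampling, epsilon decay, and periodic
# target-network sync. Illustrative only; names do not come from this repo.
import random
from collections import deque

import torch

replay = deque(maxlen=10000)                            # --replay : buffer capacity
batch_size = 64                                         # --batch_size
epsilon, epsilon_decay, epsilon_min = 1.0, 0.95, 0.01   # --epsilon, --epsilon_decay, --epsilon_min
replace_iter = 5                                        # --replace_iter

def learn_step(policy_net, target_net, optimizer, gamma):
    # One gradient step on a random minibatch from the replay buffer.
    # Assumes each stored transition is a tuple of tensors:
    # (state, action [int64], reward, next_state, done).
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    states, actions, rewards, next_states, dones = map(torch.stack, zip(*batch))
    q = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrap from the frozen target network for stability.
        q_next = target_net(next_states).max(dim=1).values
        target = rewards + gamma * q_next * (1 - dones.float())
    loss = torch.nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def decay_epsilon(episode, epsilon):
    # Epsilon is decayed once every 20 episodes, never below epsilon_min.
    if episode % 20 == 0:
        return max(epsilon * epsilon_decay, epsilon_min)
    return epsilon

def maybe_sync_target(episode, policy_net, target_net):
    # Copy online-network weights into the target network every replace_iter episodes.
    if episode % replace_iter == 0:
        target_net.load_state_dict(policy_net.state_dict())
```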
