Deep RL Zoo

A collection of Deep Reinforcement Learning algorithms implemented in PyTorch, heavily based on OpenAI's Spinning Up.

The collection is divided into two sets: single-agent and multi-agent algorithms.

Setup

The project runs on Python 3.11. To install the dependencies, run:

pip install -r requirements.txt
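
If you prefer an isolated setup, you can first create and activate a virtual environment (assuming Python 3.11 is available as python3.11 on your system), then run the install command above:

python3.11 -m venv .venv
source .venv/bin/activate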

Running experiments

Each experiment can be run directly with its default settings:

python zoo/single/sac.py

or it can be run through run.py:

python -m run sac

The latter allows running n experiments with different seeds (0, ..., n-1) at once. For example, to run 5 experiments with the SAC agent (using default settings):

python -m run sac -n 5

To customize experiment settings, check out each algorithm file for more details. For example, here are the arguments used in SAC:

usage: sac.py [-h] [--env ENV] [--exp-name EXP_NAME] [--seed SEED]
              [--hidden-sizes HIDDEN_SIZES [HIDDEN_SIZES ...]] [--lr LR]
              [--epochs EPOCHS] [--steps-per-epoch STEPS_PER_EPOCH]
              [--max-ep-len MAX_EP_LEN] [--buffer-size BUFFER_SIZE]
              [--batch-size BATCH_SIZE] [--start-step START_STEP]
              [--update-every UPDATE_EVERY] [--update-after UPDATE_AFTER]
              [--gamma GAMMA] [--tau TAU] [--ent-coeff ENT_COEFF]
              [--adjust-ent-coeff] [--ent-coeff-init ENT_COEFF_INIT]
              [--ent-target ENT_TARGET] [--test-episodes TEST_EPISODES] [--save]
              [--save-every SAVE_EVERY] [--render] [--plot]

Soft Actor-Critic

optional arguments:
  -h, --help            show this help message and exit
  --env ENV             Environment ID
  --exp-name EXP_NAME   Experiment name
  --seed SEED           Seed for RNG
  --hidden-sizes HIDDEN_SIZES [HIDDEN_SIZES ...]
                        Sizes of policy & Q networks' hidden layers
  --lr LR               Learning rate for policy, Q networks & entropy coefficient
                        optimizers
  --epochs EPOCHS       Number of epochs
  --steps-per-epoch STEPS_PER_EPOCH
                        Maximum number of steps for each epoch
  --max-ep-len MAX_EP_LEN
                        Maximum episode/trajectory length
  --buffer-size BUFFER_SIZE
                        Replay buffer size
  --batch-size BATCH_SIZE
                        Minibatch size
  --start-step START_STEP
                        Start step to begin action selection according to policy
                        network
  --update-every UPDATE_EVERY
                        Parameters update frequency
  --update-after UPDATE_AFTER
                        Number of steps after which update is allowed
  --gamma GAMMA         Discount factor
  --tau TAU             Soft (Polyak averaging) update coefficient
  --ent-coeff ENT_COEFF
                        Entropy regularization coefficient
  --adjust-ent-coeff    Whether to enable automating entropy adjustment scheme
  --ent-coeff-init ENT_COEFF_INIT
                        Initial value for automating entropy adjustment scheme
  --ent-target ENT_TARGET
                        Desired entropy, used for automating entropy adjustment
  --test-episodes TEST_EPISODES
                        Number of episodes to test the deterministic policy at the
                        end of each epoch
  --save                Whether to save the final model
  --save-every SAVE_EVERY
                        Model saving frequency
  --render              Whether to render the training result
  --plot                Whether to plot the training statistics
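
For instance, a customized SAC run might look like the following (the environment ID HalfCheetah-v4 and the hyperparameter values here are only illustrative; any environment ID accepted by --env and any of the arguments listed above can be combined):

python zoo/single/sac.py --env HalfCheetah-v4 --epochs 100 --hidden-sizes 256 256 --seed 1 --save --plot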

Plotting results

usage: plot.py [-h] [--log-dirs LOG_DIRS [LOG_DIRS ...]] [-x [{epoch,total-env-interacts} ...]] [-y [Y_AXIS ...]]
               [-s SAVEDIR]

Results plotting

optional arguments:
  -h, --help            show this help message and exit
  --log-dirs LOG_DIRS [LOG_DIRS ...]
                        Directories for saving log files
  -x [{epoch,total-env-interacts} ...], --x-axis [{epoch,total-env-interacts} ...]
                        Horizontal axes to plot
  -y [Y_AXIS ...], --y-axis [Y_AXIS ...]
                        Vertical axes to plot
  -s SAVEDIR, --savedir SAVEDIR
                        Directory to save plotting results
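
For example, to plot the results of two runs against the number of environment interactions and save the figures (the script location and directory names here are placeholders; use -y to pick which logged statistics to plot):

python path/to/plot.py --log-dirs path/to/log/dir1 path/to/log/dir2 -x total-env-interacts -s path/to/save/dir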

Testing policy

The resulting policy can be tested via the following command

python -m run test_policy --log-dir path/to/the/log/dir

where path/to/the/log/dir is the path to the log directory, which stores the model file, config file, etc. For more details, check out the following:

usage: test_policy.py [-h] --log-dir LOG_DIR [--eps EPS] [--max-ep-len MAX_EP_LEN]
                      [--render]

Policy testing

optional arguments:
  -h, --help            show this help message and exit
  --log-dir LOG_DIR     Path to the log directory, which stores model file, config file,
                        etc
  --eps EPS             Number of episodes
  --max-ep-len MAX_EP_LEN
                        Maximum length of an episode
  --render              Whether to render the experiment
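
For example, to roll out the saved policy for 10 test episodes with rendering (the log directory path is the placeholder from above):

python -m run test_policy --log-dir path/to/the/log/dir --eps 10 --render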

Some results

[Figure: performance vs. total environment interactions, SAC on HalfCheetah]

References

[1] Josh Achiam. Spinning Up in Deep Reinforcement Learning. 2018.
[2] Richard S. Sutton & Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.
[3] Volodymyr Mnih et al. Playing Atari with Deep Reinforcement Learning. arXiv preprint, arXiv:1312.5602, 2013.
[4] Volodymyr Mnih et al. Human-Level Control Through Deep Reinforcement Learning. Nature, 2015.
[5] Hado van Hasselt, Arthur Guez, David Silver. Deep Reinforcement Learning with Double Q-learning. AAAI 2016.
[6] Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas. Dueling Network Architectures for Deep Reinforcement Learning. arXiv preprint, arXiv:1511.06581, 2015.
[7] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov. Proximal Policy Optimization Algorithms. arXiv preprint, arXiv:1707.06347, 2017.
[8] John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel. Trust Region Policy Optimization. ICML 2015.
[9] John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, Pieter Abbeel. High-Dimensional Continuous Control Using Generalized Advantage Estimation. ICLR 2016.
[10] David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, Martin Riedmiller. Deterministic Policy Gradient Algorithms. ICML 2014.
[11] Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra. Continuous Control with Deep Reinforcement Learning. ICLR 2016.
[12] Tuomas Haarnoja, Haoran Tang, Pieter Abbeel, Sergey Levine. Reinforcement Learning with Deep Energy-Based Policies. ICML 2017.
[13] Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, Igor Mordatch. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. NIPS 2017.
[14] Eric Jang, Shixiang Gu, Ben Poole. Categorical Reparameterization with Gumbel-Softmax. ICLR 2017.
[15] Chris J. Maddison, Andriy Mnih, Yee Whye Teh. The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables. ICLR 2017.
