Codebase I use for reinforcement learning (RL) research projects.
Run using:
python -m rlf.examples.train --alg ALGORITHM_NAME --env-name Pendulum-v1
Supported algorithms (to replace ALGORITHM_NAME in the above command):
- Imitation learning (you must also specify the --traj-load-path argument for these commands to load the demonstrations; see "How to specify demonstrations for imitation learning?" for more information). An example command is shown after this list.
  - Generative Adversarial Imitation Learning (GAIL): --alg gail_ppo
  - Generative Adversarial Imitation Learning from Observations (GAIfO): --alg gaifo_ppo
  - Behavioral Cloning (BC): --alg bc
  - Behavioral Cloning from Observations (BCO): --alg bco
  - Soft-Q Imitation Learning (SQIL): --alg sqil
- Reinforcement learning
  - Proximal Policy Optimization (PPO): --alg ppo
  - Soft Actor Critic (SAC): --alg sac
  - Deep Deterministic Policy Gradients (DDPG): --alg ddpg
  - Random Policy: --alg rnd
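For example, a behavioral cloning run on Pendulum might look like the following (the demonstration path is only a placeholder; point --traj-load-path at your own dataset):
python -m rlf.examples.train --alg bc --env-name Pendulum-v1 --traj-load-path ./data/pendulum_demos.pt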
To see the list of all possible command line arguments, add -v. For example: python examples/train.py --alg sac --env-name Pendulum-v1 --cuda False -v. Command line arguments are added by the algorithm or policy; see examples here and here. See the learning curves for these algorithms below.
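As a rough illustration of this pattern (not necessarily the repository's exact hook; the method name get_add_args is an assumption here, so check the linked examples for the real classes), an algorithm can register its own flags on the shared argparse parser:

```python
import argparse

class MyAlgo:
    # Hypothetical algorithm class. The hook name is assumed: a method the
    # trainer calls so each algorithm can add its own command line arguments.
    def get_add_args(self, parser: argparse.ArgumentParser) -> None:
        parser.add_argument("--my-lr", type=float, default=3e-4,
                            help="Learning rate used only by this algorithm.")
        parser.add_argument("--my-hidden-dim", type=int, default=256,
                            help="Hidden size for this algorithm's networks.")

# Minimal usage sketch: build a parser and let the algorithm extend it.
parser = argparse.ArgumentParser()
MyAlgo().get_add_args(parser)
print(parser.parse_args(["--my-lr", "1e-3"]))
```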
- Specify the name of your environment using --env-name.
- If environments are registered through gym.envs.registration, they will work automatically through gym.make (see the registration sketch after this list).
- See this page for information about specifying custom command line arguments for environments.
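For example, a minimal registration for a custom environment (the module path and environment id below are hypothetical) could look like:

```python
from gym.envs.registration import register

# Hypothetical example: my_project/envs.py defines MyCustomEnv, a standard gym.Env.
register(
    id="MyCustomEnv-v0",
    entry_point="my_project.envs:MyCustomEnv",
    max_episode_steps=200,
)
```

Once this registration has run (for example, from your package's __init__.py), passing --env-name MyCustomEnv-v0 should resolve through gym.make.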
See this comment for the demonstration dataset specification. Then load the demonstration dataset via the --traj-load-path argument.
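Purely as an illustration of the general shape (the key names and shapes here are assumptions; the linked comment is the authoritative specification), a demonstration dataset saved with PyTorch could look like:

```python
import torch

# Hypothetical layout: a dict of tensors over N transitions, saved to disk.
# The real key names and shapes are defined in the comment linked above.
n = 1000
demos = {
    "obs": torch.zeros(n, 3),       # observations
    "next_obs": torch.zeros(n, 3),  # next observations
    "actions": torch.zeros(n, 1),   # actions taken by the demonstrator
    "done": torch.zeros(n),         # episode termination flags
}
torch.save(demos, "./data/pendulum_demos.pt")
```

The saved file path would then be passed as --traj-load-path ./data/pendulum_demos.pt.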
Requires Python 3.7 or higher. With conda:
- Clone the repo
- conda create -n rlf python=3.7
- source activate rlf
- pip install -r requirements.txt
- pip install -e .
Commit: 570d8c8d024cb86266610e72c5431ef17253c067
- PPO:
python -m rlf --cmd ppo/hopper --cd 0 --cfg ./tests/config.yaml --seed "31,41,51" --sess-id 0 --cuda False
Commit: 58644db1ac638ba6c8a22e7a01eacfedffd4a49f
- PPO:
python -m rlf --cmd ppo/halfcheetah --cd 0 --cfg ./tests/config.yaml --seed "31,41,51" --sess-id 0 --cuda False
Commit: 58644db1ac638ba6c8a22e7a01eacfedffd4a49f
- BCO:
python -m rlf --cmd bco/halfcheetah --cfg ./tests/config.yaml --seed "31,41,51" --sess-id 0 --cuda False
- GAIfO-s:
python -m rlf --cmd gaifo_s/halfcheetah --cfg ./tests/config.yaml --seed "31,41,51" --sess-id 0 --cuda False
- GAIfO:
python -m rlf --cmd gaifo/halfcheetah --cfg ./tests/config.yaml --seed "31,41,51" --sess-id 0 --cuda False
Commit: 5c051769088b6582b0b31db9a145738a9ed68565
- DDPG:
python -m rlf --cmd ddpg/pendulum --cd 0 --cfg ./tests/config.yaml --seed "31,41" --sess-id 0 --cuda False
Commit: 95bb3a7d0bf1945e414a0e77de8a749bd79dc554
- BitFlip:
python -m rlf --cmd her/bit_flip --cfg ./tests/config.yaml --cuda False --sess-id 0
- The SAC code is a clone of https://github.com/denisyarats/pytorch_sac. The license is at rlf/algos/off_policy/denis_yarats_LICENSE.md.
- The PPO and rollout storage code is based on https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail.
- The environment preprocessing uses a stripped down version of https://github.com/openai/baselines.