PPO agent trained to play LunarLanderContinuous-v2. Reward per episode at this point was ~230.
flare is a small reinforcement learning library. Its current use case is small-scale RL experimentation and research. Much of the code is refactored from and built on top of SpinningUp, so massive thanks to them for writing quality, understandable, and performant code.

An (old) blog post about this repository is available here.
Flare supports parallelization via MPI! So, you’ll need to install OpenMPI to run this code. SpinningUp provides the following installation instructions:
On Ubuntu:

```bash
sudo apt-get update && sudo apt-get install libopenmpi-dev
```

On macOS:

```bash
brew install openmpi
```
If you’re on Windows, here is a link to some instructions.
If the Mac instructions don’t work for you, consider these instructions.
It is recommended to install flare inside a virtual environment to avoid conflicts with other installed packages. Anaconda and Python's built-in venv module both provide virtual environments.
Finally, clone the repository, cd into it, and install it:

```bash
git clone https://github.com/jfpettit/flare.git
cd flare
pip install -e .
```
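To confirm the install worked, a quick import check is enough (this is just a sanity check, not part of the repository's official instructions):

```python
# If this runs without raising ImportError, the editable install succeeded.
import flare
```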
Presently, A2C and PPO are implemented and working. Run from the command line with:
python -m flare.run
This will run PPO on LunarLander-v2 with default arguments. If you want to change the algorithm to A2C, run on a different env, or otherwise change some defaults with this command line interface, run python -m flare.run -h to see the available optional arguments.
Import required packages:
```python
import gym
from flare.polgrad import A2C

env = gym.make('CartPole-v0')  # or any other gym env
agent = A2C(env)
rew, leng = agent.learn(100)   # train for 100 epochs
```
The above snippet will train an agent on the CartPole environment for 100 epochs.
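To get a quick look at training progress, the returned values can be inspected directly. The snippet below is only a sketch: it assumes rew and leng are array-like sequences of per-episode rewards and lengths, which may not match the exact types learn returns.

```python
import numpy as np
import matplotlib.pyplot as plt

rew = np.asarray(rew)  # assumed: one reward entry per episode
print(f"Mean reward over the last 10 episodes: {rew[-10:].mean():.1f}")

plt.plot(rew)
plt.xlabel("episode")
plt.ylabel("reward")
plt.title("A2C on CartPole-v0")
plt.show()
```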
You may alter the architecture of your actor-critic network by passing a tuple of hidden layer sizes when initializing your agent, e.g.:
```python
from flare.polgrad import PPO

agent = PPO(env, hidden_sizes=(64, 32))
rew, leng = agent.learn(100)
```
For a more detailed example using PPO, see the example file at: examples/ppo_example.py.
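The demo at the top of this README shows a PPO agent on LunarLanderContinuous-v2. A run like that should look roughly like the sketch below, assuming PPO handles continuous (Box) action spaces through the same interface and that gym's Box2D extras (pip install gym[box2d]) are installed; the hidden sizes here are arbitrary.

```python
import gym
from flare.polgrad import PPO

env = gym.make('LunarLanderContinuous-v2')  # continuous-action Box2D environment
agent = PPO(env, hidden_sizes=(64, 64))     # layer sizes chosen only for illustration
rew, leng = agent.learn(100)
```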
This repository is intended to be a lightweight and simple-to-use RL framework that still achieves good performance.
Algorithms will be listed here as they are implemented:
- Advantage Actor Critic (A2C)
- Proximal Policy Optimization (PPO)
- Deep Deterministic Policy Gradients (DDPG)
- Twin Delayed Deep Deterministic Policy Gradients (TD3)
- Soft Actor Critic (SAC)
The policy gradient algorithms (A2C, PPO) support running on multiple CPUs via MPI. The Q policy gradient algorithms (SAC, DDPG, TD3) do not yet support MPI parallelization.
If you wish to build your own actor-critic from scratch, it is recommended to use FireActorCritic as a template.
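For a rough idea of the structure such a template follows, here is a generic PyTorch actor-critic sketch. It is not FireActorCritic's actual interface (the constructor arguments and method names flare expects may differ); it only illustrates the separate policy and value heads an actor-critic needs.

```python
import torch
import torch.nn as nn

class TinyActorCritic(nn.Module):
    """Illustrative only: a categorical policy head and a state-value head
    built on separate MLPs. Check FireActorCritic for flare's real interface."""

    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.actor = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions),  # action logits
        )
        self.critic = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),          # state-value estimate
        )

    def forward(self, obs):
        pi = torch.distributions.Categorical(logits=self.actor(obs))
        value = self.critic(obs).squeeze(-1)
        return pi, value
```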
Flare now automatically logs run metrics to TensorBoard. View these by running tensorboard --logdir flare_runs in a terminal.
We’d love for you to contribute! Any help is welcome; see CONTRIBUTING.md for contributor guidelines and info. Some ways to help:
- Comment code to make it clearer
- Test algorithm performance