Status: Under development (expect bug fixes and major updates)
ShinRL is an open-source JAX library specialized for the evaluation of reinforcement learning (RL) algorithms from both theoretical and practical perspectives. Please take a look at the paper for details. Try ShinRL at experiments/QuickStart.ipynb.
```python
import gym
import matplotlib.pyplot as plt
from shinrl import DiscreteViSolver

# make an env & a config
env = gym.make("ShinPendulum-v0")
config = DiscreteViSolver.DefaultConfig(explore="eps_greedy", approx="nn", steps_per_epoch=10000)

# make & run a solver
mixins = DiscreteViSolver.make_mixins(env, config)
dqn_solver = DiscreteViSolver.factory(env, config, mixins)
dqn_solver.run()

# plot performance
returns = dqn_solver.scalars["Return"]
plt.plot(returns["x"], returns["y"])

# plot learned q-values (action == 0)
q0 = dqn_solver.data["Q"][:, 0]
env.plot_S(q0, title="Learned")
```
## ShinEnv

`ShinEnv` provides small environments with oracle methods that can compute exact quantities.

- Some environments support a continuous action space and image observations.
- See the tutorial for details: experiments/Tutorials/ShinEnvTutorial.ipynb.
| Environment | Discrete action | Continuous action | Image Observation | Tuple Observation |
| --- | --- | --- | --- | --- |
| ShinMaze | ✔️ | ❌ | ❌ | ✔️ |
| ShinMountainCar-v0 | ✔️ | ✔️ | ✔️ | ✔️ |
| ShinPendulum-v0 | ✔️ | ✔️ | ✔️ | ✔️ |
| ShinCartPole-v0 | ✔️ | ✔️ | ❌ | ✔️ |
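To illustrate what an oracle method buys you, here is a self-contained sketch in plain NumPy (not ShinRL's API): when the transition matrix and reward table of a small MDP are fully known, the optimal Q-function can be computed exactly by dynamic programming, with no sampling or approximation error.

```python
import numpy as np

# A tiny 2-state, 2-action MDP with known dynamics P[s, a, s'] and rewards R[s, a].
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])
gamma = 0.95

# Because P and R are fully specified, the optimal Q-function is the exact
# fixed point of the Bellman optimality backup, found by iteration.
Q = np.zeros((2, 2))
for _ in range(1000):
    V = Q.max(axis=1)      # greedy state values
    Q = R + gamma * P @ V  # exact expectation over next states, no sampling
```

Exact quantities like this `Q` are what ShinEnv's oracles expose, which lets you measure an algorithm's error against the ground truth instead of a Monte Carlo estimate.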
## Solver

A `Solver` solves an environment with a specified algorithm.

- A "mixin" is a class that defines and implements a single feature. ShinRL's solvers are instantiated by mixing several mixins.
- See the tutorial for details: experiments/Tutorials/SolverTutorial.ipynb.
- The table below lists the implemented popular algorithms.
- Note that it does not list all implemented algorithms (e.g., the DDP¹ version of the DQN algorithm). See the `make_mixins` functions of the solvers for the implemented variants.
- Note that the implemented algorithms may differ from their original implementations for simplicity (e.g., Discrete SAC). See the source code of the solvers for details.
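The mixin composition described above can be sketched in a few lines of plain Python. The class and function names below are illustrative only, not ShinRL's actual mixins: each mixin implements exactly one feature, and a solver class is assembled by putting the chosen mixins ahead of a base class in the method resolution order.

```python
class BaseSolver:
    """Fallback behavior; mixins override the pieces they implement."""
    def evaluate(self):
        raise NotImplementedError

class TabularEvalMixin:
    # Feature: tabular evaluation (cf. approx == "tabular").
    def evaluate(self):
        return "tabular evaluation"

class EpsGreedyExploreMixin:
    # Feature: eps-greedy exploration (cf. explore == "eps_greedy").
    def explore(self):
        return "eps-greedy exploration"

def make_solver(*mixins):
    # Build a new class whose MRO puts the mixins before BaseSolver,
    # loosely mirroring how make_mixins + factory assemble a solver.
    return type("Solver", (*mixins, BaseSolver), {})()

solver = make_solver(TabularEvalMixin, EpsGreedyExploreMixin)
```

Because each feature lives in its own class, swapping one mixin (e.g., a different exploration strategy) yields a different algorithm without touching the rest of the solver.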
| Algorithm | Solver | Configuration | Type¹ |
| --- | --- | --- | --- |
| Value Iteration (VI) | DiscreteViSolver | `approx == "tabular" & explore == "oracle"` | TDP |
| Policy Iteration (PI) | DiscretePiSolver | `approx == "tabular" & explore == "oracle"` | TDP |
| Conservative Value Iteration (CVI) | DiscreteViSolver | `approx == "tabular" & explore == "oracle" & er_coef != 0 & kl_coef != 0` | TDP |
| Tabular Q Learning | DiscreteViSolver | `approx == "tabular" & explore != "oracle"` | TRL |
| SARSA | DiscretePiSolver | `approx == "tabular" & explore != "oracle" & eps_decay_target_pol > 0` | TRL |
| Deep Q Network (DQN) | DiscreteViSolver | `approx == "nn" & explore != "oracle"` | DRL |
| Soft DQN | DiscreteViSolver | `approx == "nn" & explore != "oracle" & er_coef != 0` | DRL |
| Munchausen-DQN | DiscreteViSolver | `approx == "nn" & explore != "oracle" & er_coef != 0 & kl_coef != 0` | DRL |
| Double-DQN | DiscreteViSolver | `approx == "nn" & explore != "oracle" & use_double_q == True` | DRL |
| Discrete Soft Actor Critic | DiscretePiSolver | `approx == "nn" & explore != "oracle" & er_coef != 0` | DRL |
| Deep Deterministic Policy Gradient (DDPG) | ContinuousDdpgSolver | `approx == "nn" & explore != "oracle"` | DRL |
¹ Algorithm Type:

- TDP (`approx == "tabular" & explore == "oracle"`): Tabular Dynamic Programming algorithms. No exploration, no approximation, and the complete specification of the MDP is given.
- TRL (`approx == "tabular" & explore != "oracle"`): Tabular Reinforcement Learning algorithms. No approximation, and the dynamics and reward functions are unknown.
- DDP (`approx == "nn" & explore == "oracle"`): Deep Dynamic Programming algorithms. The same as TDP, except that neural networks approximate the computed values.
- DRL (`approx == "nn" & explore != "oracle"`): Deep Reinforcement Learning algorithms. The same as TRL, except that neural networks approximate the computed values.
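The difference between the tabular types can be made concrete with a toy example (self-contained NumPy, not ShinRL code): TDP-style exact value iteration uses the full model, while TRL-style tabular Q-learning only sees sampled transitions under eps-greedy exploration, so it converges to the same answer only up to sampling noise.

```python
import numpy as np

# A toy 2-state, 2-action MDP: dynamics P[s, a, s'] and rewards R[s, a].
rng = np.random.default_rng(0)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])
gamma = 0.9

# TDP: exact value iteration with the full model (no samples).
Q_exact = np.zeros((2, 2))
for _ in range(500):
    Q_exact = R + gamma * P @ Q_exact.max(axis=1)

# TRL: tabular Q-learning from sampled transitions with eps-greedy exploration.
Q = np.zeros((2, 2))
s = 0
for _ in range(100_000):
    a = rng.integers(2) if rng.random() < 0.1 else int(Q[s].argmax())
    s_next = rng.choice(2, p=P[s, a])                 # model is hidden: only samples
    target = R[s, a] + gamma * Q[s_next].max()
    Q[s, a] += 0.05 * (target - Q[s, a])              # stochastic TD update
    s = s_next
```

Here `Q_exact` is the ground truth a ShinEnv oracle would report, and `Q` is the noisy estimate a TRL algorithm produces; comparing the two is exactly the kind of evaluation ShinRL is built for.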
## Installation

```bash
git clone git@github.com:omron-sinicx/ShinRL.git
cd ShinRL
pip install -e .
```

## Test

```bash
cd ShinRL
make test
```

## Format

```bash
cd ShinRL
make format
```

## Docker

```bash
cd ShinRL
docker-compose up
```
## Citation

```bibtex
% NeurIPS DRL Workshop 2021 version (pytorch branch)
@inproceedings{toshinori2021shinrl,
  author = {Kitamura, Toshinori and Yonetani, Ryo},
  title = {ShinRL: A Library for Evaluating RL Algorithms from Theoretical and Practical Perspectives},
  year = {2021},
  booktitle = {Proceedings of the NeurIPS Deep RL Workshop},
}

% arXiv version (commit 2d3da)
@article{toshinori2021shinrlArxiv,
  author = {Kitamura, Toshinori and Yonetani, Ryo},
  title = {ShinRL: A Library for Evaluating RL Algorithms from Theoretical and Practical Perspectives},
  year = {2021},
  journal = {arXiv preprint arXiv:2112.04123},
  url = {https://arxiv.org/abs/2112.04123},
}
```