This package provides a framework of derivative-free optimizers (powered by TensorFlow 2.0.0) that can be used in conjunction with a model predictive controller (MPC) and an analytical or learned dynamics model to control an agent in a gym environment.
Derivative-Free Optimizer | BlackBox MPC |
---|---|
Cross-Entropy Method (CEM) | ✔️ |
Covariance Matrix Adaptation Evolution Strategy (CMA-ES) | ✔️ |
Path Integral Method (PI2) | ✔️ |
Particle Swarm Optimizer (PSO) | ✔️ |
Random Search (RandomSearch) | ✔️ |
Simultaneous Perturbation Stochastic Approximation (SPSA) | ✔️ |
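
To give a feel for how one of the optimizers listed above searches without gradients, here is a minimal NumPy sketch of the Cross-Entropy Method. It is not the package's TensorFlow implementation; the function name `cross_entropy_method`, its parameters, and the `toy_reward` objective are assumptions made up for this illustration.

```python
import numpy as np


def cross_entropy_method(objective, dim, n_samples=200, n_elite=20,
                         n_iters=30, init_std=2.0, seed=0):
    """Maximize `objective` over R^dim with a diagonal-Gaussian CEM."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    std = np.full(dim, init_std)
    for _ in range(n_iters):
        # Sample candidates from the current search distribution.
        samples = rng.normal(mean, std, size=(n_samples, dim))
        scores = np.array([objective(s) for s in samples])
        # Keep the highest-scoring candidates (the "elite" set).
        elite = samples[np.argsort(scores)[-n_elite:]]
        # Refit the Gaussian to the elite set; the small constant
        # keeps the standard deviation strictly positive.
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6
    return mean


def toy_reward(x):
    # Hypothetical objective whose maximum is at x = [1, -2].
    return -np.sum((x - np.array([1.0, -2.0])) ** 2)


best = cross_entropy_method(toy_reward, dim=2)
```

In an MPC setting the search vector would typically be a flattened action sequence and the objective the predicted return of that sequence under the dynamics model.
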
The package also includes features that aid model-based reinforcement learning (RL) research, such as:
- Parallel implementations of the different optimizers using TensorFlow 2.0.
- Loading and saving system dynamics models.
- Monitoring progress using TensorBoard.
- Learning dynamics functions.
- Recording videos.
- A modular and flexible interface design to enable research on different trajectory evaluation methods, optimizers, cost functions, system dynamics network architectures or even training algorithms.
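
As a rough illustration of what "learning a dynamics function" means, the sketch below fits a linear model to recorded transitions with ordinary least squares. This is a deliberately simplified stand-in: the package learns dynamics with its own models, and every name here (`A_true`, `B_true`, `learned_dynamics`) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth linear system, used only to generate data.
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
B_true = np.array([[0.0], [0.1]])

# Collect random transitions (state, action, next_state).
states = rng.normal(size=(500, 2))
actions = rng.uniform(-1.0, 1.0, size=(500, 1))
next_states = states @ A_true.T + actions @ B_true.T

# Fit next_state ~ [state, action] @ W by ordinary least squares.
inputs = np.hstack([states, actions])
W, *_ = np.linalg.lstsq(inputs, next_states, rcond=None)
A_hat, B_hat = W[:2].T, W[2:].T


def learned_dynamics(state, action):
    """Predict the next state with the fitted linear model."""
    return A_hat @ np.asarray(state) + B_hat @ np.atleast_1d(action)
```

A learned function of this shape (state and action in, next state out) is what an MPC planner rolls forward when the true model is not available.
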
Optimizers references:
pip install blackbox_mpc
git clone https://github.com/ossamaAhmed/blackbox_mpc.git
cd blackbox_mpc
pip install -e .
pip install tensorflow_gpu==2.0.0
The easiest way to get familiar with the framework is to run through the tutorials provided. An example is shown below:
from blackbox_mpc.policies.mpc_policy import MPCPolicy
from blackbox_mpc.utils.pendulum import PendulumTrueModel, \
    pendulum_reward_function
import gym
# Create the environment to control.
env = gym.make("Pendulum-v0")
# Build an MPC policy that plans with the true pendulum dynamics
# and uses random search as its optimizer.
mpc_policy = MPCPolicy(reward_function=pendulum_reward_function,
                       env_action_space=env.action_space,
                       env_observation_space=env.observation_space,
                       true_model=True,
                       dynamics_function=PendulumTrueModel(),
                       optimizer_name='RandomSearch',
                       num_agents=1)
current_obs = env.reset()
for t in range(200):
    # Query the policy for the next action, then step the environment.
    action_to_execute, expected_obs, expected_reward = mpc_policy.act(
        current_obs, t)
    current_obs, reward, _, info = env.step(action_to_execute)
    env.render()
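
For readers new to MPC, the following NumPy sketch shows the general idea behind a random-search planner in a receding-horizon loop: sample action sequences, roll each one through a model, and execute only the first action of the best sequence. It does not use or mirror blackbox_mpc internals; the point-mass model, the reward, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def toy_dynamics(state, action):
    # Hypothetical 1-D point mass: state = [position, velocity].
    pos, vel = state
    vel = vel + 0.1 * action
    pos = pos + 0.1 * vel
    return np.array([pos, vel])


def toy_reward(state, action):
    # Hypothetical reward: stay near the origin using small actions.
    return -(state[0] ** 2 + 0.1 * state[1] ** 2 + 0.01 * action ** 2)


def random_shooting_action(state, horizon=15, n_candidates=256, rng=None):
    """Return the first action of the best randomly sampled sequence."""
    rng = np.random.default_rng() if rng is None else rng
    # Each row is one candidate action sequence over the planning horizon.
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
    returns = np.zeros(n_candidates)
    for i, sequence in enumerate(candidates):
        s = np.array(state, dtype=float)
        for a in sequence:
            returns[i] += toy_reward(s, a)
            s = toy_dynamics(s, a)
    return candidates[np.argmax(returns), 0]


# Receding-horizon loop: re-plan at every step, execute one action.
rng = np.random.default_rng(0)
state = np.array([1.0, 0.0])
for t in range(50):
    action = random_shooting_action(state, rng=rng)
    state = toy_dynamics(state, action)
```

Re-planning at every step is what lets the controller correct for model error and sampling noise, at the cost of running the optimizer once per environment step.
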
An API specification and explanation of the code components can be found here.
blackbox_mpc was developed by Ossama Ahmed (ETH Zürich), Jonas Rothfuss (ETH Zürich) and Prof. Andreas Krause (ETH Zürich).
This package was developed at the Learning and Adaptive Systems Lab at ETH Zürich.
@misc{blackbox_mpc,
author = {Ahmed, Ossama and Rothfuss, Jonas and Krause, Andreas},
year = {2020},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/ossamaAhmed/blackbox_mpc}},
}
The code is licensed under the MIT license and is free to use by anyone without any restrictions.
- Add support for Bayesian neural networks (BNN) and graph neural networks (GNN).
- Add support for different trajectory evaluators that propagate uncertainty.