The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
- Distributed multi-GPU and multi-node learning (JAX implementation)
- Utilities to start multiple processes from a single program invocation for distributed learning using JAX
- Model instantiators `return_source` parameter to get the source class definition used to instantiate the models (see the sketch after this block of entries)
- `Runner` utility to run training/evaluation workflows in a few lines of code
- Wrapper for Isaac Lab multi-agent environments
- Wrapper for Google Brax environments
- Move the KL reduction from the PyTorch `KLAdaptiveLR` class to each agent that uses it in distributed runs
- Move the PyTorch distributed initialization from the agent base class to the ML framework configuration
- Upgrade model instantiator implementations to support CNN layers and complex network definitions, and implement them using dynamic execution of Python code
- Update Isaac Lab environment loader argument parser options to match Isaac Lab version
- Allow storing tensors/arrays with their original dimensions in memory, and make it the default option
- Decouple the observation and state spaces in single and multi-agent environment wrappers, and add the `state` method to get the state of the environment
- Simplify the multi-agent environment wrapper API by removing shared space properties and methods
- Catch TensorBoard summary iterator exceptions in the `TensorboardFileIterator` postprocessing utility
- Fix automatic wrapper detection issue (introduced in the previous version) for Isaac Gym (previews), DeepMind and vectorized Gymnasium environments
- Fix vectorized/parallel environments' `reset` method return values when called more than once
- Fix IPPO and MAPPO `act` method return values when the JAX-NumPy backend is enabled
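For illustration, a minimal sketch of the `return_source` option mentioned above; the `network`/`output` specification shown here is an assumption based on the upgraded instantiator schema, not a verbatim example from the library:

```python
import gymnasium as gym

from skrl.envs.wrappers.torch import wrap_env
from skrl.utils.model_instantiators.torch import gaussian_model

env = wrap_env(gym.make("Pendulum-v1"))

# return the generated class definition (a string of Python source code)
# instead of instantiating the model
source = gaussian_model(observation_space=env.observation_space,
                        action_space=env.action_space,
                        device=env.device,
                        network=[{"name": "net",
                                  "input": "STATES",
                                  "layers": [64, 64],
                                  "activations": "elu"}],
                        output="ACTIONS",
                        return_source=True)
print(source)
```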
- Define the `environment_info` trainer config to log environment info (PyTorch implementation)
- Add support to automatically compute the write and checkpoint intervals, and make it the default option
- Single forward-pass in shared models
- Distributed multi-GPU and multi-node learning (PyTorch implementation)
- Update Orbit-related source code and docs to Isaac Lab
- Move the batch sampling inside gradient step loop for DDPG and TD3
- Perform JAX computation on the selected device
- MultiCategorical mixin to operate over MultiDiscrete action spaces
- Rename the `ManualTrainer` to `StepTrainer`
- Output training/evaluation progress messages to the system's stdout
- Get single observation/action spaces for vectorized environments
- Update Isaac Orbit environment wrapper
Transition from the pre-release versions (`1.0.0-rc.1` and `1.0.0-rc.2`) to a stable version.
This release also announces the publication of the skrl paper in the Journal of Machine Learning Research (JMLR): https://www.jmlr.org/papers/v24/23-0112.html
Summary of the most relevant features:
- JAX support
- New documentation theme and structure
- Multi-agent Reinforcement Learning (MARL)
- Get truncation from `time_outs` info in Isaac Gym, Isaac Orbit and Omniverse Isaac Gym environments
- Time-limit (truncation) bootstrapping in on-policy actor-critic agents
- Model instantiators `initial_log_std` parameter to set the initial value of the log standard deviation
- Structure environment loaders and wrappers file hierarchy coherently (see the usage sketch after this block of entries). Import statements now follow this convention:
  - Wrappers (e.g.):
    - `from skrl.envs.wrappers.torch import wrap_env`
    - `from skrl.envs.wrappers.jax import wrap_env`
  - Loaders (e.g.):
    - `from skrl.envs.loaders.torch import load_omniverse_isaacgym_env`
    - `from skrl.envs.loaders.jax import load_omniverse_isaacgym_env`
- Drop support for versions prior to PyTorch 1.9 (1.8.0 and 1.8.1)
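A minimal usage sketch of the relocated wrappers (the environment is illustrative):

```python
import gymnasium as gym

from skrl.envs.wrappers.torch import wrap_env

# the appropriate wrapper is auto-detected from the environment instance
env = wrap_env(gym.make("Pendulum-v1"))
```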
- JAX support (with Flax and Optax)
- RPO agent
- IPPO and MAPPO multi-agent algorithms
- Multi-agent base class
- Bi-DexHands environment loader
- Wrapper for Bi-DexHands environments
- Wrapper for PettingZoo environments
- Parameters `num_envs`, `headless` and `cli_args` for configuring Isaac Gym, Isaac Orbit and Omniverse Isaac Gym environments when they are loaded (see the sketch after this block of entries)
- Migrate to `pyproject.toml` for Python package development
- Define ML framework dependencies as optional dependencies in the library installer
- Move agent implementations with recurrent models to a separate file
- Allow closing the environment at the end of execution instead of after training/evaluation
- Documentation theme from `sphinx_rtd_theme` to `furo`
- Update documentation structure and examples
- Compatibility with Isaac Sim or OmniIsaacGymEnvs versions 2022.2.0 or earlier
- Disable PyTorch gradient computation during the environment stepping
- Get categorical models' entropy
- Typo in the `KLAdaptiveLR` learning rate scheduler class name (the old name is kept for compatibility with the examples of previous versions and will be removed in future releases)
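A sketch of the new loader parameters (the task name is illustrative):

```python
from skrl.envs.loaders.torch import load_isaacgym_env_preview4

# override the task's default number of environments and run headless;
# additional command-line style options can be forwarded via `cli_args`
env = load_isaacgym_env_preview4(task_name="Cartpole", num_envs=64, headless=True)
```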
- Update loader and utils for OmniIsaacGymEnvs 2022.2.1.0
- Update Omniverse Isaac Gym real-world examples
- TensorBoard writer instantiation when `write_interval` is zero
- Isaac Orbit environment loader
- Wrap an Isaac Orbit environment
- Gaussian-Deterministic shared model instantiator
- Utility for downloading models from Hugging Face Hub (see the sketch after this block of entries)
- Initialization of agent components if they have not been defined
- Manual trainer `train`/`eval` method default arguments
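A sketch of the Hugging Face Hub utility (the repository name is illustrative):

```python
from skrl.utils.huggingface import download_model_from_huggingface

# returns the local path of the downloaded checkpoint, which can then be
# passed to an agent's `load` method
path = download_model_from_huggingface("skrl/OmniIsaacGymEnvs-Cartpole-PPO",
                                       filename="agent.pt")
```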
- Support for Farama Gymnasium interface
- Wrapper for robosuite environments
- Weights & Biases integration (see the configuration sketch after this block of entries)
- Set the running mode (training or evaluation) of the agents
- Allow clipping the gradient norm for DDPG, TD3 and SAC agents
- Initialize model biases
- Add RNN (RNN, LSTM, GRU and any other variant) support for A2C, DDPG, PPO, SAC, TD3 and TRPO agents
- Allow disabling the training/evaluation progress bar
- Farama Shimmy and robosuite examples
- KUKA LBR iiwa real-world example
- Forward model inputs as a Python dictionary
- Return a Python dictionary with extra output values in model calls (see the model sketch further below)
- Adopt the implementation of `terminated` and `truncated` over `done` for all environments
- Omniverse Isaac Gym simulation speed for the Franka Emika real-world example
- Call the agents' `record_transition` method instead of the parent method to allow storing samples in memories during evaluation
- Move TRPO policy optimization out of the value optimization loop
- Access to the categorical model distribution
- Call reset only once for Gym/Gymnasium vectorized environments
- Deprecated method `start` in trainers
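A configuration sketch for the gradient norm clipping and Weights & Biases entries above, assuming the documented agent configuration keys:

```python
from skrl.agents.torch.sac import SAC_DEFAULT_CONFIG

cfg = SAC_DEFAULT_CONFIG.copy()
cfg["grad_norm_clip"] = 0.5  # clip the gradient norm during optimization
cfg["experiment"]["wandb"] = True  # enable Weights & Biases tracking
cfg["experiment"]["wandb_kwargs"] = {"project": "my-project"}  # assumed to be forwarded to wandb.init
```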
- AMP agent for physics-based character animation
- Manual trainer
- Gaussian model mixin
- Support for creating shared models
- Parameter `role` to model methods (see the model sketch after this block of entries)
- Wrapper compatibility with the new OpenAI Gym environment API
- Internal library colored logger
- Migrate checkpoints/models from other RL libraries to skrl models/agents
- Configuration parameter `store_separately` to agent configuration dict
- Save/load agent modules (models, optimizers, preprocessors)
- Set random seed and configure deterministic behavior for reproducibility
- Benchmark results for Isaac Gym and Omniverse Isaac Gym on the GitHub discussion page
- Franka Emika real-world example
- Models implementation as Python mixin
- Multivariate Gaussian model (`GaussianModel` until 0.7.0) to `MultivariateGaussianMixin`
- Trainer's `cfg` parameter position and default values
- Show training/evaluation display progress using `tqdm`
- Update Isaac Gym and Omniverse Isaac Gym examples
- Missing recursive arguments during model weights initialization
- Tensor dimension when computing preprocessor parallel variance
- Models' clip tensors dtype to `float32`
- Parameter `inference` from model methods
- Configuration parameter `checkpoint_policy_only` from agent configuration dict
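A sketch of the mixin-based model definition, showing the `role` parameter and the dictionary-based inputs/outputs listed in the entries above (hidden layer sizes are illustrative):

```python
import torch
import torch.nn as nn

from skrl.models.torch import GaussianMixin, Model


class Policy(GaussianMixin, Model):
    def __init__(self, observation_space, action_space, device,
                 clip_actions=False, clip_log_std=True, min_log_std=-20, max_log_std=2):
        Model.__init__(self, observation_space, action_space, device)
        GaussianMixin.__init__(self, clip_actions, clip_log_std, min_log_std, max_log_std)

        self.net = nn.Sequential(nn.Linear(self.num_observations, 64), nn.ELU(),
                                 nn.Linear(64, self.num_actions))
        self.log_std_parameter = nn.Parameter(torch.zeros(self.num_actions))

    def compute(self, inputs, role):
        # inputs is a dictionary (e.g. inputs["states"]); the last element of
        # the returned tuple is a dictionary for extra output values
        return self.net(inputs["states"]), self.log_std_parameter, {}
```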
- A2C agent
- Isaac Gym (preview 4) environment loader
- Wrap an Isaac Gym (preview 4) environment
- Support for OpenAI Gym vectorized environments
- Running standard scaler for input preprocessing (see the sketch after this block of entries)
- Installation from PyPI (`pip install skrl`)
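A sketch of the input preprocessing configuration, assuming the documented agent configuration keys:

```python
import gymnasium as gym

from skrl.envs.wrappers.torch import wrap_env
from skrl.resources.preprocessors.torch import RunningStandardScaler

env = wrap_env(gym.make("Pendulum-v1"))

cfg = {}  # stands for an agent configuration dict (e.g. PPO_DEFAULT_CONFIG.copy())
cfg["state_preprocessor"] = RunningStandardScaler
cfg["state_preprocessor_kwargs"] = {"size": env.observation_space, "device": env.device}
```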
- Omniverse Isaac Gym environment loader
- Wrap an Omniverse Isaac Gym environment
- Save best models during training
- TRPO agent
- Wrapper for DeepMind environments
- KL Adaptive learning rate scheduler
- Handle `gym.spaces.Dict` observation spaces (OpenAI Gym and DeepMind environments)
- Forward environment info to the agent `record_transition` method
- Expose and document the random seeding mechanism
- Define rewards shaping function in agents' config
- Define learning rate scheduler in agents' config (see the configuration sketch after this block of entries)
- Improve agent's algorithm description in documentation (PPO and TRPO at the moment)
- Compute the Generalized Advantage Estimation (GAE) in the agent `_update` method
- Move noises definition to the `resources` folder
- Update the Isaac Gym examples
- `compute_functions` for computing the GAE from the memory base class
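A configuration sketch for the rewards shaping and learning rate scheduler entries above, assuming the documented agent configuration keys (the scheduler class is shown under its current name, noted earlier in this changelog):

```python
from skrl.resources.schedulers.torch import KLAdaptiveLR

cfg = {}  # stands for an agent configuration dict (e.g. PPO_DEFAULT_CONFIG.copy())
cfg["learning_rate_scheduler"] = KLAdaptiveLR
cfg["learning_rate_scheduler_kwargs"] = {"kl_threshold": 0.008}
cfg["rewards_shaper"] = lambda rewards, timestep, timesteps: rewards * 0.01  # scale rewards
```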
- Examples of all Isaac Gym environments (preview 3)
- TensorBoard file iterator for data post-processing
- Init and evaluate agents in `ParallelTrainer`
- CEM, SARSA and Q-learning agents
- Tabular model
- Parallel training using multiprocessing
- Isaac Gym utilities
- Initialize agents in a separate method
- Change the name of the `networks` argument to `models`
- Reset environments after post-processing
- DQN and DDQN agents
- Export memory to files
- Postprocessing utility to iterate over memory files (see the sketch after this block of entries)
- Model instantiator utility to allow fast development
- More examples and contents in the documentation
- Clip actions using the whole space's limits
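A sketch of memory export and the postprocessing iterator (paths and sizes are illustrative):

```python
from skrl.memories.torch import RandomMemory
from skrl.utils import postprocessing

# export the memory content to files as it is filled
memory = RandomMemory(memory_size=16, num_envs=1, device="cpu",
                      export=True, export_format="pt", export_directory="./memories")

# iterate over the exported memory files for post-processing
for filename, data in postprocessing.MemoryFileIterator("./memories/*.pt"):
    print(filename, list(data.keys()))
```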
- First official release