ProtoRL is developed for students and academics that want to quickly reproduce algorithms found in research papers. It is designed to be used on a single machine with a multithreaded CPU and single GPU.
Out of the box, ProtoRL implements the following algorithms:
- DQN Double DQN, D3QN, PPO for single agents with a discrete action space
- DDPG, TD3, SAC, PPO for single agents with a continuous action space
- Prioritized Experience Replay for any off policy RL algorithm
- APE-X DDPG for distributed prioritized DDPG
- APE-X DQN for distributed prioritized D3QN
Note that this is a v0.3 release, and more agents are coming. I am working on developing open source versions of:
- Random Network Distillation (RND)
- Recurrent Replay Distributed DQN (R2D2)
- Never Give Up (NGU)
ProtoRL is built around Python3.8 and a minimal set of dependencies.
We utilize Gymnasium, PyTorch and Numpy. Support for the Atari environments comes from atari-py. Support for some of the control environments comes from the Box2D dependency.
It's recommended that you install ProtoRL to a virtual environment, due to the stringent nature of the dependencies.
python3.8 -m venv <your virtual env directory>
source <your virtual env directory>/bin/activate
git clone https://github.com/philtabor/protorl
pip install --upgrade pip
pip install .
python
import protorl as prl
If you don't receive any errors at this point, ProtoRL has installed correctly.
Next, you should check out the provided examples.
cd protorl/examples
mkdir models
python dqn.py
This will train a deep Q agent on the CartPole environment. If you want to try out other environments, please feel free to edit the dqn.py file.
If you are trying to implement an RL agent for a custom environment, this can be accomplished by passing the package_to_import flag to the make_env or make_vec_envs functions. Package names should be a string, and it should be an installed package.
If the environment you are trying to use doesn't adhere to the new Gymnasium specifications (i.e. it doesn't return observation, info from env.reset() or the step() function doesn't return both a terminal and truncated flag), then you will need to pass the apply_api_compatibility=True flag in the make_env or make_vec_envs functions.
Agents are comprised of a number of bolt-on modules that perform the basic functionality needed. There is the agent class, that encapsulates all the functionality we would associate with a deep reinforcement learning agent, network classes, memory classes, policy classes, episode loop classes, and wrapper classes for various environments.
The base agent class has functionality for choosing actions, updating target networks, interfacing with the memory, learning from its experience, and saving the models.
Algorithms, such as deep Q learning, deep deterministic policy gradients, etc. are implemented by deriving from this base agent class and implementing the update and choose_action functions.
The replay buffer is designed to be as generic as possible. Some defaults are provided, however the end user has complete control over what types of data are stored, and as what data types. Support for prioritized replay is provided out of the box, as well as a number of sampling modes (sample all memories, batch sampling, uniform random sampling, prioritized sampling).
The neural networks are also meant to be used as lego pieces for rapid implementation. The networks are generally divided into a base and a head, which are combined with a PyTorch sequential. The base handles interfacing with the environment, while the head supplies output to the agent's policy for action selection. ProtoRL implements a broad variety of bases and heads, ranging from convolutional networks to process screen images, to beta distributions for action selection. All of the built-in agents have functions that combine the appropriate base and head for easy use.
Policies are treated as separate modules. Each takes output from a neural network, and performs the appropriate action selection for the task at hand. Policies supported out of the box include:
- Epsilon-greedy (discrete actions)
- Categorical distribution (discrete actions)
- Noisy Deterministic (continuous actions)
- Gaussian distribution (continuous actions)
- Beta distribution (continuous actions)
The episode loop handles the interface between the agent and the environment. Included is a loop to handle single as well as multi-threaded agents, as well as to handle PPO. The inclusion of a PPO specific loop is due to the nature of data stored for replay in PPO.
Episode loops are built around the latest version of gym, where the step function returns 5 variables instead of 4. Attempting to use ProtoRL with earlier versions of gym will break due to this.
ProtoRL includes wrappers for stacking frames in environments that return screen images. Additional functionality required to reproduce results in the Atari library is provided. Since ProtoRL is multithreaded by default, we use a wrapper to handle passing actions to the step function for single threaded agents (this should probably change).
- Buried within the code is functionality for creating networks using a factory. This is more elegant than the current procedural implementation.
- Incorporate multi agent algorithms and environments.
- Implement an environment spec to handle environment contingent processing.
- Implement basic command line interface.
- Do away with having user make a model directory
v0.1: First stable release
v0.2: Migrate from Gym to Gymnasium
v0.3: Implement proper model checkpointing for resuming training