Commit

update rl_zoo examples

perezjln committed Jun 26, 2024
1 parent 3567dde commit 9455733
Showing 3 changed files with 69 additions and 0 deletions.
README.md: 40 additions & 0 deletions
@@ -92,6 +92,46 @@ export MUJOCO_GL=osmesa
export DISPLAY=:0
```

## Training Policies with Stable Baselines3 and RL Zoo3: A Step-by-Step Guide

To train a reinforcement learning policy using Stable Baselines3 and RL Zoo3, you need to define a configuration file and then launch the training process.

### Step 1: Define a Configuration File

Create a YAML configuration file specifying the training parameters for your environment. Below is an example configuration for the `ReachCube-v0` environment:

```yaml
ReachCube-v0:
  n_timesteps: !!float 1e7
  policy: 'MultiInputPolicy'
  frame_stack: 3
  use_sde: True
```
- `n_timesteps`: The number of timesteps to train the model. Here, it is set to 10 million.
- `policy`: The policy type to be used. In this case, it is set to `'MultiInputPolicy'`.
- `frame_stack`: The number of frames to stack, which is 3 in this example.
- `use_sde`: A boolean indicating whether to use State-Dependent Exploration (SDE). It is set to `True`.
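
The environment entry above is only part of the picture: the same configuration file can also hold global settings such as the number of parallel environments and the hyperparameters of the training algorithm. The sketch below is illustrative only; it combines the `ReachCube-v0` entry with the kind of settings used in the `examples/rl_zoo3_conf.yaml` file shipped with this repository, and the values can be tuned to your setup:

```yaml
# rl_zoo3_conf.yaml (illustrative sketch)
n_envs: 2  # number of parallel environments

# Hyperparameters specific to the training algorithm
tqc:
  policy: "MultiInputPolicy"
  learning_rate: 3e-4
  buffer_size: 10000  # smaller replay buffer to limit memory usage
  batch_size: 256
  gamma: 0.99

ReachCube-v0:
  n_timesteps: !!float 1e7
  policy: 'MultiInputPolicy'
  frame_stack: 3
  use_sde: True
```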

### Step 2: Launch the Training Process

After defining the configuration file, you can start training your policy with the following command:

```sh
python -u -m rl_zoo3.train --algo tqc --env ReachCube-v0 --gym-packages gym_lowcostrobot --conf rl_zoo3_conf.yaml --env-kwargs observation_mode:'"both"' -orga <huggingface_user> -f logs
```

- `python -u -m rl_zoo3.train`: Executes the training module from RL Zoo3.
- `--algo tqc`: Specifies the algorithm to use, in this case, TQC (Truncated Quantile Critics).
- `--env ReachCube-v0`: Specifies the environment to train on.
- `--gym-packages gym_lowcostrobot`: Imports the `gym_lowcostrobot` package so that its environments (including `ReachCube-v0`) are registered before training.
- `--conf rl_zoo3_conf.yaml`: Points to the configuration file you created.
- `--env-kwargs observation_mode:'"both"'`: Passes additional keyword arguments to the environment; here the observation mode is set to the string `"both"` (the nested quotes keep the value a string when the argument is parsed).
- `-orga <huggingface_user>`: Specifies the Hugging Face organization/user where the model will be stored.
- `-f logs`: Specifies the directory where the training logs will be saved.

For more detailed information on the available options and configurations, refer to the RL Zoo3 documentation.
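
Once training has finished, you can replay the learned policy with RL Zoo3's `enjoy` script. The command below is a sketch that assumes the training run above saved its agent under the `logs` folder; adjust the paths and flags to your setup:

```sh
# Replay the trained TQC agent from the logs folder (sketch; assumes the run above used -f logs)
python -u -m rl_zoo3.enjoy --algo tqc --env ReachCube-v0 --gym-packages gym_lowcostrobot -f logs --env-kwargs observation_mode:'"both"'
```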

## Contributing

We welcome contributions to the project! Please follow these general guidelines:
examples/rl_zoo3.sh: 1 addition & 0 deletions
@@ -0,0 +1 @@
python -u -m rl_zoo3.train --algo tqc --env ReachCube-v0 --gym-packages gym_lowcostrobot -conf rl_zoo3_conf.yaml --env-kwargs observation_mode:'"state"'
examples/rl_zoo3_conf.yaml: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
# conf.yml
n_envs: 2

# Hyperparameters specific to the training algorithm
tqc:
  policy: "MultiInputPolicy"
  learning_rate: 3e-4
  buffer_size: 10000 # Reduce buffer size
  learning_starts: 10000
  batch_size: 256
  tau: 0.005
  gamma: 0.99
  train_freq: 1
  gradient_steps: 1
  action_noise: null
  optimize_memory_usage: True
  policy_kwargs: {}
  verbose: 1
  seed: null
  device: "auto"


ReachCube-v0:
  n_timesteps: !!float 1e7
  policy: 'MultiInputPolicy'
  frame_stack: 1 # Disable frame stacking if not needed
  use_sde: True
  observation_mode: state
