feat: update sheeprl (#27)

michele-milesi authored Feb 28, 2024 (commit 48130ee, 1 parent ab83f76)

File: content/handsOnReinforcementLearning/sheeprl/_index.en.md
Install DIAMBRA Arena with the SheepRL interface:
```shell
pip install diambra-arena[sheeprl]
```

This should be enough to prepare your system to execute the following examples. You can refer to the official <a href="https://github.com/Eclectic-Sheep/sheeprl/tree/v0.5.4" target="_blank">SheepRL documentation</a> or reach out on our <a href="https://diambra.ai/discord" target="_blank">Discord server</a> for specific needs.

{{% notice warning %}}
Remember that to train agents, you must have installed the `diambra` CLI (`python3 -m pip install diambra`) and set the `DIAMBRAROMSPATH` environment variable properly.
{{% /notice %}}

All the examples presented below are available here: <a href="https://github.com/diambra/agents/tree/main/sheeprl" target="_blank">DIAMBRA Agents - SheepRL</a>. They follow the high-level approach described on the <a href="https://github.com/Eclectic-Sheep/sheeprl/blob/v0.5.4/howto/learn_in_diambra.md" target="_blank">SheepRL DIAMBRA</a> page, making it easy to extend them and to understand how they interface with the different components.

These examples only aim at demonstrating core functionalities and high-level aspects; they will not generate well-performing agents, even if the training time is extended to cover a large number of training steps. The user will need to build upon them, exploring aspects such as policy network architecture, algorithm hyperparameter tuning, observation space tweaking, reward wrapping, and similar topics.

#### General Environment Settings
SheepRL provides many different environments that share a set of parameters. Moreover, SheepRL leverages <a href="https://hydra.cc/" target="_blank">Hydra</a> for defining hierarchical configurations. Below are the general structure of an environment configuration and a table describing its arguments.

{{< github_code "https://raw.githubusercontent.com/Eclectic-Sheep/sheeprl/v0.5.4/sheeprl/configs/env/default.yaml" >}}

| <strong><span style="color:#5B5B60;">Argument</span></strong> | <strong><span style="color:#5B5B60;">Type</span></strong> | <strong><span style="color:#5B5B60;">Default Value(s)</span></strong> | <strong><span style="color:#5B5B60;">Description</span></strong> |
| ------------------------------------------------------------- | --------------------------------------------------------- | --------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |


{{% notice note %}}
If you have never used Hydra, before continuing, it is strongly recommended to check the <a href="https://hydra.cc/" target="_blank">Hydra official documentation</a> and the <a href="https://github.com/Eclectic-Sheep/sheeprl/blob/v0.5.4/howto/configs.md" target="_blank">SheepRL-related section</a>.
{{% /notice %}}
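If Hydra is new to you, the key idea behind its hierarchical configurations can be sketched in a few lines of plain Python: a defaults tree is recursively merged with overrides. This is only an illustrative sketch, not Hydra's actual implementation, and the `default_env` and `custom` dictionaries below are hypothetical stand-ins for the YAML files.

```python
from copy import deepcopy

def compose(defaults: dict, overrides: dict) -> dict:
    """Recursively merge `overrides` into `defaults`, Hydra-style:
    nested dictionaries are merged, scalar values are replaced."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = compose(merged[key], value)
        else:
            merged[key] = value
    return merged

# Hypothetical defaults mirroring the shape of env/default.yaml
default_env = {"id": "???", "num_envs": 4, "frame_stack": 1, "screen_size": 64}
# An experiment-level override, as you would pass on the Hydra command line
custom = {"id": "doapp", "screen_size": 128}

cfg = compose(default_env, custom)
print(cfg["id"], cfg["screen_size"], cfg["num_envs"])  # doapp 128 4
```

Overridden keys replace the defaults, while untouched keys (here `num_envs` and `frame_stack`) fall through unchanged.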



{{% notice note %}}
For the low-level details of the interface, users can review the corresponding source code <a href="https://github.com/Eclectic-Sheep/sheeprl/blob/v0.5.4/sheeprl/envs/diambra.py" target="_blank">here</a>.
{{% /notice %}}


#### Agent Settings
SheepRL provides several SOTA algorithms, both model-free and model-based. <a href="https://github.com/Eclectic-Sheep/sheeprl/tree/v0.5.4/sheeprl/configs/algo" target="_blank">Here</a> you can find the default configurations for these agents. Of course, you can change algorithm-related hyperparameters to customize your experiments.

### Basic
As mentioned above, SheepRL provides several default configurations for all its components, which can be composed to set up an experiment. Otherwise, you can customize the ones you want: the two main components to define for an experiment are the agent and the environment.

Regarding the environment, some constraints must be respected; for example, dictionary observation spaces cannot be nested. For this reason, the DIAMBRA <a href="../../wrappers/#flatten-and-filter-observation" target="_blank">flattening wrapper</a> is always used. For more information about the constraints of the SheepRL library, check <a href="https://github.com/Eclectic-Sheep/sheeprl/blob/v0.5.4/howto/learn_in_diambra.md#args" target="_blank">here</a>.
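As a rough illustration of what such flattening does, here is a pure-Python sketch (not the actual DIAMBRA wrapper; the keys, values, and separator are hypothetical):

```python
def flatten_obs(obs: dict, parent_key: str = "", sep: str = "_") -> dict:
    """Flatten a nested observation dictionary into a single-level one,
    joining nested keys with `sep`."""
    flat = {}
    for key, value in obs.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten_obs(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat

# A nested observation, e.g. per-player sub-dictionaries
nested = {"frame": "<84x84 image>", "P1": {"health": 100, "side": 0}}
print(flatten_obs(nested))
# {'frame': '<84x84 image>', 'P1_health': 100, 'P1_side': 0}
```

After flattening, every observation key sits at the top level, satisfying the no-nesting constraint.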

As for the agent, the only two constraints concern the observation and action spaces that agents support. You can find the supported observation and action spaces in Table 1 and Table 2, respectively, of the <a href="https://github.com/Eclectic-Sheep/sheeprl/tree/v0.5.4" target="_blank">README</a> in the SheepRL GitHub repository.


#### Customising the Configurations
The default configurations are available <a href="https://github.com/Eclectic-Sheep/sheeprl/tree/v0.5.4/sheeprl/configs" target="_blank">here</a>. If you want to define your custom experiments, you just need to follow a few steps:
1. Create a folder (with the same structure as the <a href="https://github.com/Eclectic-Sheep/sheeprl/tree/v0.5.4/sheeprl/configs" target="_blank">SheepRL configs folder</a>) in which to place your custom configurations.
2. Define the `SHEEPRL_SEARCH_PATH` environment variable in the `.env` file as follows: `SHEEPRL_SEARCH_PATH=file://relative/path/to/custom/configs/folder;pkg://sheeprl.configs`.
3. Define the custom configurations, making sure each filename differs from the default ones; otherwise, your file will overwrite the default configurations.

This example demonstrates how to:
* Run the trained agent in the environment for one episode.


SheepRL natively supports dictionary observation spaces, the only thing you need to define is the keys of the observations you want to process. For more information about observations selection, check <a href="https://github.com/Eclectic-Sheep/sheeprl/blob/v0.5.4/howto/select_observations.md" target="_blank">here</a>.

##### Configs Folder
First, it is necessary to create a folder for the configuration files. We create the `configs` folder under the `./sheeprl/` folder in the <a href="https://github.com/diambra/agents/tree/main" target="_blank">DIAMBRA Agents</a> GitHub repository. Then we add the `.env` file in the `./sheeprl/` folder, in which we define the `SHEEPRL_SEARCH_PATH` environment variable as described in the previous section.
##### Define the Environment
Below is a possible configuration of the environment.
{{< github_code "https://raw.githubusercontent.com/diambra/agents/main/sheeprl/configs/env/custom_env.yaml" >}}

##### Define the Agent
As for the environment, we need to create a dedicated folder for the custom agent configurations: we create the `algo` folder in the `./sheeprl/configs` folder and place the `custom_ppo_agent.yaml` file there. Under the `defaults` keyword, it is possible to retrieve the configurations specified in another file; in our case, since we are defining the agent, we can take the configuration from the <a href="https://github.com/Eclectic-Sheep/sheeprl/tree/v0.5.4/sheeprl/configs/algo" target="_blank">algorithm config folder</a> in SheepRL, in which several SOTA agents are defined.

{{% notice note %}}
When defining an agent, it is mandatory to define the `name` of the algorithm (it must be equal to the filename of the file in which the algorithm is defined). The value of this parameter determines which algorithm will be used for training. If you inherit the default configurations of a specific algorithm, you do not need to define it, since it is already defined in that algorithm's default configs.
Below is a configuration file for a PPO agent.

##### Define the Experiment
The last thing to do is to define the experiment. You just need to define a `custom_exp.yaml` file in the `./sheeprl/configs/exp` folder and assemble the environment, the agent, and the other components of the SheepRL framework. In particular, there are four parameters that must be defined:
1. `algo.total_steps`: the total number of policy steps to compute during training (for more information, check <a href="https://github.com/Eclectic-Sheep/sheeprl/blob/v0.5.4/howto/work_with_steps.md#policy-steps" target="_blank">here</a>).
2. `buffer.size`: the dimension of the replay buffer.
3. `algo.cnn_keys`: the keys of image-like observations that must be encoded (and possibly reconstructed by the decoder).
4. `algo.mlp_keys`: the keys of vector observations that must be encoded (and possibly reconstructed by the decoder).
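As a rough illustration of the distinction between `algo.cnn_keys` and `algo.mlp_keys`, the sketch below partitions observation keys by shape. This is only a heuristic for intuition; in SheepRL you list the keys explicitly in the experiment config.

```python
def split_obs_keys(observation_shapes: dict) -> tuple:
    """Partition observation keys into image-like entries (3D: H x W x C)
    for the CNN encoder and flat vectors for the MLP encoder."""
    cnn_keys, mlp_keys = [], []
    for key, shape in observation_shapes.items():
        (cnn_keys if len(shape) == 3 else mlp_keys).append(key)
    return cnn_keys, mlp_keys

# Hypothetical DIAMBRA-style observation shapes
shapes = {"frame": (64, 64, 3), "own_health": (1,), "opp_health": (1,)}
print(split_obs_keys(shapes))  # (['frame'], ['own_health', 'opp_health'])
```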
Below is an example of an experiment config file.
{{< github_code "https://raw.githubusercontent.com/diambra/agents/main/sheeprl/configs/exp/custom_exp.yaml" >}}

{{% notice note %}}
When defining the configurations of the experiment, you can specify how frequently to save checkpoints of the model, and whether to save the final agent. For more information, check <a href="https://github.com/Eclectic-Sheep/sheeprl/blob/v0.5.4/howto/logs_and_checkpoints.md#checkpointing" target="_blank">here</a>.
{{% /notice %}}


After training, you can decide to evaluate the agent as many times as you want.
Only these three parameters need to be specified in order to avoid inconsistencies, e.g., pairing the checkpoint of one agent with the evaluation configurations of another, or loading a checkpoint whose model has different dimensions from the model specified in the configurations.
This implies, however, that the evaluation script expects a certain directory structure. For this reason, the structure of the log directory should not be changed: all of it can be moved, but not the checkpoint individually; otherwise, the script cannot automatically retrieve the environment and agent configurations.

{{< github_code "https://raw.githubusercontent.com/Eclectic-Sheep/sheeprl/v0.5.4/sheeprl/configs/eval_config.yaml" >}}
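To illustrate why the directory structure matters, here is a minimal sketch of retrieving a saved experiment config relative to a checkpoint file. The file name and layout are assumptions for the example, not SheepRL's exact ones:

```python
import tempfile
from pathlib import Path

def find_experiment_config(checkpoint: Path, name: str = "config.yaml") -> Path:
    """Walk up from a checkpoint file until the experiment's saved config
    is found; moving the checkpoint out of its log directory breaks this."""
    for folder in checkpoint.parents:
        candidate = folder / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"No {name} above {checkpoint}")

# Build a toy log directory: <run>/config.yaml and <run>/checkpoint/ckpt_100.ckpt
run_dir = Path(tempfile.mkdtemp())
(run_dir / "checkpoint").mkdir()
(run_dir / "config.yaml").write_text("algo: ppo\n")
ckpt = run_dir / "checkpoint" / "ckpt_100.ckpt"
ckpt.touch()
print(find_experiment_config(ckpt).read_text().strip())  # algo: ppo
```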

To evaluate the agent, you just need to run the evaluation script, passing the checkpoint path. The `evaluate.py` script loads the agent and environment configurations retrieved from the checkpoint's log directory and runs the evaluation.


##### PPO Implementation
In this paragraph, we quote the code of the PPO implementation (the `ppo.py` file in the <a href="https://github.com/Eclectic-Sheep/sheeprl/tree/v0.5.4/sheeprl/algos/ppo" target="_blank">SheepRL PPO folder</a>) to give more context on how SheepRL works. In the `main()` function, all the components needed for training are instantiated (i.e., the agent, the environments, the buffer, the logger, and so on). Then the environment interaction is performed, and after collecting the rollout steps, the train function is called.

The `train()` function is responsible for sharing data between processes when multiple processes are launched and `buffer.share_data` is set to `True`. Then, for each batch, the losses are computed and the agent is updated.

{{< github_code "https://raw.githubusercontent.com/Eclectic-Sheep/sheeprl/v0.5.4/sheeprl/algos/ppo/ppo.py" >}}
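For context, the policy loss at the heart of PPO is the clipped surrogate objective. Below is a minimal pure-Python sketch of it, not SheepRL's actual implementation (which operates on batched tensors):

```python
import math

def ppo_clip_loss(logp_new, logp_old, advantages, clip_coef=0.2):
    """Clipped surrogate objective from the PPO paper: the probability
    ratio is clipped to [1 - clip_coef, 1 + clip_coef] to limit the
    size of each policy update."""
    losses = []
    for lp_n, lp_o, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_n - lp_o)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(min(ratio, 1.0 + clip_coef), 1.0 - clip_coef)
        # Pessimistic bound: take the worse (smaller) of the two surrogates
        losses.append(-min(ratio * adv, clipped * adv))
    return sum(losses) / len(losses)

# A ratio far above 1 + clip_coef is clipped, capping the incentive
print(round(ppo_clip_loss([0.5], [0.0], [1.0]), 3))  # -1.2
```

With `logp_new - logp_old = 0.5` the ratio is about 1.65, so the clipped term (1.2) dominates: the gradient no longer grows with the ratio, which is the mechanism keeping updates close to the old policy.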

#### Parallel Environments
In addition to what was shown in previous examples, this one demonstrates how to run training using parallel environments. The same PPO algorithm is used as before.
```shell
diambra run -s=6 python train.py exp=custom_parallel_env_exp
```
SheepRL allows training to be distributed thanks to <a href="https://lightning.ai/docs/fabric/stable/" target="_blank">Lightning Fabric</a>.

The default Fabric configuration is the following:
{{< github_code "https://raw.githubusercontent.com/Eclectic-Sheep/sheeprl/v0.5.4/sheeprl/configs/fabric/default.yaml" >}}
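Under the hood, the data-parallel strategies Fabric wraps (e.g. DDP) have each process compute gradients on its own data shard and then average them, so all replicas stay synchronized. A minimal pure-Python sketch of that reduction (real implementations use collective ops such as all-reduce):

```python
def average_gradients(worker_grads):
    """Element-wise mean of per-worker gradient vectors, the core
    operation of synchronous data-parallel training."""
    num_workers = len(worker_grads)
    return [sum(g) / num_workers for g in zip(*worker_grads)]

# Two workers computed gradients on different data shards
print(average_gradients([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```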

{{% notice note %}}
The `sheeprl.utils.callback.CheckpointCallback` is used for saving the checkpoint during training and for saving the trained agent.
To run the Fabric experiment, make sure you have a `cuda` GPU on your device; otherwise, change the Fabric accelerator configuration accordingly.
Finally, SheepRL allows you to visualize and monitor training using TensorBoard.

{{% notice note %}}
We strongly recommend reading the SheepRL <a href="https://github.com/Eclectic-Sheep/sheeprl/blob/v0.5.4/howto/logs_and_checkpoints.md" target="_blank">logging documentation</a> to learn how to enable or disable logging.
{{% /notice %}}

Below are the default logging configuration and a table describing the arguments.

{{< github_code "https://raw.githubusercontent.com/Eclectic-Sheep/sheeprl/v0.5.4/sheeprl/configs/metric/default.yaml" >}}

| <strong><span style="color:#5B5B60;">Argument</span></strong> | <strong><span style="color:#5B5B60;">Type</span></strong> | <strong><span style="color:#5B5B60;">Default Value(s)</span></strong> | <strong><span style="color:#5B5B60;">Description</span></strong> |
| ------------------------------------------------------------- | --------------------------------------------------------- | --------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
