diff --git a/content/handsOnReinforcementLearning/sheeprl/_index.en.md b/content/handsOnReinforcementLearning/sheeprl/_index.en.md index cece1c2f..d980e28f 100644 --- a/content/handsOnReinforcementLearning/sheeprl/_index.en.md +++ b/content/handsOnReinforcementLearning/sheeprl/_index.en.md @@ -49,20 +49,20 @@ Install DIAMBRA Arena with SheepRL interface: pip install diambra-arena[sheeprl] ``` -This should be enough to prepare your system to execute the following examples. You can refer to the official SheepRL documentation or reach out on our Discord server for specific needs. +This should be enough to prepare your system to execute the following examples. You can refer to the official SheepRL documentation or reach out on our Discord server for specific needs. {{% notice warning %}} Remember that to train agents, you must have installed the `diambra` CLI (`python3 -m pip install diambra`) and set the `DIAMBRAROMSPATH` environment variable properly. {{% /notice %}} -All the examples presented below are available here: DIAMBRA Agents - SheepRL. They have been created following the high level approach found on SheepRL DIAMBRA page, thus allowing to easily extend them and to understand how they interface with the different components. +All the examples presented below are available here: DIAMBRA Agents - SheepRL. They have been created following the high-level approach found on the SheepRL DIAMBRA page, making it easy to extend them and to understand how they interface with the different components. These examples only aim at demonstrating the core functionalities and high-level aspects; they will not generate well-performing agents, even if the training time is extended to cover a large number of training steps. The user will need to build upon them, exploring aspects like policy network architecture, algorithm hyperparameter tuning, observation space tweaking, reward wrapping, and other similar ones. #### General Environment Settings SheepRL provides a lot of different environments that share a set of parameters. Moreover, SheepRL leverages Hydra for defining hierarchical configurations. Below is reported the general structure of the configuration of an environment and a table describing the arguments. -{{< github_code "https://raw.githubusercontent.com/Eclectic-Sheep/sheeprl/v0.5.1/sheeprl/configs/env/default.yaml" >}} +{{< github_code "https://raw.githubusercontent.com/Eclectic-Sheep/sheeprl/v0.5.4/sheeprl/configs/env/default.yaml" >}} | Argument | Type | Default Value(s) | Description | | ------------------------------------------------------------- | --------------------------------------------------------- | --------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | @@ -82,7 +82,7 @@ SheepRL provides a lot of different environments that share a set of parameters. {{% notice note %}} -If you have never used Hydra, before continuing, it is strongly recommended to check the Hydra official documentation and the SheepRL-related section. +If you have never used Hydra, it is strongly recommended to check the official Hydra documentation and the SheepRL-related section before continuing. {{% /notice %}} @@ -126,24 +126,24 @@ class DiambraWrapper(gym.Wrapper): {{% /notice %}} {{% notice note %}} -For the interface low-level details, users can review the correspondent source code here.
+For the low-level details of the interface, users can review the corresponding source code here. {{% /notice %}} #### Agent Settings -SheepRL provides several SOTA algorithms, both model-free and model-based. Here you can find the default configurations for these agent. Of course, one can change algorithm-related hyper-parameters for customizing his/her experiments. +SheepRL provides several SOTA algorithms, both model-free and model-based. Here you can find the default configurations for these agents. Of course, you can change algorithm-related hyper-parameters to customize your experiments. ### Basic As anticipated before, SheepRL provides several default configurations for all its components, which are available and can be composed to set up an experiment. Otherwise, you can customize the ones you want: the two main ones to be defined for experiments are the agent and the environment. -Regarding the environment, there are some constraints that must be respected, for example, the dictionary observation spaces cannot be nested. For this reason, the DIAMBRA flattening wrapper is always used. For more information about the constraints of the SheepRL library, check here. +Regarding the environment, there are some constraints that must be respected; for example, dictionary observation spaces cannot be nested. For this reason, the DIAMBRA flattening wrapper is always used. For more information about the constraints of the SheepRL library, check here. -Instead, regarding the agent, the only two constraints that are present concern the observation and action spaces that agents support. You can read the supported observation and action spaces in Table 1 and Table 2 of the README in the SheepRL GitHub repository, respectively. +Regarding the agent, instead, the only two constraints concern the observation and action spaces that agents support. You can read the supported observation and action spaces in Table 1 and Table 2 of the README in the SheepRL GitHub repository, respectively. #### Customising the Configurations -The default configurations are available here. If you want to define your custom experiments, you just need to follow a few steps: -1. You need to create a folder (with the same structure as the SheepRL configs folder) where to place your custom configurations. +The default configurations are available here. If you want to define your custom experiments, you just need to follow a few steps: +1. You need to create a folder (with the same structure as the SheepRL configs folder) in which to place your custom configurations. 2. You need to define the `SHEEPRL_SEARCH_PATH` environment variable in the `.env` file as follows: `SHEEPRL_SEARCH_PATH=file://relative/path/to/custom/configs/folder;pkg://sheeprl.configs`. 3. You need to define the custom configurations, being careful that the filename is different from the default ones. If this is not respected, your file will overwrite the default configurations. @@ -158,7 +158,7 @@ This example demonstrates how to: * Run the trained agent in the environment for one episode. -SheepRL natively supports dictionary observation spaces, the only thing you need to define is the keys of the observations you want to process. For more information about observations selection, check here. +SheepRL natively supports dictionary observation spaces; the only thing you need to define is the keys of the observations you want to process. For more information about observation selection, check here.
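As a rough illustration of this selection, a snippet like the following could appear in an experiment config. This is a minimal sketch: the `frame`, `own_health`, and `opp_health` key names are placeholders for whatever keys your flattened DIAMBRA observation space actually exposes, and the nesting follows the `algo.cnn_keys`/`algo.mlp_keys` parameters described later on this page.

```yaml
# Hypothetical sketch: route image-like observation keys to the CNN encoder
# and vector-like observation keys to the MLP encoder. Key names are examples
# only; check your environment's observation space for the real ones.
algo:
  cnn_keys:
    encoder: [frame]
  mlp_keys:
    encoder: [own_health, opp_health]
```

Observation keys that are not listed are simply not fed to the encoders.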
##### Configs Folder First, it is necessary to create a folder for the configuration files. We create the `configs` folder under the `./sheeprl/` folder in the DIAMBRA Arena GitHub repository. Then we add the `.env` file in the `./sheeprl/` folder, in which we need to define the `SHEEPRL_SEARCH_PATH` environment variable as follows: @@ -172,7 +172,7 @@ Below is reported a possible configuration of the environment. {{< github_code "https://raw.githubusercontent.com/diambra/agents/main/sheeprl/configs/env/custom_env.yaml" >}} ##### Define the Agent -As for the environment, we need to create a dedicated folder to place the custom configurations of the agents: we create the `algo` folder in the `./sheeprl/configs` folder and we place the `custom_ppo_agent.yaml` file. Under the `default` keyword, it is possible to retrieve the configurations specified in another file, in our case, since we are defining the agent, we can take the configuration from the algorithm config folder in SheepRL, in which several SOTA agents are defined. +As for the environment, we need to create a dedicated folder for the custom configurations of the agents: we create the `algo` folder in the `./sheeprl/configs` folder and place the `custom_ppo_agent.yaml` file there. Under the `default` keyword, it is possible to retrieve the configurations specified in another file; in our case, since we are defining the agent, we can take the configuration from the algorithm config folder in SheepRL, in which several SOTA agents are defined. {{% notice note %}} When defining an agent, it is mandatory to define the `name` of the algorithm (it must be equal to the filename of the file in which the algorithm is defined). The value of this parameter defines which algorithm will be used for training. If you inherit the default configurations of a specific algorithm, then you do not need to define it, since it is already defined in the default configs of that algorithm. @@ -183,7 +183,7 @@ Below is reported a configuration file for a PPO agent. ##### Define the Experiment The last thing to do is to define the experiment. You just need to define a `custom_exp.yaml` file in the `./sheeprl/configs/exp` folder and assemble the environment, the agent, and the other components of the SheepRL framework. In particular, there are four parameters that must be defined: -1. `algo.total_steps`: the total number of policy steps to compute during training (for more information, check here). +1. `algo.total_steps`: the total number of policy steps to compute during training (for more information, check here). 2. `buffer.size`: the dimension of the replay buffer. 3. `algo.cnn_keys`: the keys of frames in observations that must be encoded (and possibly reconstructed by the decoder). 4. `algo.mlp_keys`: the keys of vectors in observations that must be encoded (and possibly reconstructed by the decoder). @@ -196,7 +196,7 @@ Below is an example of an experiment config file. {{< github_code "https://raw.githubusercontent.com/diambra/agents/main/sheeprl/configs/exp/custom_exp.yaml" >}} {{% notice note %}} -When defining the configurations of the experiment you can specify how frequently save checkpoints of the model, and if you want to save the final agent. For more information, check here. +When defining the configurations of the experiment, you can specify how frequently to save checkpoints of the model, and whether you want to save the final agent. For more information, check here.
{{% /notice %}} @@ -218,7 +218,7 @@ After training, you can decide to evaluate the agent as many times as you want. The reason why only these three parameters need to be specified is to avoid inconsistencies, e.g., the checkpoint of one agent being paired with the evaluation configurations of another, or the model in the checkpoint having different dimensions from the model specified in the configurations. This implies, however, that the evaluation script expects a certain directory structure. For this reason, the structure of the log directory should not be changed: all of it can be moved, but not the checkpoint individually; otherwise, the script cannot automatically retrieve the environment and agent configurations. -{{< github_code "https://raw.githubusercontent.com/Eclectic-Sheep/sheeprl/v0.5.1/sheeprl/configs/eval_config.yaml" >}} +{{< github_code "https://raw.githubusercontent.com/Eclectic-Sheep/sheeprl/v0.5.4/sheeprl/configs/eval_config.yaml" >}} To evaluate the agent you just need to run the following command: ```shell @@ -249,11 +249,11 @@ The `evaluate.py` script: ##### PPO Implementation -In this paragraph, we quote the code of our ppo implementation (the `ppo.py` file in the SheepRL PPO folder), just to give more context on how SheepRL works. In the `main()` function, all the components needed for training are instantiated (i.e., the agent, the environments, the buffer, the logger, and so on). Then, the environment interaction is performed, and after collecting the rollout steps, the train function is called. +In this paragraph, we quote the code of our PPO implementation (the `ppo.py` file in the SheepRL PPO folder), just to give more context on how SheepRL works. In the `main()` function, all the components needed for training are instantiated (i.e., the agent, the environments, the buffer, the logger, and so on). Then, the environment interaction is performed, and after collecting the rollout steps, the `train()` function is called. The `train()` function is responsible for sharing the data between processes if multiple processes are launched and `buffer.share_data` is set to `True`. Then, for each batch, the losses are computed and the agent is updated. -{{< github_code "https://raw.githubusercontent.com/Eclectic-Sheep/sheeprl/v0.5.1/sheeprl/algos/ppo/ppo.py" >}} +{{< github_code "https://raw.githubusercontent.com/Eclectic-Sheep/sheeprl/v0.5.4/sheeprl/algos/ppo/ppo.py" >}} #### Parallel Environments In addition to what is seen in previous examples, this one demonstrates how to run training using parallel environments. In this example, the same PPO algorithm is used as before. @@ -282,7 +282,7 @@ diambra run -s=6 python train.py exp=custom_parallel_env_exp SheepRL allows training to be distributed thanks to Lightning Fabric. The default Fabric configuration is the following: -{{< github_code "https://raw.githubusercontent.com/Eclectic-Sheep/sheeprl/v0.5.1/sheeprl/configs/fabric/default.yaml" >}} +{{< github_code "https://raw.githubusercontent.com/Eclectic-Sheep/sheeprl/v0.5.4/sheeprl/configs/fabric/default.yaml" >}} {{% notice note %}} The `sheeprl.utils.callback.CheckpointCallback` is used for saving the checkpoint during training and for saving the trained agent. @@ -309,12 +309,12 @@ To run the fabric experiment, make sure you have a `cuda` GPU in your device, ot Finally, SheepRL allows you to visualize and monitor training using Tensorboard. {{% notice note %}} -We strongly recommend to read the SheepRL logging documentation to know about how to enable/disable logging.
+We strongly recommend reading the SheepRL logging documentation to learn how to enable/disable logging. {{% /notice %}} Below is reported the default logging configuration and a table describing the arguments. -{{< github_code "https://raw.githubusercontent.com/Eclectic-Sheep/sheeprl/v0.5.1/sheeprl/configs/metric/default.yaml" >}} +{{< github_code "https://raw.githubusercontent.com/Eclectic-Sheep/sheeprl/v0.5.4/sheeprl/configs/metric/default.yaml" >}} | Argument | Type | Default Value(s) | Description | | ------------------------------------------------------------- | --------------------------------------------------------- | --------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |