Merge pull request #101 from taku-y/improve_doc
Improve doc
taku-y committed Sep 1, 2024
2 parents d7316a0 + 0589202 commit 2fc7a46
Showing 42 changed files with 954 additions and 671 deletions.
39 changes: 16 additions & 23 deletions README.md
@@ -14,14 +14,14 @@ Border consists of the following crates:
* [border-tensorboard](https://crates.io/crates/border-tensorboard) provides the `TensorboardRecorder` struct, which writes records that can be viewed in Tensorboard. It is based on [tensorboard-rs](https://crates.io/crates/tensorboard-rs).
* [border-mlflow-tracking](https://crates.io/crates/border-mlflow-tracking) supports MLflow tracking to log metrics during training via the REST API.
* [border-async-trainer](https://crates.io/crates/border-async-trainer) defines traits and functions for asynchronous training of RL agents by multiple actors, which run sampling processes in parallel. In each sampling process, an agent interacts with an environment to collect samples that are sent to a shared replay buffer.
* [border](https://crates.io/crates/border) is just a collection of examples.
* Environment
* [border-py-gym-env](https://crates.io/crates/border-py-gym-env) is a wrapper of the [Gymnasium](https://gymnasium.farama.org) environments written in Python.
* [border-atari-env](https://crates.io/crates/border-atari-env) is a wrapper of [atari-env](https://crates.io/crates/atari-env), which is a part of [gym-rs](https://crates.io/crates/gym-rs).
* Agent
* [border-tch-agent](https://crates.io/crates/border-tch-agent) is a collection of RL agents based on [tch](https://crates.io/crates/tch), including Deep Q network (DQN), implicit quantile network (IQN), and soft actor critic (SAC).
* [border-candle-agent](https://crates.io/crates/border-candle-agent) is a collection of RL agents based on [candle](https://crates.io/crates/candle-core)

You can use a part of these crates for your purposes, though [border-core](https://crates.io/crates/border-core) is mandatory. [This crate](https://crates.io/crates/border) is just a collection of examples. See [Documentation](https://docs.rs/border) for more details.
* [border-tch-agent](https://crates.io/crates/border-tch-agent) includes RL agents based on [tch](https://crates.io/crates/tch), such as Deep Q network (DQN), implicit quantile network (IQN), and soft actor critic (SAC).
* [border-candle-agent](https://crates.io/crates/border-candle-agent) includes RL agents based on [candle](https://crates.io/crates/candle-core).
* [border-policy-no-backend](https://crates.io/crates/border-policy-no-backend) includes a policy that is independent of any deep learning backend, such as Torch.

## Status

@@ -35,24 +35,17 @@ There are some example scripts in the `border/examples` directory. These are tested

In the `docker` directory, there are scripts for running a Docker container in which you can try the examples described above. Currently, `aarch64` is mainly used for development.

## Tests

The following command has been tested in the Docker container running on an M2 MacBook Air.

```bash
cargo test --features=tch
```

## License

Crates | License
------------------------|------------------
`border-core` | MIT OR Apache-2.0
`border-tensorboard` | MIT OR Apache-2.0
`border-mlflow-tracking`| MIT OR Apache-2.0
`border-async-trainer` | MIT OR Apache-2.0
`border-py-gym-env` | MIT OR Apache-2.0
`border-atari-env` | GPL-2.0-or-later
`border-tch-agent` | MIT OR Apache-2.0
`border-candle-agent` | MIT OR Apache-2.0
`border` | GPL-2.0-or-later
Crates | License
--------------------------|------------------
`border-core` | MIT OR Apache-2.0
`border-tensorboard` | MIT OR Apache-2.0
`border-mlflow-tracking` | MIT OR Apache-2.0
`border-async-trainer` | MIT OR Apache-2.0
`border-py-gym-env` | MIT OR Apache-2.0
`border-atari-env` | GPL-2.0-or-later
`border-tch-agent` | MIT OR Apache-2.0
`border-candle-agent` | MIT OR Apache-2.0
`border-policy-no-backend`| MIT OR Apache-2.0
`border` | GPL-2.0-or-later
57 changes: 51 additions & 6 deletions border-async-trainer/src/async_trainer/config.rs
@@ -35,6 +35,57 @@ pub struct AsyncTrainerConfig {
}

impl AsyncTrainerConfig {
/// Sets the maximum number of optimization steps.
pub fn max_opts(mut self, v: usize) -> Result<Self> {
self.max_opts = v;
Ok(self)
}

/// Sets the interval of evaluation in optimization steps.
pub fn eval_interval(mut self, v: usize) -> Result<Self> {
self.eval_interval = v;
Ok(self)
}

/// Sets the directory where the trained model is saved.
pub fn model_dir<T: Into<String>>(mut self, model_dir: T) -> Result<Self> {
self.model_dir = Some(model_dir.into());
Ok(self)
}

/// Sets the interval of recording the computation cost, in optimization steps.
pub fn record_compute_cost_interval(
mut self,
record_compute_cost_interval: usize,
) -> Result<Self> {
self.record_compute_cost_interval = record_compute_cost_interval;
Ok(self)
}

/// Sets the interval of flushing records, in optimization steps.
pub fn flush_record_interval(mut self, flush_record_interval: usize) -> Result<Self> {
self.flush_record_interval = flush_record_interval;
Ok(self)
}

/// Sets the warmup period, in environment steps.
pub fn warmup_period(mut self, warmup_period: usize) -> Result<Self> {
self.warmup_period = warmup_period;
Ok(self)
}

/// Sets the interval of saving in optimization steps.
pub fn save_interval(mut self, save_interval: usize) -> Result<Self> {
self.save_interval = save_interval;
Ok(self)
}

/// Sets the interval of synchronizing model parameters in training steps.
pub fn sync_interval(mut self, sync_interval: usize) -> Result<Self> {
self.sync_interval = sync_interval;
Ok(self)
}

/// Constructs [AsyncTrainerConfig] from YAML file.
pub fn load(path: impl AsRef<Path>) -> Result<Self> {
let file = File::open(path)?;
@@ -49,12 +100,6 @@ impl AsyncTrainerConfig {
file.write_all(serde_yaml::to_string(&self)?.as_bytes())?;
Ok(())
}

/// Sets the directory where the trained model is saved.
pub fn model_dir<T: Into<String>>(mut self, model_dir: T) -> Result<Self> {
self.model_dir = Some(model_dir.into());
Ok(self)
}
}

impl Default for AsyncTrainerConfig {
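Since each setter added above consumes `self` and returns a `Result`, a configuration can be built by chaining the calls with `?`. Below is a minimal sketch of that pattern; it assumes the config type is re-exported at the crate root and that `anyhow::Result` is the error type in use, and every numeric value is a placeholder rather than a recommended setting.

```rust
use anyhow::Result;
use border_async_trainer::AsyncTrainerConfig;

fn build_config() -> Result<AsyncTrainerConfig> {
    // Placeholder values for illustration only; tune them for your task.
    let config = AsyncTrainerConfig::default()
        .max_opts(1_000_000)?                 // total optimization steps
        .eval_interval(10_000)?               // evaluate every 10k optimization steps
        .model_dir("model/example")?          // where the trained model is saved
        .record_compute_cost_interval(1_000)? // record compute cost periodically
        .flush_record_interval(1_000)?        // flush records periodically
        .warmup_period(10_000)?               // environment steps before training starts
        .save_interval(100_000)?              // save the model every 100k optimization steps
        .sync_interval(100)?;                 // sync model parameters every 100 training steps
    Ok(config)
}
```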
9 changes: 9 additions & 0 deletions border-core/src/lib.rs
@@ -81,6 +81,15 @@
//! In the training loop of this method, the agent interacts with the environment to
//! take samples and perform optimization steps. Some metrics are recorded at the same time.
//!
//! # Evaluator
//!
//! [`Evaluator<E, P>`] is used to evaluate the policy's (`P`) performance in the environment (`E`).
//! An object of this type is given to the [`Trainer`] object to evaluate the policy during training.
//! [`DefaultEvaluator<E, P>`] is a default implementation of [`Evaluator<E, P>`].
//! This evaluator runs the policy in the environment for a certain number of episodes.
//! At the start of each episode, the environment is reset using [`Env::reset_with_index()`]
//! to control specific conditions for evaluation.
//!
//! [`SimpleReplayBuffer`]: replay_buffer::SimpleReplayBuffer
//! [`SimpleReplayBuffer<O, A>`]: generic_replay_buffer::SimpleReplayBuffer
//! [`BatchBase`]: generic_replay_buffer::BatchBase
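To make the evaluation behaviour described above concrete, here is a rough, self-contained sketch of such a loop. It is not the actual `border-core` implementation: `EvalEnv` and `EvalPolicy` are simplified stand-ins for the real traits, which are generic over observation and action types and have richer signatures.

```rust
/// Simplified stand-in for an environment used during evaluation.
trait EvalEnv {
    /// Resets the environment with an episode index to fix the initial condition,
    /// returning the initial observation.
    fn reset_with_index(&mut self, ix: usize) -> f32;
    /// Applies an action and returns (next observation, reward, done).
    fn step(&mut self, act: f32) -> (f32, f32, bool);
}

/// Simplified stand-in for a policy.
trait EvalPolicy {
    fn sample(&mut self, obs: f32) -> f32;
}

/// Runs `n_episodes` evaluation episodes and returns the mean episodic return,
/// mirroring what the documentation describes for `DefaultEvaluator`.
fn evaluate(env: &mut impl EvalEnv, policy: &mut impl EvalPolicy, n_episodes: usize) -> f32 {
    let mut total_return = 0.0;
    for ix in 0..n_episodes {
        let mut obs = env.reset_with_index(ix);
        loop {
            let act = policy.sample(obs);
            let (next_obs, reward, done) = env.step(act);
            total_return += reward;
            if done {
                break;
            }
            obs = next_obs;
        }
    }
    total_return / n_episodes as f32
}
```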
2 changes: 1 addition & 1 deletion border-mlflow-tracking/README.md
@@ -7,7 +7,7 @@ mlflow server --host 127.0.0.1 --port 8080
```

Then, training configurations and metrics can be logged to the tracking server.
The following code is an example. Nested configuration parameters will be flattened,
The following code provides an example. Nested configuration parameters will be flattened,
and logged as `hyper_params.param1`, `hyper_params.param2`.

```rust
2 changes: 1 addition & 1 deletion border-py-gym-env/Cargo.toml
@@ -6,7 +6,7 @@ description.workspace = true
repository.workspace = true
keywords.workspace = true
categories.workspace = true
license.workspace = true
package.license = "GPL-2.0-or-later"
readme = "README.md"

[dependencies]
2 changes: 1 addition & 1 deletion border-py-gym-env/src/lib.rs
@@ -8,7 +8,7 @@
//!
//! ## Observation
//!
//! Obsservation is created in Python and passed to Rust as a Python object. In order to convert
//! Observation is created in Python and passed to Rust as a Python object. In order to convert
//! Python object to Rust object, this crate provides [`GymObsFilter`] trait. This trait has
//! [`GymObsFilter::filt`] method which converts Python object to Rust object.
//! The type of the Rust object after conversion corresponds to the type parameter `O` of the trait
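As a rough illustration of the conversion such a filter performs, the sketch below extracts a Python list of floats into a Rust observation type via PyO3. The trait is omitted and the function is a simplified stand-in; the real `GymObsFilter::filt` has a different, richer signature.

```rust
use pyo3::prelude::*;

/// Hypothetical Rust-side observation type; the type parameter `O` of the
/// filter trait plays this role in the actual crate.
struct VecObs(Vec<f32>);

/// Simplified stand-in for the conversion performed by an observation filter:
/// take a Python object (assumed here to be a flat list of numbers) and turn
/// it into a Rust observation.
fn filt_like(obs: PyObject) -> PyResult<VecObs> {
    Python::with_gil(|py| {
        let values: Vec<f32> = obs.extract(py)?;
        Ok(VecObs(values))
    })
}
```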
14 changes: 7 additions & 7 deletions border/Cargo.toml
Original file line number Diff line number Diff line change
@@ -81,8 +81,8 @@ required-features = ["tch"]
test = false

[[example]]
name = "dqn_atari_async"
path = "examples/atari/dqn_atari_async.rs"
name = "dqn_atari_async_tch"
path = "examples/atari/dqn_atari_async_tch.rs"
required-features = ["tch", "border-async-trainer"]
test = false

@@ -115,11 +115,11 @@ path = "examples/gym/convert_sac_policy_to_edge.rs"
required-features = ["border-tch-agent", "tch"]
test = false

# [[example]]
# name = "sac_ant_async"
# path = "examples/mujoco/sac_ant_async.rs"
# required-features = ["tch", "border-async-trainer"]
# test = false
[[example]]
name = "sac_mujoco_async_tch"
path = "examples/mujoco/sac_mujoco_async_tch.rs"
required-features = ["tch", "border-async-trainer"]
test = false

[[example]]
name = "pendulum_edge"
46 changes: 30 additions & 16 deletions border/README.md
@@ -9,29 +9,43 @@ A reinforcement learning library in Rust.

Border consists of the following crates:

* [border-core](https://crates.io/crates/border-core) provides basic traits and functions generic to environments and reinforcement learning (RL) agents.
* [border-py-gym-env](https://crates.io/crates/border-py-gym-env) is a wrapper of the [Gym](https://gym.openai.com) environments written in Python, with the support of [pybullet-gym](https://github.com/benelot/pybullet-gym) and [atari](https://github.com/mgbellemare/Arcade-Learning-Environment).
* [border-atari-env](https://crates.io/crates/border-atari-env) is a wrapper of [atari-env](https://crates.io/crates/atari-env), which is a part of [gym-rs](https://crates.io/crates/gym-rs).
* [border-tch-agent](https://crates.io/crates/border-tch-agent) is a collection of RL agents based on [tch](https://crates.io/crates/tch). Deep Q network (DQN), implicit quantile network (IQN), and soft actor critic (SAC) are includes.
* [border-async-trainer](https://crates.io/crates/border-async-trainer) defines some traits and functions for asynchronous training of RL agents by multiple actors, each of which runs a sampling process of an agent and an environment in parallel.

You can use a part of these crates for your purposes, though [border-core](https://crates.io/crates/border-core) is mandatory. [This crate](https://crates.io/crates/border) is just a collection of examples. See [Documentation](https://docs.rs/border) for more details.
* Core and utility
* [border-core](https://crates.io/crates/border-core) provides basic traits and functions generic to environments and reinforcement learning (RL) agents.
* [border-tensorboard](https://crates.io/crates/border-tensorboard) provides the `TensorboardRecorder` struct, which writes records that can be viewed in Tensorboard. It is based on [tensorboard-rs](https://crates.io/crates/tensorboard-rs).
* [border-mlflow-tracking](https://crates.io/crates/border-mlflow-tracking) supports MLflow tracking to log metrics during training via the REST API.
* [border-async-trainer](https://crates.io/crates/border-async-trainer) defines traits and functions for asynchronous training of RL agents by multiple actors, which run sampling processes in parallel. In each sampling process, an agent interacts with an environment to collect samples that are sent to a shared replay buffer.
* [border](https://crates.io/crates/border) is just a collection of examples.
* Environment
* [border-py-gym-env](https://crates.io/crates/border-py-gym-env) is a wrapper of the [Gymnasium](https://gymnasium.farama.org) environments written in Python.
* [border-atari-env](https://crates.io/crates/border-atari-env) is a wrapper of [atari-env](https://crates.io/crates/atari-env), which is a part of [gym-rs](https://crates.io/crates/gym-rs).
* Agent
* [border-tch-agent](https://crates.io/crates/border-tch-agent) includes RL agents based on [tch](https://crates.io/crates/tch), such as Deep Q network (DQN), implicit quantile network (IQN), and soft actor critic (SAC).
* [border-candle-agent](https://crates.io/crates/border-candle-agent) includes RL agents based on [candle](https://crates.io/crates/candle-core).
* [border-policy-no-backend](https://crates.io/crates/border-policy-no-backend) includes a policy that is independent of any deep learning backend, such as Torch.

## Status

Border is experimental and currently under development. The API is unstable.

## Examples

In examples directory, you can see how to run some examples. Python>=3.7 and [gym](https://gym.openai.com) must be installed for running examples using [border-py-gym-env](https://crates.io/crates/border-py-gym-env). Some examples requires [PyBullet Gym](https://github.com/benelot/pybullet-gym). As the agents used in the examples are based on [tch-rs](https://github.com/LaurentMazare/tch-rs), libtorch is required to be installed.
There are some example scripts in the `border/examples` directory. These are tested in Docker containers, specifically the one in the `aarch64` directory, on an M2 MacBook Air. Some scripts take a few days for the training process; these were tested on an Ubuntu 22.04 virtual machine in [GPUSOROBAN](https://soroban.highreso.jp), a computing cloud.

## Docker

In the `docker` directory, there are scripts for running a Docker container in which you can try the examples described above. Currently, `aarch64` is mainly used for development.

## License

Crates | License
----------------------|------------------
`border-core` | MIT OR Apache-2.0
`border-py-gym-env` | MIT OR Apache-2.0
`border-atari-env` | GPL-2.0-or-later
`border-tch-agent` | MIT OR Apache-2.0
`border-async-trainer`| MIT OR Apache-2.0
`border` | GPL-2.0-or-later
Crates | License
--------------------------|------------------
`border-core` | MIT OR Apache-2.0
`border-tensorboard` | MIT OR Apache-2.0
`border-mlflow-tracking` | MIT OR Apache-2.0
`border-async-trainer` | MIT OR Apache-2.0
`border-py-gym-env` | MIT OR Apache-2.0
`border-atari-env` | GPL-2.0-or-later
`border-tch-agent` | MIT OR Apache-2.0
`border-candle-agent` | MIT OR Apache-2.0
`border-policy-no-backend`| MIT OR Apache-2.0
`border` | GPL-2.0-or-later
7 changes: 7 additions & 0 deletions border/examples/README.md
@@ -1,3 +1,10 @@
The following directories contain example scripts.

* `gym` - Classic control environments in [Gymnasium](https://gymnasium.farama.org) based on [border-py-gym-env](https://crates.io/crates/border-py-gym-env).
* `gym-robotics` - A robotic environment (fetch-reach) in [Gymnasium-Robotics](https://robotics.farama.org/) based on [border-py-gym-env](https://crates.io/crates/border-py-gym-env).
* `mujoco` - Mujoco environments in [Gymnasium](https://gymnasium.farama.org) based on [border-py-gym-env](https://crates.io/crates/border-py-gym-env).
* `atari` - Atari environments based on [border-atari-env](https://crates.io/crates/border-atari-env), a wrapper of [atari-env](https://crates.io/crates/atari-env), which is part of [gym-rs](https://crates.io/crates/gym-rs).

## Gym

You may need to set PYTHONPATH as `PYTHONPATH=./border-py-gym-env/examples`.
1 change: 0 additions & 1 deletion border/examples/atari/dqn_atari.rs
@@ -221,7 +221,6 @@ mod utils {
#[command(version, about)]
struct Args {
/// Name of the game
#[arg(long)]
name: String,

/// Train DQN agent, not evaluate
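For context, dropping `#[arg(long)]` in the hunk above turns `name` into a positional argument rather than a `--name` flag. A minimal, self-contained sketch of the resulting pattern follows; it is not the example's full `Args` struct, and the `train` flag is illustrative.

```rust
use clap::Parser;

/// Minimal sketch of a clap parser where the game name is positional.
#[derive(Parser, Debug)]
#[command(version, about)]
struct Args {
    /// Name of the game (positional, e.g. `pong`)
    name: String,

    /// Train the DQN agent instead of evaluating it
    #[arg(long)]
    train: bool,
}

fn main() {
    let args = Args::parse();
    println!("game: {}, train: {}", args.name, args.train);
}
```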