Merge pull request #101 from taku-y/improve_doc
Improve doc
taku-y committed Sep 1, 2024
2 parents d7316a0 + 0589202 commit 2fc7a46
Showing 42 changed files with 954 additions and 671 deletions.
39 changes: 16 additions & 23 deletions README.md
@@ -14,14 +14,14 @@ Border consists of the following crates:
* [border-tensorboard](https://crates.io/crates/border-tensorboard) provides the `TensorboardRecorder` struct, which writes records that can be viewed in Tensorboard. It is based on [tensorboard-rs](https://crates.io/crates/tensorboard-rs).
* [border-mlflow-tracking](https://crates.io/crates/border-mlflow-tracking) supports MLflow tracking to log metrics during training via the REST API.
* [border-async-trainer](https://crates.io/crates/border-async-trainer) defines traits and functions for asynchronous training of RL agents by multiple actors, which run sampling processes in parallel. In each sampling process, an agent interacts with an environment to collect samples that are sent to a shared replay buffer.
* [border](https://crates.io/crates/border) is just a collection of examples.
* Environment
* [border-py-gym-env](https://crates.io/crates/border-py-gym-env) is a wrapper of the [Gymnasium](https://gymnasium.farama.org) environments written in Python.
* [border-atari-env](https://crates.io/crates/border-atari-env) is a wrapper of [atari-env](https://crates.io/crates/atari-env), which is a part of [gym-rs](https://crates.io/crates/gym-rs).
* Agent
* [border-tch-agent](https://crates.io/crates/border-tch-agent) is a collection of RL agents based on [tch](https://crates.io/crates/tch), including Deep Q network (DQN), implicit quantile network (IQN), and soft actor critic (SAC).
* [border-candle-agent](https://crates.io/crates/border-candle-agent) is a collection of RL agents based on [candle](https://crates.io/crates/candle-core)

You can use a part of these crates for your purposes, though [border-core](https://crates.io/crates/border-core) is mandatory. [This crate](https://crates.io/crates/border) is just a collection of examples. See [Documentation](https://docs.rs/border) for more details.
* [border-tch-agent](https://crates.io/crates/border-tch-agent) includes RL agents based on [tch](https://crates.io/crates/tch), such as Deep Q network (DQN), implicit quantile network (IQN), and soft actor critic (SAC).
* [border-candle-agent](https://crates.io/crates/border-candle-agent) includes RL agents based on [candle](https://crates.io/crates/candle-core).
* [border-policy-no-backend](https://crates.io/crates/border-policy-no-backend) includes a policy that is independent of any deep learning backend, such as Torch.

## Status

@@ -35,24 +35,17 @@ There are some example scripts in the `border/examples` directory. These are tested

In the `docker` directory, there are scripts for running a Docker container in which you can try the examples described above. Currently, `aarch64` is mainly used for development.

## Tests

The following command has been tested in the Docker container running on an M2 MacBook Air.

```bash
cargo test --features=tch
```

## License

Crates | License
------------------------|------------------
`border-core` | MIT OR Apache-2.0
`border-tensorboard` | MIT OR Apache-2.0
`border-mlflow-tracking`| MIT OR Apache-2.0
`border-async-trainer` | MIT OR Apache-2.0
`border-py-gym-env` | MIT OR Apache-2.0
`border-atari-env` | GPL-2.0-or-later
`border-tch-agent` | MIT OR Apache-2.0
`border-candle-agent` | MIT OR Apache-2.0
`border` | GPL-2.0-or-later
Crates | License
--------------------------|------------------
`border-core` | MIT OR Apache-2.0
`border-tensorboard` | MIT OR Apache-2.0
`border-mlflow-tracking` | MIT OR Apache-2.0
`border-async-trainer` | MIT OR Apache-2.0
`border-py-gym-env` | MIT OR Apache-2.0
`border-atari-env` | GPL-2.0-or-later
`border-tch-agent` | MIT OR Apache-2.0
`border-candle-agent` | MIT OR Apache-2.0
`border-policy-no-backend`| MIT OR Apache-2.0
`border` | GPL-2.0-or-later
57 changes: 51 additions & 6 deletions border-async-trainer/src/async_trainer/config.rs
@@ -35,6 +35,57 @@ pub struct AsyncTrainerConfig {
}

impl AsyncTrainerConfig {
/// Sets the maximum number of optimization steps.
pub fn max_opts(mut self, v: usize) -> Result<Self> {
self.max_opts = v;
Ok(self)
}

/// Sets the interval of evaluation in optimization steps.
pub fn eval_interval(mut self, v: usize) -> Result<Self> {
self.eval_interval = v;
Ok(self)
}

/// Sets the directory where the trained model is saved.
pub fn model_dir<T: Into<String>>(mut self, model_dir: T) -> Result<Self> {
self.model_dir = Some(model_dir.into());
Ok(self)
}

/// Sets the interval of recording the computation cost, in optimization steps.
pub fn record_compute_cost_interval(
mut self,
record_compute_cost_interval: usize,
) -> Result<Self> {
self.record_compute_cost_interval = record_compute_cost_interval;
Ok(self)
}

/// Sets the interval of flushing records, in optimization steps.
pub fn flush_record_interval(mut self, flush_record_interval: usize) -> Result<Self> {
self.flush_record_interval = flush_record_interval;
Ok(self)
}

/// Sets the warmup period, in environment steps.
pub fn warmup_period(mut self, warmup_period: usize) -> Result<Self> {
self.warmup_period = warmup_period;
Ok(self)
}

/// Sets the interval of saving in optimization steps.
pub fn save_interval(mut self, save_interval: usize) -> Result<Self> {
self.save_interval = save_interval;
Ok(self)
}

/// Sets the interval of synchronizing model parameters in training steps.
pub fn sync_interval(mut self, sync_interval: usize) -> Result<Self> {
self.sync_interval = sync_interval;
Ok(self)
}

/// Constructs [AsyncTrainerConfig] from YAML file.
pub fn load(path: impl AsRef<Path>) -> Result<Self> {
let file = File::open(path)?;
@@ -49,12 +100,6 @@ impl AsyncTrainerConfig {
file.write_all(serde_yaml::to_string(&self)?.as_bytes())?;
Ok(())
}

/// Sets the directory where the trained model is saved.
pub fn model_dir<T: Into<String>>(mut self, model_dir: T) -> Result<Self> {
self.model_dir = Some(model_dir.into());
Ok(self)
}
}

impl Default for AsyncTrainerConfig {
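Since each setter added above consumes `self` and returns a `Result`, a configuration can be built by chaining the calls with `?`. Below is a minimal sketch of that pattern; it assumes the config type is re-exported at the crate root and that `anyhow::Result` is the error type in use, and every numeric value is a placeholder rather than a recommended setting.

```rust
use anyhow::Result;
use border_async_trainer::AsyncTrainerConfig;

fn build_config() -> Result<AsyncTrainerConfig> {
    // Placeholder values for illustration only; tune them for your task.
    let config = AsyncTrainerConfig::default()
        .max_opts(1_000_000)?                 // total optimization steps
        .eval_interval(10_000)?               // evaluate every 10k optimization steps
        .model_dir("model/example")?          // where the trained model is saved
        .record_compute_cost_interval(1_000)? // record compute cost periodically
        .flush_record_interval(1_000)?        // flush records periodically
        .warmup_period(10_000)?               // environment steps before training starts
        .save_interval(100_000)?              // save the model every 100k optimization steps
        .sync_interval(100)?;                 // sync model parameters every 100 training steps
    Ok(config)
}
```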
9 changes: 9 additions & 0 deletions border-core/src/lib.rs
@@ -81,6 +81,15 @@
//! In the training loop of this method, the agent interacts with the environment to
//! take samples and perform optimization steps. Some metrics are recorded at the same time.
//!
//! # Evaluator
//!
//! [`Evaluator<E, P>`] is used to evaluate the policy's (`P`) performance in the environment (`E`).
//! An object of this type is given to the [`Trainer`] object to evaluate the policy during training.
//! [`DefaultEvaluator<E, P>`] is a default implementation of [`Evaluator<E, P>`].
//! This evaluator runs the policy in the environment for a certain number of episodes.
//! At the start of each episode, the environment is reset using [`Env::reset_with_index()`]
//! to control specific conditions for evaluation.
//!
//! [`SimpleReplayBuffer`]: replay_buffer::SimpleReplayBuffer
//! [`SimpleReplayBuffer<O, A>`]: generic_replay_buffer::SimpleReplayBuffer
//! [`BatchBase`]: generic_replay_buffer::BatchBase
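To make the evaluation behaviour described above concrete, here is a rough, self-contained sketch of such a loop. It is not the actual `border-core` implementation: `EvalEnv` and `EvalPolicy` are simplified stand-ins for the real traits, which are generic over observation and action types and have richer signatures.

```rust
/// Simplified stand-in for an environment used during evaluation.
trait EvalEnv {
    /// Resets the environment with an episode index to fix the initial condition,
    /// returning the initial observation.
    fn reset_with_index(&mut self, ix: usize) -> f32;
    /// Applies an action and returns (next observation, reward, done).
    fn step(&mut self, act: f32) -> (f32, f32, bool);
}

/// Simplified stand-in for a policy.
trait EvalPolicy {
    fn sample(&mut self, obs: f32) -> f32;
}

/// Runs `n_episodes` evaluation episodes and returns the mean episodic return,
/// mirroring what the documentation describes for `DefaultEvaluator`.
fn evaluate(env: &mut impl EvalEnv, policy: &mut impl EvalPolicy, n_episodes: usize) -> f32 {
    let mut total_return = 0.0;
    for ix in 0..n_episodes {
        let mut obs = env.reset_with_index(ix);
        loop {
            let act = policy.sample(obs);
            let (next_obs, reward, done) = env.step(act);
            total_return += reward;
            if done {
                break;
            }
            obs = next_obs;
        }
    }
    total_return / n_episodes as f32
}
```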
2 changes: 1 addition & 1 deletion border-mlflow-tracking/README.md
@@ -7,7 +7,7 @@ mlflow server --host 127.0.0.1 --port 8080
```

Then, training configurations and metrics can be logged to the tracking server.
The following code is an example. Nested configuration parameters will be flattened,
The following code provides an example. Nested configuration parameters will be flattened,
and logged as `hyper_params.param1`, `hyper_params.param2`.

```rust
2 changes: 1 addition & 1 deletion border-py-gym-env/Cargo.toml
@@ -6,7 +6,7 @@ description.workspace = true
repository.workspace = true
keywords.workspace = true
categories.workspace = true
license.workspace = true
package.license = "GPL-2.0-or-later"
readme = "README.md"

[dependencies]
2 changes: 1 addition & 1 deletion border-py-gym-env/src/lib.rs
@@ -8,7 +8,7 @@
//!
//! ## Observation
//!
//! Obsservation is created in Python and passed to Rust as a Python object. In order to convert
//! Observation is created in Python and passed to Rust as a Python object. In order to convert
//! Python object to Rust object, this crate provides [`GymObsFilter`] trait. This trait has
//! [`GymObsFilter::filt`] method which converts Python object to Rust object.
//! The type of the Rust object after conversion corresponds to the type parameter `O` of the trait
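As a rough illustration of the conversion such a filter performs, the sketch below extracts a Python list of floats into a Rust observation type via PyO3. The trait is omitted and the function is a simplified stand-in; the real `GymObsFilter::filt` has a different, richer signature.

```rust
use pyo3::prelude::*;

/// Hypothetical Rust-side observation type; the type parameter `O` of the
/// filter trait plays this role in the actual crate.
struct VecObs(Vec<f32>);

/// Simplified stand-in for the conversion performed by an observation filter:
/// take a Python object (assumed here to be a flat list of numbers) and turn
/// it into a Rust observation.
fn filt_like(obs: PyObject) -> PyResult<VecObs> {
    Python::with_gil(|py| {
        let values: Vec<f32> = obs.extract(py)?;
        Ok(VecObs(values))
    })
}
```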
14 changes: 7 additions & 7 deletions border/Cargo.toml
Original file line number Diff line number Diff line change
@@ -81,8 +81,8 @@ required-features = ["tch"]
test = false

[[example]]
name = "dqn_atari_async"
path = "examples/atari/dqn_atari_async.rs"
name = "dqn_atari_async_tch"
path = "examples/atari/dqn_atari_async_tch.rs"
required-features = ["tch", "border-async-trainer"]
test = false

@@ -115,11 +115,11 @@ path = "examples/gym/convert_sac_policy_to_edge.rs"
required-features = ["border-tch-agent", "tch"]
test = false

# [[example]]
# name = "sac_ant_async"
# path = "examples/mujoco/sac_ant_async.rs"
# required-features = ["tch", "border-async-trainer"]
# test = false
[[example]]
name = "sac_mujoco_async_tch"
path = "examples/mujoco/sac_mujoco_async_tch.rs"
required-features = ["tch", "border-async-trainer"]
test = false

[[example]]
name = "pendulum_edge"
46 changes: 30 additions & 16 deletions border/README.md
@@ -9,29 +9,43 @@ A reinforcement learning library in Rust.

Border consists of the following crates:

* [border-core](https://crates.io/crates/border-core) provides basic traits and functions generic to environments and reinforcement learning (RL) agents.
* [border-py-gym-env](https://crates.io/crates/border-py-gym-env) is a wrapper of the [Gym](https://gym.openai.com) environments written in Python, with the support of [pybullet-gym](https://github.com/benelot/pybullet-gym) and [atari](https://github.com/mgbellemare/Arcade-Learning-Environment).
* [border-atari-env](https://crates.io/crates/border-atari-env) is a wrapper of [atari-env](https://crates.io/crates/atari-env), which is a part of [gym-rs](https://crates.io/crates/gym-rs).
* [border-tch-agent](https://crates.io/crates/border-tch-agent) is a collection of RL agents based on [tch](https://crates.io/crates/tch). Deep Q network (DQN), implicit quantile network (IQN), and soft actor critic (SAC) are includes.
* [border-async-trainer](https://crates.io/crates/border-async-trainer) defines some traits and functions for asynchronous training of RL agents by multiple actors, each of which runs a sampling process of an agent and an environment in parallel.

You can use a part of these crates for your purposes, though [border-core](https://crates.io/crates/border-core) is mandatory. [This crate](https://crates.io/crates/border) is just a collection of examples. See [Documentation](https://docs.rs/border) for more details.
* Core and utility
* [border-core](https://crates.io/crates/border-core) provides basic traits and functions generic to environments and reinforcement learning (RL) agents.
* [border-tensorboard](https://crates.io/crates/border-tensorboard) provides the `TensorboardRecorder` struct, which writes records that can be viewed in Tensorboard. It is based on [tensorboard-rs](https://crates.io/crates/tensorboard-rs).
* [border-mlflow-tracking](https://crates.io/crates/border-mlflow-tracking) supports MLflow tracking to log metrics during training via the REST API.
* [border-async-trainer](https://crates.io/crates/border-async-trainer) defines traits and functions for asynchronous training of RL agents by multiple actors, which run sampling processes in parallel. In each sampling process, an agent interacts with an environment to collect samples that are sent to a shared replay buffer.
* [border](https://crates.io/crates/border) is just a collection of examples.
* Environment
* [border-py-gym-env](https://crates.io/crates/border-py-gym-env) is a wrapper of the [Gymnasium](https://gymnasium.farama.org) environments written in Python.
* [border-atari-env](https://crates.io/crates/border-atari-env) is a wrapper of [atari-env](https://crates.io/crates/atari-env), which is a part of [gym-rs](https://crates.io/crates/gym-rs).
* Agent
* [border-tch-agent](https://crates.io/crates/border-tch-agent) includes RL agents based on [tch](https://crates.io/crates/tch), such as Deep Q network (DQN), implicit quantile network (IQN), and soft actor critic (SAC).
* [border-candle-agent](https://crates.io/crates/border-candle-agent) includes RL agents based on [candle](https://crates.io/crates/candle-core).
* [border-policy-no-backend](https://crates.io/crates/border-policy-no-backend) includes a policy that is independent of any deep learning backend, such as Torch.

## Status

Border is experimental and currently under development. The API is unstable.

## Examples

In examples directory, you can see how to run some examples. Python>=3.7 and [gym](https://gym.openai.com) must be installed for running examples using [border-py-gym-env](https://crates.io/crates/border-py-gym-env). Some examples requires [PyBullet Gym](https://github.com/benelot/pybullet-gym). As the agents used in the examples are based on [tch-rs](https://github.com/LaurentMazare/tch-rs), libtorch is required to be installed.
There are some example scripts in the `border/examples` directory. These are tested in Docker containers, specifically the one in the `aarch64` directory, on an M2 MacBook Air. Some scripts take a few days for the training process; these were tested on an Ubuntu 22.04 virtual machine in [GPUSOROBAN](https://soroban.highreso.jp), a computing cloud.

## Docker

In the `docker` directory, there are scripts for running a Docker container in which you can try the examples described above. Currently, `aarch64` is mainly used for development.

## License

Crates | License
----------------------|------------------
`border-core` | MIT OR Apache-2.0
`border-py-gym-env` | MIT OR Apache-2.0
`border-atari-env` | GPL-2.0-or-later
`border-tch-agent` | MIT OR Apache-2.0
`border-async-trainer`| MIT OR Apache-2.0
`border` | GPL-2.0-or-later
Crates | License
--------------------------|------------------
`border-core` | MIT OR Apache-2.0
`border-tensorboard` | MIT OR Apache-2.0
`border-mlflow-tracking` | MIT OR Apache-2.0
`border-async-trainer` | MIT OR Apache-2.0
`border-py-gym-env` | MIT OR Apache-2.0
`border-atari-env` | GPL-2.0-or-later
`border-tch-agent` | MIT OR Apache-2.0
`border-candle-agent` | MIT OR Apache-2.0
`border-policy-no-backend`| MIT OR Apache-2.0
`border` | GPL-2.0-or-later
7 changes: 7 additions & 0 deletions border/examples/README.md
@@ -1,3 +1,10 @@
The following directories contain example scripts.

* `gym` - Classic control environments in [Gymnasium](https://gymnasium.farama.org) based on [border-py-gym-env](https://crates.io/crates/border-py-gym-env).
* `gym-robotics` - A robotic environment (fetch-reach) in [Gymnasium-Robotics](https://robotics.farama.org/) based on [border-py-gym-env](https://crates.io/crates/border-py-gym-env).
* `mujoco` - Mujoco environments in [Gymnasium](https://gymnasium.farama.org) based on [border-py-gym-env](https://crates.io/crates/border-py-gym-env).
* `atari` - Atari environments based on [border-atari-env](https://crates.io/crates/border-atari-env), a wrapper of [atari-env](https://crates.io/crates/atari-env), which is part of [gym-rs](https://crates.io/crates/gym-rs).

## Gym

You may need to set PYTHONPATH as `PYTHONPATH=./border-py-gym-env/examples`.
1 change: 0 additions & 1 deletion border/examples/atari/dqn_atari.rs
@@ -221,7 +221,6 @@ mod utils {
#[command(version, about)]
struct Args {
/// Name of the game
#[arg(long)]
name: String,

/// Train DQN agent, not evaluate
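For context, dropping `#[arg(long)]` in the hunk above turns `name` into a positional argument rather than a `--name` flag. A minimal, self-contained sketch of the resulting pattern follows; it is not the example's full `Args` struct, and the `train` flag is illustrative.

```rust
use clap::Parser;

/// Minimal sketch of a clap parser where the game name is positional.
#[derive(Parser, Debug)]
#[command(version, about)]
struct Args {
    /// Name of the game (positional, e.g. `pong`)
    name: String,

    /// Train the DQN agent instead of evaluating it
    #[arg(long)]
    train: bool,
}

fn main() {
    let args = Args::parse();
    println!("game: {}, train: {}", args.name, args.train);
}
```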