- Playing Flappy Bird Based on Motion Recognition Using a Transformer Model and LIDAR Sensor
- Soft Actor-Critic
- Generalized State-Dependent Exploration
- Reverb: A framework for experience replay
- Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics
- Acme: A Research Framework for Distributed Reinforcement Learning
- Dueling Network Architectures for Deep Reinforcement Learning
- Attention Is All You Need
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
- Install dependencies
apt update -y
apt install swig -y
- Install RL-Toolkit
pip3 install rl-toolkit[all]
- Run (for Server)
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 server
Run (for Agent)
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 agent --db_server localhost
Run (for Learner)
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 learner --db_server 192.168.1.2
Run (for Tester)
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 tester -f save/model/actor.h5
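The four commands above differ only in the role name and its trailing options, so they can be generated from one place. The following is a minimal sketch, not part of RL-Toolkit: `build_command` and the two default constants are hypothetical names, and the flags are copied verbatim from the commands above.

```python
import shlex

# Hypothetical helper, not part of RL-Toolkit. It only assembles the
# command lines shown above so they can be printed or passed to subprocess.run().
DEFAULT_CONFIG = "./rl_toolkit/config.yaml"
DEFAULT_ENV = "MinitaurBulletEnv-v0"
ROLES = {"server", "agent", "learner", "tester"}


def build_command(role, *extra, config=DEFAULT_CONFIG, env=DEFAULT_ENV):
    """Return the argv list for one role, followed by its role-specific options."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role}")
    return ["python3", "-m", "rl_toolkit", "-c", config, "-e", env, role, *extra]


commands = [
    build_command("server"),
    build_command("agent", "--db_server", "localhost"),
    build_command("learner", "--db_server", "192.168.1.2"),
    build_command("tester", "-f", "save/model/actor.h5"),
]
for argv in commands:
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Passing an argv list to `subprocess.run` instead of a single string avoids shell quoting problems if a path ever contains spaces.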
- Install dependencies
TensorFlow for JetPack: follow the installation instructions here.
sudo apt install swig -y
- Install Reverb
Download Bazel 3.7.2 for arm64 here.
mkdir ~/bin
mv ~/Downloads/bazel-3.7.2-linux-arm64 ~/bin/bazel
chmod +x ~/bin/bazel
export PATH=$PATH:~/bin
Clone the Reverb version that corresponds to the TF version installed on the NVIDIA Jetson!
git clone https://github.com/deepmind/reverb
cd reverb/
git checkout r0.9.0
Make changes in Reverb before building!
In .bazelrc
- build:manylinux2010 --crosstool_top=//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain
+ # build:manylinux2010 --crosstool_top=//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain
- build --copt=-mavx --copt=-DEIGEN_MAX_ALIGN_BYTES=64
+ build --copt=-DEIGEN_MAX_ALIGN_BYTES=64
In WORKSPACE
- PROTOC_SHA256 = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"
+ # PROTOC_SHA256 = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"
+ PROTOC_SHA256 = "7877fee5793c3aafd704e290230de9348d24e8612036f1d784c8863bc790082e"
In oss_build.sh
- bazel test -c opt --copt=-mavx --config=manylinux2010 --test_output=errors //reverb/cc/...
+ bazel test -c opt --copt="-march=armv8-a+crypto" --test_output=errors //reverb/cc/...
# Builds Reverb and creates the wheel package.
- bazel build -c opt --copt=-mavx $EXTRA_OPT --config=manylinux2010 reverb/pip_package:build_pip_package
+ bazel build -c opt --copt="-march=armv8-a+crypto" $EXTRA_OPT reverb/pip_package:build_pip_package
In reverb/cc/platform/default/repo.bzl
urls = [
- "https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-x86_64.zip" % (version, version),
+ "https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-aarch_64.zip" % (version, version),
]
In reverb/pip_package/build_pip_package.sh
- "${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG} --plat manylinux2010_x86_64 > /dev/null
+ "${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG} > /dev/null
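The changes above are plain line substitutions (AVX is an x86 instruction set, so the `-mavx` flag has to go on arm64), which means they can be scripted instead of edited by hand. The following is a minimal sketch that covers only the two `.bazelrc` lines and runs on an in-memory sample; `apply_line_edits` and `BAZELRC_EDITS` are hypothetical names, and the sketch assumes the target lines appear in the file exactly as shown in the diff, with no extra whitespace.

```python
# Old and new lines copied from the .bazelrc diff above.
OLD_CROSSTOOL = (
    "build:manylinux2010 --crosstool_top="
    "//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain"
)
OLD_COPT = "build --copt=-mavx --copt=-DEIGEN_MAX_ALIGN_BYTES=64"

BAZELRC_EDITS = [
    (OLD_CROSSTOOL, "# " + OLD_CROSSTOOL),                 # comment out the x86 toolchain
    (OLD_COPT, "build --copt=-DEIGEN_MAX_ALIGN_BYTES=64"),  # drop the AVX flag
]


def apply_line_edits(text, edits):
    """Replace whole lines; raise if an expected line is absent.

    Matching whole lines means a second run on an already patched file
    fails loudly instead of commenting the same line twice.
    """
    lines = text.splitlines()
    for old, new in edits:
        if old not in lines:
            raise ValueError(f"expected line not found: {old!r}")
        lines = [new if line == old else line for line in lines]
    return "\n".join(lines) + "\n"


# In-memory stand-in for the real file, so the sketch runs anywhere.
sample_bazelrc = OLD_CROSSTOOL + "\n" + OLD_COPT + "\n"
patched = apply_line_edits(sample_bazelrc, BAZELRC_EDITS)
print(patched, end="")
```

For the real file, the text would be read and written back with `pathlib.Path(".bazelrc").read_text()` and `write_text()`; the other files can be handled the same way with their own edit lists.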
Build and install
bash oss_build.sh --clean true --tf_dep_override "tensorflow~=2.9.1" --release --python "3.8"
bash ./bazel-bin/reverb/pip_package/build_pip_package --dst /tmp/reverb/dist/ --release
pip3 install /tmp/reverb/dist/dm_reverb-*
Cleaning
cd ../
rm -R reverb/
- Install RL-Toolkit
pip3 install rl-toolkit
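Because Reverb is built from source here, it can be worth confirming that the installed packages are visible to Python before starting any process. The following is a minimal sketch using only the standard library; `is_importable` is a hypothetical helper, and the module names `tensorflow`, `reverb` and `rl_toolkit` are the import names of the packages installed in the steps above.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


for name in ("tensorflow", "reverb", "rl_toolkit"):
    status = "found" if is_importable(name) else "MISSING"
    print(f"{name}: {status}")
```

This only checks that the modules can be located; a wheel built for the wrong architecture would still fail at import time.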
- Run (for Server)
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 server
Run (for Agent)
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 agent --db_server localhost
Run (for Learner)
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 learner --db_server 192.168.1.2
Run (for Tester)
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 tester -f save/model/actor.h5
Environment | Observation space | Observation bounds | Action space | Action bounds | Reward bounds |
---|---|---|---|---|---|
BipedalWalkerHardcore-v3 | (24, ) | [-inf, inf] | (4, ) | [-1.0, 1.0] | [-1.0, 1.0] |
Walker2DBulletEnv-v0 | (22, ) | [-inf, inf] | (6, ) | [-1.0, 1.0] | [-1.0, 1.0] |
AntBulletEnv-v0 | (28, ) | [-inf, inf] | (8, ) | [-1.0, 1.0] | [-1.0, 1.0] |
HalfCheetahBulletEnv-v0 | (26, ) | [-inf, inf] | (6, ) | [-1.0, 1.0] | [-1.0, 1.0] |
HopperBulletEnv-v0 | (15, ) | [-inf, inf] | (3, ) | [-1.0, 1.0] | [-1.0, 1.0] |
HumanoidBulletEnv-v0 | (44, ) | [-inf, inf] | (17, ) | [-1.0, 1.0] | [-1.0, 1.0] |
MinitaurBulletEnv-v0 | (28, ) | [-167.72488, 167.72488] | (8, ) | [-1.0, 1.0] | [-1.0, 1.0] |
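Every environment in the table above bounds its actions to [-1.0, 1.0], with the dimension given in the Action space column. The following is a minimal sketch of checking a policy output against those values before it is sent to an environment; `ACTION_DIMS` and `clip_action` are illustrative names, not RL-Toolkit API, and the numbers are copied from the table.

```python
# Action dimensions copied from the table above.
ACTION_DIMS = {
    "BipedalWalkerHardcore-v3": 4,
    "Walker2DBulletEnv-v0": 6,
    "AntBulletEnv-v0": 8,
    "HalfCheetahBulletEnv-v0": 6,
    "HopperBulletEnv-v0": 3,
    "HumanoidBulletEnv-v0": 17,
    "MinitaurBulletEnv-v0": 8,
}
ACTION_LOW, ACTION_HIGH = -1.0, 1.0  # identical for every listed environment


def clip_action(env_name, action):
    """Check the action length for env_name and clip each component to the bounds."""
    expected = ACTION_DIMS[env_name]
    if len(action) != expected:
        raise ValueError(f"{env_name} expects {expected} values, got {len(action)}")
    return [min(max(float(a), ACTION_LOW), ACTION_HIGH) for a in action]


# Hypothetical raw policy output for the 3-dimensional Hopper action space.
print(clip_action("HopperBulletEnv-v0", [1.5, -2.0, 0.25]))
```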
Environment | SAC + gSDE | SAC + gSDE + Huber loss | SAC + TQC + gSDE | RL-Toolkit |
---|---|---|---|---|
BipedalWalkerHardcore-v3 | 13 ± 18 (2) | 239 ± 118 | 228 ± 18 (2) | 205 ± 134 |
Walker2DBulletEnv-v0 | 2270 ± 28 (1) | 2732 ± 96 | 2535 ± 94 (2) | 3123 ± 594 |
AntBulletEnv-v0 | 3106 ± 61 (1) | 3460 ± 119 | 3700 ± 37 (2) | 3993 ± 214 |
HalfCheetahBulletEnv-v0 | 2945 ± 95 (1) | 3003 ± 226 | 3041 ± 157 (2) | 2762 ± 153 |
HopperBulletEnv-v0 | 2515 ± 50 (1) | 2555 ± 405 | 2401 ± 62 (2) | 2151 ± 664 |
- SAC + gSDE + Huber loss is stored here, branch r2.0
- SAC + TQC + gSDE + LogCosh + Reverb is stored here, branch r4.0
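The scores in the results table are written as a value ± a spread. The source does not say how the spread is computed; the following is a minimal sketch assuming it is the sample standard deviation of per-episode returns. The returns below are made-up numbers for illustration, not results from RL-Toolkit.

```python
import statistics


def summarize(returns):
    """Format episode returns as 'mean ± std', using the sample standard deviation."""
    mean = statistics.mean(returns)
    std = statistics.stdev(returns)  # assumption: sample (n - 1) standard deviation
    return f"{mean:.0f} ± {std:.0f}"


# Hypothetical evaluation returns, for illustration only.
episode_returns = [190.0, 200.0, 210.0]
print(summarize(episode_returns))  # prints "200 ± 10"
```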
Frameworks: TensorFlow, Reverb, OpenAI Gym, PyBullet, Weights & Biases, OpenCV