markub3327/rl-toolkit

# RL Toolkit


## Papers

## Installation with PyPI

### On PC AMD64 with Ubuntu/Debian

  1. Install dependencies
    apt update -y
    apt install swig -y
  2. Install RL-Toolkit
    pip3 install rl-toolkit[all]
  3. Run (for Server)
    python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 server
    Run (for Agent)
    python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 agent --db_server localhost
    Run (for Learner)
    python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 learner --db_server 192.168.1.2
    Run (for Tester)
    python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 tester -f save/model/actor.h5
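The four commands above make up one distributed run: the Reverb server hosts the replay buffer, one or more agents collect experience into it, the learner trains on that experience, and the tester evaluates a saved actor. As a rough sketch of how the invocations fit together (`build_command` is an illustrative helper, not part of rl-toolkit; the flags are copied from the commands above):

```python
# Illustrative only: assemble the rl_toolkit CLI invocation for each role.
def build_command(role, env="MinitaurBulletEnv-v0",
                  config="./rl_toolkit/config.yaml", **flags):
    cmd = ["python3", "-m", "rl_toolkit", "-c", config, "-e", env, role]
    for name, value in flags.items():
        cmd += [f"--{name}", value]
    return cmd

# Agents and the learner point at the machine running the Reverb server.
print(" ".join(build_command("server")))
print(" ".join(build_command("agent", db_server="localhost")))
print(" ".join(build_command("learner", db_server="192.168.1.2")))
```

The server must be running before agents and the learner start, since both connect to it over the network via `--db_server`.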

### On NVIDIA Jetson

  1. Install dependencies
    Install TensorFlow for JetPack by following the instructions here.

    sudo apt install swig -y
  2. Install Reverb
    Download Bazel 3.7.2 for arm64 from here

    mkdir ~/bin
    mv ~/Downloads/bazel-3.7.2-linux-arm64 ~/bin/bazel
    chmod +x ~/bin/bazel
    export PATH=$PATH:~/bin

    Clone Reverb and check out the version that corresponds to the TF version installed on the NVIDIA Jetson!

    git clone https://github.com/deepmind/reverb
    cd reverb/
    git checkout r0.9.0

    Make the following changes in Reverb before building!
    In .bazelrc

    - build:manylinux2010 --crosstool_top=//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain
    + # build:manylinux2010 --crosstool_top=//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain
    
    - build --copt=-mavx --copt=-DEIGEN_MAX_ALIGN_BYTES=64
    + build --copt=-DEIGEN_MAX_ALIGN_BYTES=64

    In WORKSPACE

    - PROTOC_SHA256 = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"
    + # PROTOC_SHA256 = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"
    + PROTOC_SHA256 = "7877fee5793c3aafd704e290230de9348d24e8612036f1d784c8863bc790082e"

    In oss_build.sh

    -  bazel test -c opt --copt=-mavx --config=manylinux2010 --test_output=errors //reverb/cc/...
    +  bazel test -c opt --copt="-march=armv8-a+crypto" --test_output=errors //reverb/cc/...
    
    # Builds Reverb and creates the wheel package.
    -  bazel build -c opt --copt=-mavx $EXTRA_OPT --config=manylinux2010 reverb/pip_package:build_pip_package
    +  bazel build -c opt --copt="-march=armv8-a+crypto" $EXTRA_OPT reverb/pip_package:build_pip_package

    In reverb/cc/platform/default/repo.bzl

    urls = [
       -        "https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-x86_64.zip" % (version, version),
       +        "https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-aarch_64.zip" % (version, version),
    ]

    In reverb/pip_package/build_pip_package.sh

    -  "${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG} --plat manylinux2010_x86_64 > /dev/null
    +  "${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG}  > /dev/null

    Build and install

    bash oss_build.sh --clean true --tf_dep_override "tensorflow~=2.9.1" --release --python "3.8"
    bash ./bazel-bin/reverb/pip_package/build_pip_package --dst /tmp/reverb/dist/ --release
    pip3 install /tmp/reverb/dist/dm_reverb-*

    Cleaning

    cd ../
    rm -R reverb/
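A note on the PROTOC_SHA256 swap in WORKSPACE above: Bazel verifies the checksum of the protoc archive it downloads, so once the URL is switched to the aarch_64 zip, the x86_64 checksum no longer matches. If the build still fails on a checksum mismatch (for example with a different Reverb or protoc version), you can compute the hash of the downloaded archive yourself and paste it into WORKSPACE. A minimal stdlib sketch (the file path in the comment is a placeholder):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# Compare the result against the PROTOC_SHA256 value in WORKSPACE, e.g.:
# sha256_of("/path/to/protoc-linux-aarch_64.zip")
```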
  3. Install RL-Toolkit

    pip3 install rl-toolkit
  4. Run (for Server)

    python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 server

    Run (for Agent)

    python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 agent --db_server localhost

    Run (for Learner)

    python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 learner --db_server 192.168.1.2

    Run (for Tester)

    python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 tester -f save/model/actor.h5

## Environments

| Environment | Observation space | Observation bounds | Action space | Action bounds | Reward bounds |
| --- | --- | --- | --- | --- | --- |
| BipedalWalkerHardcore-v3 | (24, ) | [-inf, inf] | (4, ) | [-1.0, 1.0] | [-1.0, 1.0] |
| Walker2DBulletEnv-v0 | (22, ) | [-inf, inf] | (6, ) | [-1.0, 1.0] | [-1.0, 1.0] |
| AntBulletEnv-v0 | (28, ) | [-inf, inf] | (8, ) | [-1.0, 1.0] | [-1.0, 1.0] |
| HalfCheetahBulletEnv-v0 | (26, ) | [-inf, inf] | (6, ) | [-1.0, 1.0] | [-1.0, 1.0] |
| HopperBulletEnv-v0 | (15, ) | [-inf, inf] | (3, ) | [-1.0, 1.0] | [-1.0, 1.0] |
| HumanoidBulletEnv-v0 | (44, ) | [-inf, inf] | (17, ) | [-1.0, 1.0] | [-1.0, 1.0] |
| MinitaurBulletEnv-v0 | (28, ) | [-167.72488, 167.72488] | (8, ) | [-1.0, 1.0] | [-1.0, 1.0] |
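When wiring up a new experiment it can help to have the table above available in code. Below is a minimal transcription (shapes hard-coded from the table rather than queried from gym/PyBullet, so treat it as documentation of the table, not ground truth):

```python
# (observation_dim, action_dim) per environment, transcribed from the table.
ENV_SHAPES = {
    "BipedalWalkerHardcore-v3": (24, 4),
    "Walker2DBulletEnv-v0": (22, 6),
    "AntBulletEnv-v0": (28, 8),
    "HalfCheetahBulletEnv-v0": (26, 6),
    "HopperBulletEnv-v0": (15, 3),
    "HumanoidBulletEnv-v0": (44, 17),
    "MinitaurBulletEnv-v0": (28, 8),
}

def action_dim(env_id):
    """Action dimensionality for one of the supported environments."""
    return ENV_SHAPES[env_id][1]
```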

## Results

| Environment | SAC + gSDE | SAC + gSDE + Huber loss | SAC + TQC + gSDE | RL-Toolkit |
| --- | --- | --- | --- | --- |
| BipedalWalkerHardcore-v3 | 13 ± 18(2) | 239 ± 118 | 228 ± 18(2) | 205 ± 134 |
| Walker2DBulletEnv-v0 | 2270 ± 28(1) | 2732 ± 96 | 2535 ± 94(2) | 3123 ± 594 |
| AntBulletEnv-v0 | 3106 ± 61(1) | 3460 ± 119 | 3700 ± 37(2) | 3993 ± 214 |
| HalfCheetahBulletEnv-v0 | 2945 ± 95(1) | 3003 ± 226 | 3041 ± 157(2) | 2762 ± 153 |
| HopperBulletEnv-v0 | 2515 ± 50(1) | 2555 ± 405 | 2401 ± 62(2) | 2151 ± 664 |


## Releases

  • SAC + gSDE + Huber loss is stored here, in branch r2.0
  • SAC + TQC + gSDE + LogCosh + Reverb is stored here, in branch r4.0

Frameworks: TensorFlow, Reverb, OpenAI Gym, PyBullet, WandB, OpenCV