Skip to content

shyamal-anadkat/offlinerl

Repository files navigation

Deep Reinforcement Learning

Shyamal H Anadkat | Fall '21

Background

Hello! This is a repository for AIPI530 DeepRL final project. The goal is to build a pipeline for offline RL. The starter code has been forked from d3rlpy (see citation at the bottom) Offline reinforcement learning (RL) defines the task of learning from a fixed batch of data.

Before diving in, I would recommend getting familiarized with basic Reinforcement Learning. Here is a link to my blog post on Reinforcement Learning to get you started: RL Primer

The blog post briefly covers the following:

  • What is reinforcement learning ?
  • What are the pros and cons of reinforcement learning ?
  • When should we consider applying reinforcement learning (and when should not) ?
  • What's the difference between supervised learning and reinforcement learning ?
  • What is offline reinforcement learning ? What are the pros and cons of offline reinforcement learning ?
  • When should we consider applying offline reinforcement learning (and when should not) ?
  • Have an example of offline reinforcement learning in the real-world

img.png source: https://bair.berkeley.edu/blog/2020/12/07/offline/

Getting Started

(please read carefully)

This project is customized to training CQL on a custom dataset in d3rlpy, and training OPE (FQE) to evaluate the trained policy. Important scripts:

  1. cql_train.py: at the root of the project is the main script, used to train cql & get evaluation scores
  2. plot_helper.py: utility script to help produce the plots required

How do I install & run this project ?


1. Clone this repository

git clone https://github.com/shyamal-anadkat/offlinerl

2. Install pybullet from source:

pip install git+https://github.com/takuseno/d4rl-pybullet

3. Install requirements:

pip install Cython numpy 
pip install -e .
  1. Execute cql_train.py found at the root of the project
    • Default dataset is hopper-bullet-mixed-v0
    • Default no. of epochs is 10. You can change this via custom args --epochs_cql & --epochs_fqe
    • For example if we want to run for 10 epochs:
python cql_train.py --epochs_cql 10 --epochs_fqe 10

(see colab example below for more clarity)

  1. Important Logs:

    • Estimated Q values vs training steps (CQL): d3rlpy_logs/CQL_hopper-bullet-mixed-v0_1/init_value.csv
    • Average reward vs training steps (CQL): d3rlpy_logs/CQL_hopper-bullet-mixed-v0_1/environment.csv
    • True Q values vs training steps (CQL): d3rlpy_logs/CQL_hopper-bullet-mixed-v0_1/true_q_value.csv
    • True Q & Estimated Q values vs training steps (FQE): d3rlpy_logs/FQE_hopper-bullet-mixed-v0_1/..
    • Note: I created my own scorer to calculate the true q values. See scorer.py (true_q_value_scorer) for implementation details)
  2. For plotting, I wrote a utility script (at root of the project) which can be executed like so

python plot_helper.py

Note: you can provide arguments that correspond to the path to the logs or it will use the default.

Other scripts:

  • Format: ./scripts/format
  • Linting: ./scripts/lint

Sample Plots (with 100 epochs):

img.png

Note: logs can be found in /d3rlpy_logs

Examples speak more: Open In Colab

Walkthrough: walkthrough.gif


Background on d3rlpy

d3rlpy is an offline deep reinforcement learning library for practitioners and researchers.

How do I install d3rlpy?

d3rlpy supports Linux, macOS and Windows. d3rlpy is not only easy, but also completely compatible with scikit-learn API, which means that you can maximize your productivity with the useful scikit-learn's utilities.

PyPI (recommended)

PyPI version PyPI - Downloads

$ pip install d3rlpy

More examples around d3rlpy usage

import d3rlpy

dataset, env = d3rlpy.datasets.get_dataset("hopper-medium-v0")

# prepare algorithm
sac = d3rlpy.algos.SAC()

# train offline
sac.fit(dataset, n_steps=1000000)

# train online
sac.fit_online(env, n_steps=1000000)

# ready to control
actions = sac.predict(x)

MuJoCo

import d3rlpy

# prepare dataset
dataset, env = d3rlpy.datasets.get_d4rl('hopper-medium-v0')

# prepare algorithm
cql = d3rlpy.algos.CQL(use_gpu=True)

# train
cql.fit(dataset,
        eval_episodes=dataset,
        n_epochs=100,
        scorers={
            'environment': d3rlpy.metrics.evaluate_on_environment(env),
            'td_error': d3rlpy.metrics.td_error_scorer
        })

See more datasets at d4rl.

Atari 2600

import d3rlpy
from sklearn.model_selection import train_test_split

# prepare dataset
dataset, env = d3rlpy.datasets.get_atari('breakout-expert-v0')

# split dataset
train_episodes, test_episodes = train_test_split(dataset, test_size=0.1)

# prepare algorithm
cql = d3rlpy.algos.DiscreteCQL(n_frames=4, q_func_factory='qr', scaler='pixel', use_gpu=True)

# start training
cql.fit(train_episodes,
        eval_episodes=test_episodes,
        n_epochs=100,
        scorers={
            'environment': d3rlpy.metrics.evaluate_on_environment(env),
            'td_error': d3rlpy.metrics.td_error_scorer
        })

See more Atari datasets at d4rl-atari.

PyBullet

import d3rlpy

# prepare dataset
dataset, env = d3rlpy.datasets.get_pybullet('hopper-bullet-mixed-v0')

# prepare algorithm
cql = d3rlpy.algos.CQL(use_gpu=True)

# start training
cql.fit(dataset,
        eval_episodes=dataset,
        n_epochs=100,
        scorers={
            'environment': d3rlpy.metrics.evaluate_on_environment(env),
            'td_error': d3rlpy.metrics.td_error_scorer
        })

See more PyBullet datasets at d4rl-pybullet.

How about some tutorials?

Try a cartpole example on Google Colaboratory:

  • official offline RL tutorial: Open In Colab

Citation

Thanks to Takuma Seno and his work on d3rlpy This wouldn't have been possible without it.

Seno, T., & Imai, M. (2021). d3rlpy: An Offline Deep Reinforcement Learning Library Conference paper. 35th Conference on Neural Information Processing Systems, Offline Reinforcement Learning Workshop, 2021

@InProceedings{seno2021d3rlpy,
  author = {Takuma Seno, Michita Imai},
  title = {d3rlpy: An Offline Deep Reinforcement Library},
  booktitle = {NeurIPS 2021 Offline Reinforcement Learning Workshop},
  month = {December},
  year = {2021}
}

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages