PG is all you need!

This is a step-by-step tutorial on Policy Gradient algorithms, from A2C to SAC, including methods that use demonstrations to accelerate learning in real applications with sparse rewards. Every chapter contains both theoretical background and an object-oriented implementation. Just pick any topic you are interested in and learn! You can run the notebooks right away with Colab, even on your smartphone.

Please feel free to open an issue or a pull request if you have any ideas for making it better. :)

If you want a tutorial on the DQN series, please see Rainbow is All You Need.

Contents

Advantage Actor-Critic (A2C) [NBViewer] [Colab]
Proximal Policy Optimization Algorithms (PPO) [NBViewer] [Colab]
Deep Deterministic Policy Gradient (DDPG) [NBViewer] [Colab]
Twin Delayed Deep Deterministic Policy Gradient Algorithm (TD3) [NBViewer] [Colab]
Soft Actor-Critic (SAC) [NBViewer] [Colab]
DDPG from Demonstration (DDPGfD) [NBViewer] [Colab]
Behavior Cloning (with DDPG) [NBViewer] [Colab]

Environment

Pendulum-v0

Reference: OpenAI gym Pendulum-v0

Observation

Type: Box(3)

Num	Observation	Min	Max
0	cos(theta)	-1.0	1.0
1	sin(theta)	-1.0	1.0
2	theta dot	-8.0	8.0
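
As a minimal sketch of how the three observation components relate to the pendulum state, the helper below builds the observation vector from an angle and an angular velocity. The name `make_observation` is hypothetical (it is not part of this repository or of gym), and clipping the velocity to the bounds in the table is an assumption made for illustration.

```python
import math

MAX_SPEED = 8.0  # bound on theta dot, taken from the table above


def make_observation(theta: float, theta_dot: float) -> list:
    """Return [cos(theta), sin(theta), theta_dot] for a pendulum state."""
    # Assumption: velocities outside the table's range are clipped.
    theta_dot = max(-MAX_SPEED, min(MAX_SPEED, theta_dot))
    return [math.cos(theta), math.sin(theta), theta_dot]


# Upright pendulum (theta = 0) rotating at 1.5 rad/s.
print(make_observation(0.0, 1.5))  # [1.0, 0.0, 1.5]
```

Encoding the angle as a cosine/sine pair avoids the discontinuity an agent would otherwise see when the raw angle wraps around.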

Actions

Type: Box(1)

Num	Action	Min	Max
0	Joint effort	-2.0	2.0
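
Policy networks often emit actions in [-1, 1] (for example after a tanh layer), which then have to be mapped onto the environment's range. The sketch below shows one such linear mapping onto the [-2.0, 2.0] bounds from the table; `scale_action` is a hypothetical helper written for this illustration, not necessarily how the notebooks handle it.

```python
ACTION_LOW, ACTION_HIGH = -2.0, 2.0  # bounds from the table above


def scale_action(raw: float) -> float:
    """Map a raw action in [-1, 1] linearly onto [ACTION_LOW, ACTION_HIGH]."""
    # Clip first so out-of-range network outputs stay inside the valid range.
    raw = max(-1.0, min(1.0, raw))
    scale = (ACTION_HIGH - ACTION_LOW) / 2.0
    center = (ACTION_HIGH + ACTION_LOW) / 2.0
    return raw * scale + center


print(scale_action(0.5))   # 1.0
print(scale_action(-3.0))  # -2.0 (clipped)
```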

Reward

The precise equation for reward:

-(theta^2 + 0.1*theta_dt^2 + 0.001*action^2)

Theta is normalized between -pi and pi. Therefore, the lowest reward is -(pi^2 + 0.1*8^2 + 0.001*2^2) = -16.2736044, and the highest reward is 0. In essence, the goal is to remain at zero angle (vertical) with the least rotational velocity and the least effort. Each episode lasts at most 200 steps.
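
The bounds quoted above can be checked numerically. The following is a minimal sketch of the reward formula, assuming the angle is wrapped into [-pi, pi) before squaring; the function names `angle_normalize` and `reward` are chosen for this illustration and are not taken from the repository.

```python
import math


def angle_normalize(theta: float) -> float:
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return ((theta + math.pi) % (2 * math.pi)) - math.pi


def reward(theta: float, theta_dt: float, action: float) -> float:
    """Reward for one step: -(theta^2 + 0.1*theta_dt^2 + 0.001*action^2)."""
    th = angle_normalize(theta)
    return -(th ** 2 + 0.1 * theta_dt ** 2 + 0.001 * action ** 2)


# Worst case: hanging straight down at maximum speed with maximum effort.
worst = reward(math.pi, 8.0, 2.0)
print(round(worst, 7))  # -16.2736044

# Best case: upright, motionless, no effort applied.
best = reward(0.0, 0.0, 0.0)
print(best == 0)  # True
```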

Prerequisites

This repository is tested in an Anaconda virtual environment with Python 3.6.1+.

$ conda create -n pg-is-all-you-need python=3.6.9
$ conda activate pg-is-all-you-need

Installation

First, clone the repository.

git clone https://github.com/MrSyee/pg-is-all-you-need.git
cd pg-is-all-you-need

Second, install the packages required to run the code:

make dep

Development

Install packages required to develop the code:

make dev

To view the diff of Jupyter notebooks that you have modified, use nbdime:

nbdiff-web

Related Papers

Contributors

Thanks goes to these wonderful people (emoji key):

Kyunghwan Kim
💻 📖

Jinwoo Park (Curt)
💻 📖

Mincheol Kim
💻 📖

Fazl
🚧
