REINFORCEMENT LEARNING

This is my attempt at providing clean, "abstraction-free" implementations of various gradient-based reinforcement learning algorithms. I have tried to follow a "single-file" implementation strategy for each algorithm, so that the code is easier to read.

The code does not aim to be flexible across parameter configurations, nor is it optimized for solving hard problems or running on multiple GPUs. Rather, it is a simplified single-process, single-file implementation that exposes all the relevant details and removes the confusing abstractions. It may serve as a reference if you want to write your own implementations.

Implementations of the following algorithms can be found here:

Vanilla policy gradient - code, docs
Advantage Actor-Critic - code, docs
Proximal policy optimization - code, docs
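
To give a rough idea of how the three algorithms listed above differ, here is a minimal sketch of the scalar objective each one minimizes. It is not taken from the code in this repository: the function names, argument names and default values (`gamma=0.99`, `eps=0.2`) are assumptions made for illustration, and it uses plain Python numbers, whereas a real implementation would compute these on tensors and let an autodiff library produce the gradients.

```python
import math

def discounted_returns(rewards, gamma=0.99):
    """Reward-to-go G_t = r_t + gamma * G_{t+1}, computed backwards in time."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]

def vpg_loss(log_probs, returns):
    """Vanilla policy gradient surrogate: -mean(log pi(a_t|s_t) * G_t)."""
    return -sum(lp * g for lp, g in zip(log_probs, returns)) / len(returns)

def a2c_policy_loss(log_probs, returns, values):
    """Actor loss weighted by the advantage A_t = G_t - V(s_t).

    `values` are the critic's estimates; the critic's own regression
    loss is left out of this sketch.
    """
    advantages = [g - v for g, v in zip(returns, values)]
    return -sum(lp * a for lp, a in zip(log_probs, advantages)) / len(advantages)

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, eps=0.2):
    """PPO clipped surrogate: -mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A)),
    where r = pi_new(a|s) / pi_old(a|s)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

In this sketch the only difference between the first two losses is the weight on the log-probability (raw return versus advantage), while the PPO loss additionally limits how far the probability ratio can move the objective in a single update.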

If you want to read more about policy gradient algorithms, check out a blog post that I wrote.


pi-tau/playing-with-RL-models