Learning to Assign Credit in Input-driven Environments (LACIE) reduces the variance of advantage estimates in noisy MDPs by using a hindsight distribution over input sequences.
Input-driven MDPs are Markov decision processes governed not only by the agent's actions but also by stochastic, exogenous input processes [1]. These environments are inherently high-variance, which makes it hard to learn an optimal policy.
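For intuition, here is a minimal toy sketch (not part of this repository; the queueing dynamics are assumed purely for illustration) of an environment whose transitions depend on both the agent's action and an exogenous random input, so the same policy can receive very different returns depending on the input sequence:

```python
import numpy as np

# Toy input-driven MDP (illustration only, not part of this repo):
# the next state depends on the agent's action AND an exogenous input z_t.
class ToyInputDrivenEnv:
    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
        self.queue = 0.0

    def reset(self):
        self.queue = 0.0
        return self.queue

    def step(self, action):
        z = self.rng.exponential(scale=1.0)             # exogenous input, e.g. a job arrival
        self.queue = max(self.queue - action, 0.0) + z  # dynamics driven by both action and z
        reward = -self.queue                            # identical policies can see very
        return self.queue, reward, False, {"input": z}  # different returns across input draws
```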
This repository implements:
- The input-dependent baseline proposed in [1].
- LACIE, an algorithm that learns to weight the advantages of each rollout in hindsight with respect to future input sequences (see the sketch after this list).
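Very roughly, the hindsight weighting can be pictured as below. This is only an illustrative sketch, not the repository's actual implementation: `hindsight_weighted_advantages`, `weight_net`, and the softmax normalization are assumptions made for exposition.

```python
import torch

def hindsight_weighted_advantages(returns, values, input_seq, weight_net):
    """Illustration only (not this repo's API): reweight per-rollout advantages
    with a network conditioned on the future input sequence observed in hindsight."""
    advantages = returns - values               # ordinary advantage estimates, one per rollout
    scores = weight_net(input_seq).squeeze(-1)  # one score per rollout, computed from its input sequence
    weights = torch.softmax(scores, dim=0)      # normalize scores into weights over rollouts
    return weights.detach() * advantages        # scale each rollout's advantage by its hindsight weight
```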
- Install PyTorch
pip install torch torchvision
- Install TensorFlow 2
pip install tensorflow==2.2
or
pip install tensorflow-gpu==2.2
- Install OpenAI baselines (TensorFlow 2 branch)
git clone https://github.com/openai/baselines.git -b tf2 && \
cd baselines && \
pip install -e .
Note: I haven't tested the code with TensorFlow 1 yet, but it should work as well.
- Install the Park platform. I modified the platform slightly to make it compatible with OpenAI baselines.
git clone https://github.com/lehduong/park &&\
cd park && \
pip install -e .
See `scripts` for examples.
Reward of A2C+LACIE (yellow) vs. A2C (blue) during training:
Value loss of A2C+LACIE (yellow) vs. A2C (blue) during training:
[1] Mao et al. Variance Reduction for Reinforcement Learning in Input-Driven Environments. ICLR 2019.
The starter code is based on ikostrikov's repository.