Learning in Noisy MDP (which is governed by stochastic, exogenous input processes) with input-dependent baseline

lehduong/Job-Scheduling-with-Reinforcement-Learning

Learning to Assign Credit in Input-driven Environment (LACIE) reduces the variance of advantage estimates in noisy MDPs by using a hindsight distribution.

Input-driven MDP

Input-driven MDPs are Markov decision processes governed not only by the agent's actions but also by stochastic, exogenous input processes [1]. These environments have inherently high variance, which makes it hard to learn an optimal policy.
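To make the definition concrete, here is a minimal sketch of an input-driven environment. It is a hypothetical toy queue, not one of the environments used in this repository: the arrival process, the reward, and the names `rollout` and `sample_arrivals` are all assumptions made for illustration. Because the policy is fixed and deterministic, any difference between episode returns comes only from the exogenous arrivals.

```python
import random

def rollout(policy, arrivals):
    """Run one episode of a toy queue whose job arrivals are exogenous."""
    queue, total_reward = 0, 0.0
    for z_t in arrivals:          # z_t: jobs arriving at step t, independent of actions
        a_t = policy(queue)       # a_t: number of jobs the agent chooses to serve
        queue = max(queue + z_t - a_t, 0)
        total_reward -= queue     # penalty for every job left waiting
    return total_reward

def sample_arrivals(horizon, rng):
    """Hypothetical input process: 0 to 3 jobs arrive at each step."""
    return [rng.randint(0, 3) for _ in range(horizon)]

rng = random.Random(0)
policy = lambda queue: 1          # fixed policy: always serve one job
returns = [rollout(policy, sample_arrivals(50, rng)) for _ in range(5)]
print(returns)                    # spread is caused by the input process alone
```

Replaying the same arrival sequence with the same policy always gives the same return, which is why conditioning on the input sequence can remove this source of variance.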

This repository implements:

  • The input-dependent baseline proposed in [1].

  • LACIE - an algorithm that learns to weight the advantages of each rollout in hindsight with respect to future input sequences.
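The idea behind an input-dependent baseline can be sketched as follows. This is a simplified Monte Carlo stand-in, not the learned baseline of [1] and not the code in this repository. It assumes several rollouts were collected under the same input sequence and from the same start state, and uses the per-step mean return over those rollouts as the baseline. The function name `input_dependent_advantages` and the sample numbers are hypothetical.

```python
from statistics import mean

def input_dependent_advantages(returns):
    """returns[i][t] is the return-to-go of rollout i at step t.

    Assumption: all rollouts share the SAME input sequence, so the mean over
    rollouts at step t acts as a baseline conditioned on that sequence.
    """
    horizon = len(returns[0])
    baseline = [mean(r[t] for r in returns) for t in range(horizon)]
    return [[r[t] - baseline[t] for t in range(horizon)] for r in returns]

# Two rollouts replayed against one fixed input sequence (made-up numbers).
returns = [[-10.0, -6.0, -1.0],
           [-14.0, -8.0, -3.0]]
print(input_dependent_advantages(returns))
# [[2.0, 1.0, 1.0], [-2.0, -1.0, -1.0]]
```

Subtracting a baseline computed for one input sequence removes the part of the return that is explained by the inputs rather than by the agent's actions.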

Install Dependencies

  1. Install PyTorch
pip install torch torchvision
  2. Install TensorFlow 2
pip install tensorflow==2.2

or

pip install tensorflow-gpu==2.2
  3. Install OpenAI Baselines (TensorFlow 2 version)
git clone https://github.com/openai/baselines.git -b tf2 && \
cd baselines && \
pip install -e .

Note: I haven't tested the code on TensorFlow 1 yet, but it should work as well.

  4. Install the Park platform. I modified the platform slightly to make it compatible with OpenAI Baselines.
git clone https://github.com/lehduong/park && \
cd park && \
pip install -e .
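
After the install steps, a quick check like the one below can confirm that the packages are visible to the Python interpreter. It is only a sketch: the helper `missing_modules` is hypothetical, and the module names in `REQUIRED` are assumed from the packages installed above.

```python
import importlib.util

# Top-level module names assumed from the install steps above.
REQUIRED = ["torch", "tensorflow", "baselines", "park"]

def missing_modules(names=REQUIRED):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

print("missing:", missing_modules() or "none")
```

The check only looks the modules up without importing them, so it runs quickly and does not initialize TensorFlow or PyTorch.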

Run experiments

See scripts for examples.

Results:

Reward of A2C+LACIE (yellow) vs. A2C (blue): [figure: reward]

Value loss of A2C+LACIE (yellow) vs. A2C (blue) during training: [figure: train-value-loss]

Reference

[1] Variance Reduction for Reinforcement Learning in Input-Driven Environments.

Acknowledgement

The starter code is based on ikostrikov's repository.
