RUDDER: Return Decomposition for Delayed Rewards

RUDDER efficiently learns optimal policies in finite Markov decision processes (MDPs) with delayed rewards. The following resources are available:

Our RUDDER paper: https://arxiv.org/abs/1806.07857
RUDDER blog: https://ml-jku.github.io/rudder/
Code for the RUDDER demonstration on the example task from the blog: https://github.com/ml-jku/rudder-demonstration-code
A practical step-by-step guide to applying RUDDER in PyTorch: https://github.com/widmi/rudder-a-practical-tutorial
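
To give a rough idea of what "return decomposition" means, here is a minimal sketch that is not taken from the repositories above. It assumes a model has already produced, for every time step, a prediction of the final episode return; the function name `redistribute` and the toy numbers are hypothetical. The redistributed reward at each step is the change in the predicted return, so the reward moves to the step that caused it while the episode total stays the same.

```python
def redistribute(return_predictions):
    """Turn per-step predictions of the final return into per-step rewards.

    return_predictions[t] is a (hypothetical) model's estimate of the
    episode return after observing the first t + 1 steps.
    """
    redistributed = []
    previous = 0.0  # assumption: the prediction before any observation is 0
    for prediction in return_predictions:
        # Reward at this step = how much the expected return changed.
        redistributed.append(prediction - previous)
        previous = prediction
    return redistributed


# Toy episode: the environment pays 1.0 only at the very end, but the
# (assumed) predictor already knows the outcome after step 1.
predictions = [0.0, 1.0, 1.0, 1.0]
rewards = redistribute(predictions)
print(rewards)  # [0.0, 1.0, 0.0, 0.0]

# The redistributed rewards sum to the final predicted return.
assert sum(rewards) == predictions[-1]
```

In practice the predictions would come from a trained sequence model such as an LSTM; the paper and the PyTorch tutorial linked above describe the actual procedure.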
