RUDDER: Return Decomposition for Delayed Rewards

RUDDER efficiently learns optimal policies in finite Markov decision processes (MDPs) with delayed rewards. Use the following links to find: