Replay Memory as An Empirical MDP: Combining Conservative Estimation with Experience Replay

Overview

This repository provides a PyTorch implementation of Conservative Estimation with Experience Replay (CEER).
The method is evaluated on Sokoban, Minigrid, and MinAtar environments.
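The idea in the title, viewing the replay memory as an empirical MDP, can be illustrated with a small tabular sketch. This is a hypothetical, simplified example and not code from this repository. It assumes hashable discrete states, groups stored transitions by (state, action), and runs value iteration in which each next state bootstraps only from actions actually stored in memory. CEER's conservative estimation and how it uses such estimates during training are not shown here.

```python
from collections import defaultdict


def empirical_q_values(transitions, gamma=0.99, iters=100):
    """Hypothetical helper: value iteration on the MDP induced by a replay memory.

    transitions: iterable of (s, a, r, s_next, done) tuples with hashable states.
    Returns a dict mapping (s, a) -> Q estimate. Only actions that appear in
    memory at a given state are considered when bootstrapping.
    """
    # Group stored outcomes by (state, action); repeated entries act as
    # empirical transition frequencies.
    outcomes = defaultdict(list)
    actions_at = defaultdict(set)
    for s, a, r, s_next, done in transitions:
        outcomes[(s, a)].append((r, s_next, done))
        actions_at[s].add(a)

    q = {sa: 0.0 for sa in outcomes}
    for _ in range(iters):
        new_q = {}
        for sa, outs in outcomes.items():
            total = 0.0
            for r, s_next, done in outs:
                # States with no stored outgoing transitions are given value 0.
                next_vals = [q[(s_next, a2)] for a2 in actions_at.get(s_next, ())]
                bootstrap = 0.0 if done or not next_vals else max(next_vals)
                total += r + gamma * bootstrap
            # Average over the empirical outcomes of this (state, action) pair.
            new_q[sa] = total / len(outs)
        q = new_q
    return q


if __name__ == "__main__":
    memory = [
        ("s0", "right", 0.0, "s1", False),
        ("s1", "right", 1.0, "s2", True),
        ("s0", "left", 0.0, "s0", False),
    ]
    for key, value in empirical_q_values(memory, gamma=0.9).items():
        print(key, round(value, 4))
```

With gamma=0.9, the estimates for ("s0", "right") and ("s0", "left") converge to 0.9 and 0.81 respectively, since the only rewarding path stored in memory goes through "s1".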

Installation

pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt

Tested with Python 3.7.11 and CUDA 11.4.
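After installation, a quick way to confirm the setup is to print the relevant versions. The helper name describe_environment is hypothetical; it only relies on standard PyTorch attributes (torch.__version__, torch.version.cuda, torch.cuda.is_available()). Note that torch.version.cuda reports the CUDA version the wheel was built against (11.3 for the cu113 wheels above), which can differ from the system CUDA version as long as the installed driver is recent enough.

```python
import platform


def describe_environment():
    """Hypothetical helper: collect versions relevant to the tested setup."""
    info = {
        "python": platform.python_version(),
        "torch": None,
        "torch_cuda": None,
        "cuda_available": False,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed yet; report Python only.
        return info
    info["torch"] = torch.__version__
    # CUDA version the PyTorch wheel was compiled with, not the driver version.
    info["torch_cuda"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```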

Running Experiments

python main.py

To run on a different environment, edit atari_name_list in ceer/arguments.py.
For example: 'atari_name_list': ['Sokoban-Push_5x5_1_120'].
Other hyperparameters, such as sample_method_para (corresponding to alpha) and policy_loss_para (corresponding to lambda), are also set in ceer/arguments.py.

Bibtex

@inproceedings{zhang2023replay,
  title={Replay Memory as An Empirical {MDP}: Combining Conservative Estimation with Experience Replay},
  author={Hongming Zhang and Chenjun Xiao and Han Wang and Jun Jin and Bo Xu and Martin M{\"u}ller},
  booktitle={The Eleventh International Conference on Learning Representations},
  year={2023},
  url={https://openreview.net/forum?id=SjzFVSJUt8S}
}

Acknowledgements

Awesome environments used for testing:

Sokoban: https://github.com/mpSchrader/gym-sokoban

Minigrid: https://github.com/Farama-Foundation/Minigrid

MinAtar: https://github.com/kenjyoung/MinAtar

Some baselines can be found in the following works:

TER: https://openreview.net/forum?id=OXRZeMmOI7a

Dreamerv2: https://github.com/RajGhugare19/dreamerv2

Tianshou: https://github.com/thu-ml/tianshou
