Yuanlin Duan, Wensen Mao and He Zhu
Code for "Learning World Models for Unconstrained Goal Navigation" (NeurIPS 2024), a method that improves the quality of world models in model-based reinforcement learning (MBRL).
If you find our paper or code useful, please cite us:
@article{duan2024learning,
title={Learning World Models for Unconstrained Goal Navigation},
author={Duan, Yuanlin and Mao, Wensen and Zhu, He},
journal={arXiv preprint arXiv:2411.02446},
year={2024}
}
The richness of the environment states and dynamics captured by the replay buffer sets the upper limit on what the world model can learn about the real world, and it also strongly influences how well the policy can be trained.
MUN learns a world model from state transitions between any two states in the replay buffer, whether tracing backward along a recorded trajectory or transitioning across separate trajectories.
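As a minimal illustration (not the repository's actual code; the buffer layout, names, and sampling scheme below are assumptions), sampling such "two-way" state pairs from a replay buffer might look like this in Python:

import random

def sample_state_pairs(trajectories, num_pairs):
    # trajectories: list of lists of states from the replay buffer.
    # Unlike sampling only forward along one recorded trajectory, a pair may
    # go backward in time or span two different trajectories, so the world
    # model is trained on transitions in both directions.
    pairs = []
    for _ in range(num_pairs):
        traj_a = random.choice(trajectories)
        traj_b = random.choice(trajectories)  # may be the same trajectory
        start = random.choice(traj_a)
        goal = random.choice(traj_b)          # possibly "behind" start in time
        pairs.append((start, goal))
    return pairs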
Previous Replay Buffer: One-way direction
Our Method (MUN)’s Replay Buffer: Two-way direction
Along a trajectory toward the agent's goal, there often exist certain critical states, which we term key subgoal states.
We observed that key subgoal states typically correspond to significant changes in the agent's actions, so we designed the DAD algorithm to find key subgoal states.
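A simplified sketch of this idea (thresholding action differences; the function name, signature, and threshold below are assumptions for illustration, not the repository's actual DAD implementation):

import numpy as np

def find_key_subgoals(states, actions, threshold=0.5):
    # states, actions: aligned arrays along one trajectory.
    # A state is flagged as a candidate key subgoal when the action taken
    # there differs from the previous action by more than `threshold`
    # (L2 distance), i.e., the behavior changes sharply at that state.
    actions = np.asarray(actions)
    diffs = np.linalg.norm(actions[1:] - actions[:-1], axis=-1)
    return [states[i + 1] for i, d in enumerate(diffs) if d > threshold]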
Some key subgoals found by DAD:
MUN trains better policies in different tasks compared with other baselines:
Success rate of MUN crossing different key subgoal pairs:
MUN/
|- Config/ # config files for each environment.
|- dreamerv2_APS/ # MUN implementation
|- dreamerv2_APS/gc_main.py # Main running file
Install all dependencies:
pip install -r library.txt
Then run:
pip install -e .
We evaluate MUN on six environments: Ant Maze, Walker, 3-block Stacking, Block Rotation, Pen Rotation, Fetch Slide.
MuJoCo install: MuJoCo 2.0
Ant Maze, 3-Block Stack environments:
The mrl codebase contains the Ant Maze and 3-block Stacking environments.
git clone https://github.com/hueds/mrl.git
Before testing these two environments, you should make sure that the mrl path is set in the PYTHONPATH.
# if you want to run environments in the mrl codebase (Ant Maze, 3-block Stacking)
export PYTHONPATH=<path to your mrl folder>
Walker environment:
Clone the lexa-benchmark and dm_control repos.
git clone https://github.com/hueds/dm_control
git clone https://github.com/hueds/lexa-benchmark.git
Set up dm_control as a local python module:
cd dm_control
pip install .
Set LD_PRELOAD to the libGLEW path, and set the MUJOCO_GL and MUJOCO_RENDERER variables.
# if you want to run environments in the lexa-benchmark codebase
MUJOCO_GL=egl MUJOCO_RENDERER=egl LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so:/usr/lib/x86_64-linux-gnu/libGL.so PYTHONPATH=<path to your lexa-benchmark folder like "/home/edward/lexa-benchmark">
Training Scripts:
python dreamerv2_APS/gc_main.py --configs <environment name in the config file, e.g., RotatePen> --logdir <your logdir path>
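For example, to train on the Pen Rotation task (the log directory path below is only an example):
python dreamerv2_APS/gc_main.py --configs RotatePen --logdir ~/logdir/rotate_pen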
Use TensorBoard to check the results.
tensorboard --logdir ~/logdir/your_logdir_name
MUN builds on many prior works, and we thank the authors for their contributions.