feature(pu): add unizero citation and related info (#234)
puyuan1996 authored Jun 18, 2024
1 parent 14fc057 commit 61e8960
Showing 2 changed files with 59 additions and 40 deletions.
52 changes: 31 additions & 21 deletions README.md
@@ -28,12 +28,12 @@
[![GitHub license](https://img.shields.io/github/license/opendilab/LightZero)](https://github.com/opendilab/LightZero/blob/master/LICENSE)
[![discord badge](https://dcbadge.vercel.app/api/server/dkZS2JF56X?style=flat)](https://discord.gg/dkZS2JF56X)

-Updated on 2024.04.12 LightZero-v0.0.5
+Updated on 2024.06.19 LightZero-v0.0.5

> LightZero is a lightweight, efficient, and easy-to-understand open-source algorithm toolkit that combines Monte Carlo Tree Search (MCTS) and Deep Reinforcement Learning (RL).
> For any questions about LightZero, you can consult the RAG-based Q&A assistant: [ZeroPal](https://huggingface.co/spaces/OpenDILabCommunity/ZeroPal).
-English | [简体中文(Simplified Chinese)](https://github.com/opendilab/LightZero/blob/main/README.zh.md) | [LightZero Paper](https://arxiv.org/pdf/2310.08348.pdf) | [ReZero Paper](https://arxiv.org/abs/2404.16364)
+English | [简体中文(Simplified Chinese)](https://github.com/opendilab/LightZero/blob/main/README.zh.md) | [LightZero Paper](https://arxiv.org/abs/2310.08348) | [🔥UniZero Paper](https://arxiv.org/abs/2406.10667) | [🔥ReZero Paper](https://arxiv.org/abs/2404.16364)

## Background

@@ -122,23 +122,23 @@ LightZero is a library with a [PyTorch](https://pytorch.org/) implementation of

The environments and algorithms currently supported by LightZero are shown in the table below:

-| Env./Algo. | AlphaZero | MuZero | EfficientZero | Sampled EfficientZero | Gumbel MuZero | Stochastic MuZero |
-|---------------| -------- | ------ |-------------| ------------------ | ---------- |----------------|
-| TicTacToe | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 |
-| Gomoku | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 |
-| Connect4 | ✔ | ✔ | 🔒 | 🔒 | 🔒 | 🔒 |
-| 2048 | --- | ✔ | 🔒 | 🔒 | 🔒 | ✔ |
-| Chess | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
-| Go | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
-| CartPole | --- | ✔ | ✔ | ✔ | ✔ | ✔ |
-| Pendulum | --- | ✔ | ✔ | ✔ | ✔ | ✔ |
-| LunarLander | --- | ✔ | ✔ | ✔ | ✔ | ✔ |
-| BipedalWalker | --- | ✔ | ✔ | ✔ | ✔ | 🔒 |
-| Atari | --- | ✔ | ✔ | ✔ | ✔ | ✔ |
-| MuJoCo | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 |
-| MiniGrid | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 |
-| Bsuite | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 |
-| Memory | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 |
+| Env./Algo. | AlphaZero | MuZero | EfficientZero | Sampled EfficientZero | Gumbel MuZero | Stochastic MuZero | UniZero |
+|---------------| -------- | ------ |-------------| ------------------ | ---------- |----------------|---------------|
+| TicTacToe | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 | ✔ |
+| Gomoku | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 | ✔ |
+| Connect4 | ✔ | ✔ | 🔒 | 🔒 | 🔒 | 🔒 | ✔ |
+| 2048 | --- | ✔ | 🔒 | 🔒 | 🔒 | ✔ | ✔ |
+| Chess | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
+| Go | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
+| CartPole | --- | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
+| Pendulum | --- | ✔ | ✔ | ✔ | ✔ | ✔ | 🔒 |
+| LunarLander | --- | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
+| BipedalWalker | --- | ✔ | ✔ | ✔ | ✔ | 🔒 | 🔒 |
+| Atari | --- | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
+| MuJoCo | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 | 🔒 |
+| MiniGrid | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 | ✔ |
+| Bsuite | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 | ✔ |
+| Memory | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 | ✔ |

<sup>(1): "✔" means that the corresponding item is finished and well-tested.</sup>

@@ -296,6 +296,8 @@ The following are the detailed paper notes (in Chinese) of the above algorithms:
</details>
You can also refer to the relevant Zhihu column (in Chinese): [In-depth Analysis of MCTS+RL Frontier Theories and Applications](https://www.zhihu.com/column/c_1764308735227662336).
### Algo. Overview
The following are overview diagrams of the MCTS principles behind the above algorithms:
@@ -340,6 +342,7 @@ Here is a collection of research papers about **Monte Carlo Tree Search**.
- [2022 Online and Offline Reinforcement Learning by Planning with a Learned Model](https://arxiv.org/abs/2104.06294)
- [2021 Vector Quantized Models for Planning](https://arxiv.org/abs/2106.04615)
- [2021 Muesli: Combining Improvements in Policy Optimization](https://arxiv.org/abs/2104.06159)
#### MCTS Analysis
- [2020 Monte-Carlo Tree Search as Regularized Policy Optimization](https://arxiv.org/abs/2007.12509)
- [2021 Self-Consistent Models and Values](https://arxiv.org/abs/2110.12840)
@@ -487,12 +490,12 @@ Here is a collection of research papers about **Monte Carlo Tree Search**.
- ExpEnv: synthetic functions for nonlinear optimization, reinforcement learning problems in MuJoCo locomotion environments, and optimization problems in Neural Architecture Search (NAS).
- [Monte Carlo Tree Search based Variable Selection for High Dimensional Bayesian Optimization](https://openreview.net/pdf?id=SUzPos_pUC) 2022
- Lei Song∗, Ke Xue∗, Xiaobin Huang, Chao Qian
-- Key: a low-dimensional subspace via MCTS, optimizes in the subspace with any Bayesian optimization algorithm.
+- Key: a low-dimensional subspace via MCTS, optimizes in the subspace with any Bayesian optimization algorithm.
- ExpEnv: NAS-bench problems and MuJoCo locomotion
- [Monte Carlo Tree Search With Iteratively Refining State Abstractions](https://proceedings.neurips.cc/paper/2021/file/9b0ead00a217ea2c12e06a72eec4923f-Paper.pdf) 2021
- Samuel Sokota, Caleb Ho, Zaheen Ahmad, J. Zico Kolter
- Key: stochastic environments, Progressive widening, abstraction refining
-- ExpEnv: Blackjack, Trap, five by five Go.
+- ExpEnv: Blackjack, Trap, five by five Go.
- [Deep Synoptic Monte Carlo Planning in Reconnaissance Blind Chess](https://proceedings.neurips.cc/paper/2021/file/215a71a12769b056c3c32e7299f1c5ed-Paper.pdf) 2021
- Gregory Clark
- Key: imperfect information, belief state with an unweighted particle filter, a novel stochastic abstraction of information states.
@@ -541,6 +544,13 @@ Here is a collection of research papers about **Monte Carlo Tree Search**.
year={2024}
}
+@article{pu2024unizero,
+  title={UniZero: Generalized and Efficient Planning with Scalable Latent World Models},
+  author={Pu, Yuan and Niu, Yazhe and Ren, Jiyuan and Yang, Zhenjie and Li, Hongsheng and Liu, Yu},
+  journal={arXiv preprint arXiv:2406.10667},
+  year={2024}
+}
@article{xuan2024rezero,
title={ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze},
author={Xuan, Chunyu and Niu, Yazhe and Pu, Yuan and Hu, Shuai and Liu, Yu and Yang, Jing},
47 changes: 28 additions & 19 deletions README.zh.md
@@ -27,12 +27,12 @@
[![Contributors](https://img.shields.io/github/contributors/opendilab/LightZero)](https://github.com/opendilab/LightZero/graphs/contributors)
[![GitHub license](https://img.shields.io/github/license/opendilab/LightZero)](https://github.com/opendilab/LightZero/blob/master/LICENSE)

-Last updated on 2024.04.12 LightZero-v0.0.5
+Last updated on 2024.06.19 LightZero-v0.0.5

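> LightZero is a lightweight, efficient, and easy-to-understand open-source MCTS+RL algorithm library.
> For any questions about LightZero, you can consult the RAG-based Q&A assistant: [ZeroPal](https://huggingface.co/spaces/OpenDILabCommunity/ZeroPal)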
-[English](https://github.com/opendilab/LightZero/blob/main/README.md) | Simplified Chinese | [LightZero Paper](https://arxiv.org/pdf/2310.08348.pdf) | [ReZero Paper](https://arxiv.org/abs/2404.16364)
+[English](https://github.com/opendilab/LightZero/blob/main/README.md) | Simplified Chinese | [LightZero Paper](https://arxiv.org/abs/2310.08348) | [🔥UniZero Paper](https://arxiv.org/abs/2406.10667) | [🔥ReZero Paper](https://arxiv.org/abs/2404.16364)


## Background
@@ -110,23 +110,23 @@ LightZero is an MCTS algorithm library implemented with [PyTorch](https://pytorch.org/),

The environments and algorithms currently supported by LightZero are shown in the table below:

-| Env./Algo. | AlphaZero | MuZero | EfficientZero | Sampled EfficientZero | Gumbel MuZero | Stochastic MuZero |
-|---------------| -------- | ------ |-------------| ------------------ | ---------- |----------------|
-| TicTacToe | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 |
-| Gomoku | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 |
-| Connect4 | ✔ | ✔ | 🔒 | 🔒 | 🔒 | 🔒 |
-| 2048 | --- | ✔ | 🔒 | 🔒 | 🔒 | ✔ |
-| Chess | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
-| Go | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
-| CartPole | --- | ✔ | ✔ | ✔ | ✔ | ✔ |
-| Pendulum | --- | ✔ | ✔ | ✔ | ✔ | ✔ |
-| LunarLander | --- | ✔ | ✔ | ✔ | ✔ | ✔ |
-| BipedalWalker | --- | ✔ | ✔ | ✔ | ✔ | 🔒 |
-| Atari | --- | ✔ | ✔ | ✔ | ✔ | ✔ |
-| MuJoCo | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 |
-| MiniGrid | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 |
-| Bsuite | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 |
-| Memory | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 |
+| Env./Algo. | AlphaZero | MuZero | EfficientZero | Sampled EfficientZero | Gumbel MuZero | Stochastic MuZero | UniZero |
+|---------------| -------- | ------ |-------------| ------------------ | ---------- |----------------|---------------|
+| TicTacToe | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 | ✔ |
+| Gomoku | ✔ | ✔ | 🔒 | 🔒 | ✔ | 🔒 | ✔ |
+| Connect4 | ✔ | ✔ | 🔒 | 🔒 | 🔒 | 🔒 | ✔ |
+| 2048 | --- | ✔ | 🔒 | 🔒 | 🔒 | ✔ | ✔ |
+| Chess | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
+| Go | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 | 🔒 |
+| CartPole | --- | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
+| Pendulum | --- | ✔ | ✔ | ✔ | ✔ | ✔ | 🔒 |
+| LunarLander | --- | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
+| BipedalWalker | --- | ✔ | ✔ | ✔ | ✔ | 🔒 | 🔒 |
+| Atari | --- | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
+| MuJoCo | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 | 🔒 |
+| MiniGrid | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 | ✔ |
+| Bsuite | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 | ✔ |
+| Memory | --- | ✔ | ✔ | ✔ | 🔒 | 🔒 | ✔ |

<sup>(1): "✔" means that the corresponding item is finished and well-tested.</sup>

@@ -284,6 +284,8 @@ python3 -u zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py

</details>

You can also refer to the corresponding Zhihu column: [In-depth Analysis of MCTS+RL Frontier Theories and Applications](https://www.zhihu.com/column/c_1764308735227662336).

### Algorithm Framework Diagrams

The following are framework overview diagrams of the algorithms integrated in LightZero:
@@ -536,6 +538,13 @@ and internal state transition dynamics,
year={2024}
}
+@article{pu2024unizero,
+  title={UniZero: Generalized and Efficient Planning with Scalable Latent World Models},
+  author={Pu, Yuan and Niu, Yazhe and Ren, Jiyuan and Yang, Zhenjie and Li, Hongsheng and Liu, Yu},
+  journal={arXiv preprint arXiv:2406.10667},
+  year={2024}
+}
@article{xuan2024rezero,
title={ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze},
author={Xuan, Chunyu and Niu, Yazhe and Pu, Yuan and Hu, Shuai and Liu, Yu and Yang, Jing},