OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models

Paper · Tutorial · Code · Docs · Data · Model · Issue · Demo

[ English ][ 中文 ]

Table of Contents 📖

News and Updates
Features
TODO
Benchmark
Plots
Datasets and Models
Getting Started
- Installation
- Quick Start
Usage
Join Us
Contact
Response Examples
Community
Reference

News and Updates

[29/11/2024] We have added a demo page on ModelScope. Many thanks to @wangxingjun778!
[24/10/2024] OpenR now supports MCTS reasoning (#24)! 🌲
[15/10/2024] Our report is on arXiv!
[12/10/2024] OpenR has been released! 🚀

Features

Feature	Contents
✅ Process-supervision Data Generation	OmegaPRM: Improve Mathematical Reasoning in Language Models by Automated Process Supervision
✅ Online Policy Training	RL Training: APPO, GRPO, TPPO
✅ Generative and Discriminative PRM Training	PRM Training: Supervised Training for PRMs; Generative RM Training: Direct GenRM
✅ Multiple Search Strategies	Greedy Search; Best-of-N; Beam Search; MCTS; rStar: Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers; Critic-MCTS: Under Review
✅ Test-time Computation and Scaling Law	TBA, see Benchmark

TODO

Feature	TODO (High Priority, We value your contributions!)
👨‍💻Data	Re-implement Journey Learning
👨‍💻RL Training	Distributed Training; Reinforcement Fine-Tuning (RFT) #80
👨‍💻PRM	Larger-scale training; GenRM-CoT implementation; Soft-label training #57
👨‍💻Reasoning	Optimize code structure #53; More tasks on reasoning (AIME, etc.) #53; Multi-modal reasoning #82; Reasoning in code generation #68; Dots #75; Consistency check; Benchmarking

Benchmark

See Benchmark!

Plots

Provided Datasets and Models

MATH-APS (Our Dataset)

MATH-psa (Our Process Reward Model)

Getting Started

Installation

conda create -n open_reasoner python=3.10
conda activate open_reasoner
pip install -r requirements.txt
pip3 install "fschat[model_worker,webui]"
pip install -U pydantic
cd envs/MATH/latex2sympy
pip install -e .
cd -

Download Base Models

Before running the project, please ensure that all required base models are downloaded. The models used in this project include:

Qwen2.5-Math-1.5B-Instruct, Qwen2.5-Math-7B-Instruct
peiyi9979/mistral-7b-sft
peiyi9979/math-shepherd-mistral-7b-prm

To download these models, please refer to the Hugging Face model downloading tutorial for step-by-step guidance on downloading models from the Hugging Face Hub.

Please make sure that all models are saved in their directories according to the project setup before proceeding.
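As a minimal sketch of the download step, the models listed above could be fetched with `snapshot_download` from the `huggingface_hub` package. Several things here are assumptions rather than part of the project: the `Qwen/` organisation prefix on the Qwen repo IDs, the `./models` base directory, the one-directory-per-model layout, and the helper names `local_dir_for` and `download_all`.

```python
from pathlib import Path

# Repo IDs for the models listed above.
# Assumption: the Qwen models live under the "Qwen/" organisation on the Hub.
BASE_MODELS = [
    "Qwen/Qwen2.5-Math-1.5B-Instruct",
    "Qwen/Qwen2.5-Math-7B-Instruct",
    "peiyi9979/mistral-7b-sft",
    "peiyi9979/math-shepherd-mistral-7b-prm",
]


def local_dir_for(repo_id: str, model_base: str) -> Path:
    """Map 'org/name' to '<model_base>/name' (hypothetical layout)."""
    return Path(model_base) / repo_id.split("/")[-1]


def download_all(model_base: str) -> None:
    """Download every base model into its own directory under model_base."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo_id in BASE_MODELS:
        target = local_dir_for(repo_id, model_base)
        snapshot_download(repo_id=repo_id, local_dir=str(target))


# Usage (downloads several large checkpoints, so it is left commented out):
# download_all("./models")
```

With this layout, the directory name of each model is the part of the repo ID after the slash, for example `./models/mistral-7b-sft`.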

Quickstart

Before running inference, please modify the following variables in the scripts under reason/llm_service/ to set the appropriate base models for your usage:

$MODEL_BASE: Set this to the directory where your models are stored.
$POLICY_MODEL_NAME: Set this to the name of the policy model you wish to use.
$VALUE_MODEL_NAME: Set this to the name of the value model you wish to use.
$NUM_LM_WORKER: Set this to the number of language model (LM) workers to start.
$NUM_RM_WORKER: Set this to the number of reward model (RM) workers to start.

Once these are set, start the services and run inference with the different techniques described below.
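Before launching the services, it can help to confirm that the configured model directories exist. The sketch below assumes that each model is stored at `$MODEL_BASE/<model name>`; that layout and the helper name `missing_model_dirs` are illustrative assumptions, not something the service scripts are confirmed to require.

```python
from pathlib import Path


def missing_model_dirs(model_base: str,
                       policy_model_name: str,
                       value_model_name: str) -> list[str]:
    """Return the expected model directories that do not exist.

    Assumes (hypothetically) that models are stored as <model_base>/<name>.
    """
    expected = [
        Path(model_base) / policy_model_name,
        Path(model_base) / value_model_name,
    ]
    return [str(path) for path in expected if not path.is_dir()]


# Example call with the values you would put in the service script:
# missing_model_dirs("./models", "mistral-7b-sft", "math-shepherd-mistral-7b-prm")
```

An empty list means both directories were found; any returned path points at a model that still needs to be downloaded or a variable that needs correcting.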

Start LM & RM Services

For example, to start the LM and RM services for the Math Shepherd model, run the following command:

sh reason/llm_service/create_service_math_shepherd.sh

To kill the server processes, we recommend using the following command:

tmux kill-session -t {Your Session Name} # default is `FastChat`

Usage

Run Inference

⚠️ Make sure the inputs (--LM, --RM) in the script match the variables ($POLICY_MODEL_NAME, $VALUE_MODEL_NAME) used by the workers you started!

export PYTHONPATH=$(pwd)
sh scripts/eval/cot_greedy.sh

# Method: cot. Average result: ({'majority_vote': 0.734, 'total_completion_tokens': 559.13},)

sh scripts/eval/cot_rerank.sh

# Method: best_of_n. Average result: ({'majority_vote': 0.782, 
#                                       'prm_min_max': 0.772, 
#                                       'prm_min_vote': 0.792, 
#                                       'prm_last_max': 0.776, 
#                                       'prm_last_vote': 0.792, 
#                                       'total_completion_tokens': 4431.268},)

sh scripts/eval/beam_search.sh

# Method: beam_search. Average result: ({'majority_vote': 0.74, 'total_completion_tokens': 2350.492},)

sh scripts/eval/vanila_mcts.sh
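The metric names printed above (`majority_vote`, `prm_min_max`, `prm_min_vote`, `prm_last_max`, `prm_last_vote`) suggest different ways of aggregating N sampled answers with per-step PRM scores. The following is a minimal sketch of one plausible reading of those names, not the project's actual implementation: a candidate's score is either the minimum or the last of its step scores, and the final answer is either the single best-scored candidate ("max") or the answer with the largest summed score ("vote"). All function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(answers):
    """Pick the most frequent final answer, ignoring PRM scores."""
    return Counter(answers).most_common(1)[0][0]


def prm_max(answers, step_scores, reduce):
    """Pick the answer of the single candidate with the best reduced score."""
    scores = [reduce(steps) for steps in step_scores]
    best = max(range(len(answers)), key=scores.__getitem__)
    return answers[best]


def prm_vote(answers, step_scores, reduce):
    """Sum reduced scores per distinct answer and pick the largest total."""
    totals = defaultdict(float)
    for answer, steps in zip(answers, step_scores):
        totals[answer] += reduce(steps)
    return max(totals, key=totals.get)


def last(steps):
    """Score a candidate by its final step only."""
    return steps[-1]


# Toy data: four sampled solutions, each with two PRM step scores.
answers = ["42", "41", "42", "7"]
step_scores = [[0.9, 0.4], [0.6, 0.7], [0.6, 0.5], [0.2, 0.95]]

results = {
    "majority_vote": majority_vote(answers),
    "prm_min_max": prm_max(answers, step_scores, min),
    "prm_min_vote": prm_vote(answers, step_scores, min),
    "prm_last_max": prm_max(answers, step_scores, last),
    "prm_last_vote": prm_vote(answers, step_scores, last),
}
print(results)
```

On this toy data the strategies disagree: a candidate with one weak early step is penalised by the min reduction but not by the last-step reduction, which is why the choice of aggregation can change the reported accuracy.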

Run Training

⚠️ Before training, please modify the $dataset_path, $model_name_or_path and $prm_name_or_path in train/mat/scripts/train_llm.sh.

cd train/mat/scripts
bash train_llm.sh

Run PRM Learning

cd prm/code

# Single GPU
python finetune_qwen_single_gpu.py --model_path $YOUR_MODEL_PATH \
                                   --train_data_path $TRAIN_DATA_PATH \
                                   --test_data_path $TEST_DATA_PATH


# Multi-GPU
torchrun --nproc_per_node=2 finetune_qwen.py --model_path $YOUR_MODEL_PATH \
                                             --data_path $YOUR_DATA_FOLDER_PATH \
                                             --datasets both

Join Us

Every contribution is valuable to the community.

Thank you for your interest in OpenR! 🥰 We are deeply committed to the open-source community, and we welcome contributions from everyone. Your efforts, whether big or small, help us grow and improve. Contributions aren’t limited to code: answering questions, helping others, enhancing our documentation, and sharing the project are equally impactful.

Feel free to check out the contribution guide!

Future Plan

Add More Comprehensive Evaluations on RL Training and Search Strategies
Scale the Prover-Verifier Model Size
Support Self-improvement Training

Contact

The OpenR community is maintained by:

Openreasoner Team (openreasoner@gmail.com)

License

OpenR is released under the MIT License.

Citation

If you do find our resources helpful, please cite our paper:

@misc{wang2024tutorial,
  author = {Jun Wang},
  title = {A Tutorial on LLM Reasoning: Relevant Methods Behind ChatGPT o1},
  year = {2024},
  url = {https://github.com/openreasoner/openr/blob/main/reports/tutorial.pdf},
  note = {Available on GitHub}
}

@article{wang2024openr,
  title={OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models},
  author={Wang, Jun and Fang, Meng and Wan, Ziyu and Wen, Muning and Zhu, Jiachen and Liu, Anjie and Gong, Ziqin and Song, Yan and Chen, Lei and Ni, Lionel M and others},
  journal={arXiv preprint arXiv:2410.09671},
  year={2024}
}

Response Examples

Comparing PRMs: Math-psa (Ours) vs. Math-Shepherd

Justifying RL Training

Exploring Test-time Computation

Community

WeChat:

Reference
