This is the official repository for the paper MARIO Eval. We fixed several bugs in the original latex2sympy and extended its ANTLR parser grammar to support more LaTeX expressions.
## Evaluation on MATH dataset
| Model | Accuracy (MARIO Eval) | Reported (original paper) |
|---|---|---|
| MathCoder-CL-7B | 0.3064 | 0.3074 |
| MathCoder-CL-34B | 0.4584 | 0.461 |
| ToRA-Code-34B | 0.5136 | 0.51 |
| ToRA-70B | 0.5014 | 0.497 |
| DeepSeek-Math-Base-7B | 0.3318 | 0.3142 |
| DeepSeek-Math-Instruct-7B | 0.572 | 0.575 |
| DeepSeek-Math-RL-7B | 0.596 | 0.5878 |
## Features

- sympy-based equivalence checking of two math expressions; see `is_equiv`.
- annotation of the MATH test set for more robust evaluation; see `data/math_testset_annotation.json` and `demo.py` (a scoring sketch follows this list).
- integration of LLMs.
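As a quick illustration of the first two features, here is a minimal sketch of scoring predictions against the annotated test set. The schema is an assumption: we assume the file is a single JSON array whose records carry an `answer` field; check `demo.py` for the actual layout.

```python
# Sketch: score predictions against the annotated MATH test set.
# ASSUMPTIONS: the file is one JSON array and each record has an
# "answer" key -- check demo.py for the real schema.
import json

from math_evaluation import is_equiv

with open("data/math_testset_annotation.json") as f:
    records = json.load(f)

# Replace with your model's outputs, one per record.
predictions = ["1.5" for _ in records]

correct = sum(
    is_equiv(rec["answer"], pred) for rec, pred in zip(records, predictions)
)
print(f"accuracy: {correct / len(records):.4f}")
```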
## Requirements

- `sympy==1.12`
- `antlr4-python3-runtime==4.11.1`
- do **not** install gmpy2; if it is already present, run `pip uninstall gmpy2`
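One way to satisfy these pins in a fresh environment (the `-y` flag just skips the confirmation prompt):

```
> pip install "sympy==1.12" "antlr4-python3-runtime==4.11.1"
> pip uninstall -y gmpy2
```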
## Usage

After installing the package (see Installation below), you can verify the parser and the equivalence check from a Python session:

```
> git clone https://github.com/MARIO-Math-Reasoning/MARIO_EVAL.git
> cd MARIO_EVAL
> python
>>> from latex2sympy.latex2sympy2 import latex2sympy
>>> latex2sympy("\\frac12")
1/2
>>> from math_evaluation import is_equiv
>>> is_equiv("1\\frac12", "1.5")
True
>>> is_equiv("\\begin{pmatrix} 1 & \\frac12 \\\\ 1/3 & \\sqrt4 \\end{pmatrix}",
...          "[[1.0, 1/2],[0.3333, 2.0]]")
True
```
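The matrix example suggests that numeric entries are compared with some tolerance (0.3333 matches 1/3). The same session as a standalone script, using nothing beyond what the REPL demo above shows:

```python
# Script form of the REPL session above.
from latex2sympy.latex2sympy2 import latex2sympy
from math_evaluation import is_equiv

print(latex2sympy("\\frac12"))        # -> 1/2
print(is_equiv("1\\frac12", "1.5"))   # -> True
print(is_equiv(
    "\\begin{pmatrix} 1 & \\frac12 \\\\ 1/3 & \\sqrt4 \\end{pmatrix}",
    "[[1.0, 1/2],[0.3333, 2.0]]",
))                                    # -> True
```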
## Installation

```
> git clone https://github.com/MARIO-Math-Reasoning/MARIO_EVAL.git
> cd MARIO_EVAL
> cd latex2sympy && pip install . && cd ..
> pip install -e .
```

## Unit test

Run the unit tests to verify the installation:

```
python -m unittest math_evaluation/tests/test_is_equiv.py
```
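To add regression cases of your own, a minimal sketch in the same `unittest` style could look like the following; the module and class names are hypothetical, patterned on `math_evaluation/tests/test_is_equiv.py`:

```python
# Hypothetical extra test module, e.g. math_evaluation/tests/test_my_cases.py.
import unittest

from math_evaluation import is_equiv

class TestMyCases(unittest.TestCase):
    def test_mixed_number_equals_decimal(self):
        # Mirrors the Usage example above.
        self.assertTrue(is_equiv("1\\frac12", "1.5"))

if __name__ == "__main__":
    unittest.main()
```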
## Citation

Please cite our paper if you use the data or code.

```
@misc{zhang2024mario,
      title={MARIO Eval: Evaluate Your Math LLM with your Math LLM--A mathematical dataset evaluation toolkit},
      author={Boning Zhang and Chengxi Li and Kai Fan},
      year={2024},
      eprint={2404.13925},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```