This is the official repository for the paper MARIO Eval. We fixed several bugs in the original latex2sympy and extended its ANTLR parser grammar to support more LaTeX expressions.
## Evaluation on MATH dataset
| Model | Accuracy (MARIO Eval) | Reported (original paper) |
|---|---|---|
| MathCoder-CL-7B | 0.3064 | 0.3074 |
| MathCoder-CL-34B | 0.4584 | 0.461 |
| ToRA-Code-34B | 0.5136 | 0.51 |
| ToRA-70B | 0.5014 | 0.497 |
| DeepSeek-Math-Base-7B | 0.3318 | 0.3142 |
| DeepSeek-Math-Instruct-7B | 0.572 | 0.575 |
| DeepSeek-Math-RL-7B | 0.596 | 0.5878 |
## Features

- sympy-based equivalence checking of two math expressions; see `is_equiv`.
- annotation of the MATH test set for more robust evaluation; see `data/math_testset_annotation.json` and `demo.py` (a scoring sketch follows this list).
- integration of LLMs.
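As a quick illustration of the first two features, here is a minimal sketch of scoring predictions against the annotated test set. The schema is an assumption: we assume the file is a single JSON array whose records carry an `answer` field; check `demo.py` for the actual layout.

```python
# Sketch: score predictions against the annotated MATH test set.
# ASSUMPTIONS: the file is one JSON array and each record has an
# "answer" key -- check demo.py for the real schema.
import json

from math_evaluation import is_equiv

with open("data/math_testset_annotation.json") as f:
    records = json.load(f)

# Replace with your model's outputs, one per record.
predictions = ["1.5" for _ in records]

correct = sum(
    is_equiv(rec["answer"], pred) for rec, pred in zip(records, predictions)
)
print(f"accuracy: {correct / len(records):.4f}")
```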
## Requirements

- `sympy==1.12`
- `antlr4-python3-runtime==4.11.1`
- do **not** install gmpy2; if it is already present, run `pip uninstall gmpy2`
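One way to satisfy these pins in a fresh environment (the `-y` flag just skips the confirmation prompt):

```
> pip install "sympy==1.12" "antlr4-python3-runtime==4.11.1"
> pip uninstall -y gmpy2
```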
## Usage

After installing the package (see Installation below), you can verify the parser and the equivalence check from a Python session:

```
> git clone https://github.com/MARIO-Math-Reasoning/MARIO_EVAL.git
> cd MARIO_EVAL
> python
>>> from latex2sympy.latex2sympy2 import latex2sympy
>>> latex2sympy("\\frac12")
1/2
>>> from math_evaluation import is_equiv
>>> is_equiv("1\\frac12", "1.5")
True
>>> is_equiv("\\begin{pmatrix} 1 & \\frac12 \\\\ 1/3 & \\sqrt4 \\end{pmatrix}",
...          "[[1.0, 1/2],[0.3333, 2.0]]")
True
```
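The matrix example suggests that numeric entries are compared with some tolerance (0.3333 matches 1/3). The same session as a standalone script, using nothing beyond what the REPL demo above shows:

```python
# Script form of the REPL session above.
from latex2sympy.latex2sympy2 import latex2sympy
from math_evaluation import is_equiv

print(latex2sympy("\\frac12"))        # -> 1/2
print(is_equiv("1\\frac12", "1.5"))   # -> True
print(is_equiv(
    "\\begin{pmatrix} 1 & \\frac12 \\\\ 1/3 & \\sqrt4 \\end{pmatrix}",
    "[[1.0, 1/2],[0.3333, 2.0]]",
))                                    # -> True
```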
## Installation

```
> git clone https://github.com/MARIO-Math-Reasoning/MARIO_EVAL.git
> cd MARIO_EVAL
> cd latex2sympy && pip install . && cd ..
> pip install -e .
```

## Unit test

Run the unit tests to verify the installation:

```
python -m unittest math_evaluation/tests/test_is_equiv.py
```
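To add regression cases of your own, a minimal sketch in the same `unittest` style could look like the following; the module and class names are hypothetical, patterned on `math_evaluation/tests/test_is_equiv.py`:

```python
# Hypothetical extra test module, e.g. math_evaluation/tests/test_my_cases.py.
import unittest

from math_evaluation import is_equiv

class TestMyCases(unittest.TestCase):
    def test_mixed_number_equals_decimal(self):
        # Mirrors the Usage example above.
        self.assertTrue(is_equiv("1\\frac12", "1.5"))

if __name__ == "__main__":
    unittest.main()
```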
## Citation

Please cite our paper if you use the data or code.

```
@misc{zhang2024mario,
      title={MARIO Eval: Evaluate Your Math LLM with your Math LLM--A mathematical dataset evaluation toolkit},
      author={Boning Zhang and Chengxi Li and Kai Fan},
      year={2024},
      eprint={2404.13925},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```