second
zzh-SJTU committed Jun 13, 2024
1 parent a630969 commit c23bd70
Showing 1 changed file with 11 additions and 6 deletions.
17 changes: 11 additions & 6 deletions README.md
@@ -49,12 +49,12 @@ These variables ensure that the Langchain library can properly authenticate with


### 📊 Original Benchmarks
-We use the following 4 widely-used benchmarks
+We use the following 4 widely used benchmarks, covering 4 diverse reasoning domains:

-- **[GSM8K](https://github.com/openai/grade-school-math)**: One of the most widely-used math reasoning dataset which contains high-quality, linguistically diverse school math word problems.
-- **[Bias Benchmark for QA (BBQ)](https://huggingface.co/datasets/heegyu/bbq)**:A widelyused dataset that highlight attested social biases against people belonging to protected classes along nine social dimensions.
-- **[BIG-Bench Hard (BBH) Navigate](https://huggingface.co/datasets/lukaemon/bbh/viewer/navigate)**: A spatial reasoning dataset which involves giving the LLM navigation steps to determine if the agent returns to the starting point.
-- **[BIG-Bench Hard (BBH) Dyck Language](https://huggingface.co/datasets/lukaemon/bbh/viewer/dyck_languages)**: A symbolic reasoning dataset which requires the model to predict the sequence of closing parentheses for a Dyck-4 word missing its last few closing parentheses.
+- **Math Reasoning-[GSM8K](https://github.com/openai/grade-school-math)**: One of the most widely used math reasoning datasets, containing high-quality, linguistically diverse grade school math word problems.
+- **Social Reasoning-[Bias Benchmark for QA (BBQ)](https://huggingface.co/datasets/heegyu/bbq)**: A widely used dataset that highlights attested social biases against people belonging to protected classes along nine social dimensions.
+- **Spatial Reasoning-[BIG-Bench Hard (BBH) Navigate](https://huggingface.co/datasets/lukaemon/bbh/viewer/navigate)**: A spatial reasoning dataset in which the LLM is given a sequence of navigation steps and must determine whether the agent returns to the starting point.
+- **Symbolic Reasoning-[BIG-Bench Hard (BBH) Dyck Language](https://huggingface.co/datasets/lukaemon/bbh/viewer/dyck_languages)**: A symbolic reasoning dataset that requires the model to predict the sequence of closing parentheses for a Dyck-4 word missing its last few closing parentheses.
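
Editor's illustration, not part of this commit or of the repository: the Dyck Language task described in the list above can be made concrete with a minimal sketch. It assumes the four bracket types are `()`, `[]`, `{}`, and `<>`, and that tokens may be separated by spaces. The names `PAIRS` and `complete_dyck` are hypothetical and chosen only for this example.

```python
# Illustrative sketch only; the bracket alphabet and all names are assumptions.
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSERS = set(PAIRS.values())


def complete_dyck(prefix: str) -> str:
    """Return the closing brackets that balance a valid Dyck-4 prefix."""
    stack = []  # currently unclosed opening brackets, innermost last
    for ch in prefix:
        if ch.isspace():
            continue
        if ch in PAIRS:
            stack.append(ch)
        elif ch in CLOSERS:
            # A closer must match the most recent unclosed opener.
            if not stack or PAIRS[stack[-1]] != ch:
                raise ValueError(f"mismatched closing bracket: {ch!r}")
            stack.pop()
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    # Close the remaining openers from innermost to outermost.
    return " ".join(PAIRS[ch] for ch in reversed(stack))


print(complete_dyck("[ < ( ) {"))  # prints "} > ]"
```

A stack is the natural choice here because each closing bracket must match the most recently opened one, so the expected answer is simply the unclosed openers read in reverse.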


## 📁 Directory Structure
@@ -84,5 +84,10 @@ For inquiries regarding the framework or the paper, please contact:
If you find our work useful or it contributes to your research, please cite our paper:

```bibtex
-@inproceedings{TODO}
+@article{zhang2024darg,
+  title={DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph},
+  author={Zhang, Zhehao and Chen, Jiaao and Yang, Diyi},
+  journal={arXiv preprint arXiv:XXX},
+  year={2024}
+}
```
