ChainLM

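This repository contains the code, dataset, and models for our paper: ChainLM: Empowering Large Language Models with Improved Chain-of-Thought Prompting. We release the CoTGenius data generation scripts, the evolved CoT dataset, and the fine-tuning, evaluation, and CoT debating code described below.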

Overview

CoTGenius is a Chain-of-Thought (CoT) improvement framework that synthesizes more complicated, diverse, and detailed CoT rationales. Within this framework, we introduce three evolution strategies for improving CoT: complicate, diversify, and specify. Using CoTGenius, we generate a large-scale CoT dataset of 44,335 samples covering commonsense, mathematical, scientific, and symbolic reasoning. We then fine-tune open-source LLMs (Llama 2-Chat 7B and 13B) on this evolved CoT data to obtain ChainLM, and compare ChainLM with existing popular LLMs on 9 complex reasoning datasets. Finally, building on ChainLM, we propose a CoT reasoning strategy called step-level debating.

Figure: The overall framework of CoTGenius.

Data Release

The data directory contains the 44k CoT samples generated over 4 rounds of CoTGenius evolution.

  • train_data.json contains all the improved CoT data from the 4 rounds.
  • no_cs.json is the training data with the commonsense reasoning categories removed.
  • no_math.json is the training data with the mathematical reasoning categories removed.
  • no_sci.json is the training data with the scientific reasoning categories removed.
  • no_sym.json is the training data with the symbolic reasoning categories removed.
  • seed.json is the seed dataset used for generation.
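The helpers below are a minimal sketch for sanity-checking the released files. They assume that each file is a single JSON array of samples. The per-sample field names are not documented here, so the sketch only counts records and never reads their contents. The helper names and the default data path are hypothetical.

```python
import json
from pathlib import Path


# Hypothetical helper: assumes each released file is one JSON array of samples.
def load_samples(path):
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError(f"{path}: expected a JSON array, got {type(data).__name__}")
    return data


def summarize(data_dir="data"):
    """Return {file name: sample count} for every .json file in data_dir."""
    return {p.name: len(load_samples(p)) for p in sorted(Path(data_dir).glob("*.json"))}


def removed_by_ablation(data_dir="data"):
    """How many samples each no_*.json file drops relative to train_data.json."""
    counts = summarize(data_dir)
    full = counts["train_data.json"]
    return {name: full - n for name, n in counts.items() if name.startswith("no_")}
```

For example, `removed_by_ablation("data")` reports the size of each ablated category. This assumes the no_*.json files are strict subsets of train_data.json.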

Data Generation Process

Our data generation process combines three pipelines:

  • Complicate: First, we apply the complication strategy to make the questions in the original (seed) data more complex. Second, we judge whether each evolution succeeded, based on the complexity of the new question. Then, we generate answers to the new questions. Finally, we verify the correctness of the new <question, CoT> samples.
  • Diversify: Follows the same steps as Complicate, but uses diversification strategies to guide question generation.
  • Specify: First, we rewrite the CoTs in the seed dataset, then judge whether each evolution succeeded.
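The control flow of the three pipelines can be sketched as follows. This is a simplified illustration, not the code in the generate directory. In CoTGenius each step is presumably an LLM call. Here `evolve`, `judge`, `answer`, `verify`, and `rewrite` are hypothetical callables supplied by the caller, so only the filtering order is shown.

```python
from typing import Callable, Iterable, List, Tuple

Sample = Tuple[str, str]  # (question, CoT rationale)


def evolve_questions(
    seeds: Iterable[Sample],
    evolve: Callable[[str], str],        # hypothetical: complicate or diversify a question
    judge: Callable[[str, str], bool],   # hypothetical: did the evolution succeed?
    answer: Callable[[str], str],        # hypothetical: generate a CoT for the new question
    verify: Callable[[str, str], bool],  # hypothetical: is the new (question, CoT) correct?
) -> List[Sample]:
    """Complicate / Diversify: evolve question -> judge -> answer -> verify."""
    kept = []
    for question, _ in seeds:
        new_q = evolve(question)
        if not judge(question, new_q):
            continue  # failed evolution: drop it
        cot = answer(new_q)
        if verify(new_q, cot):
            kept.append((new_q, cot))
    return kept


def specify_cots(
    seeds: Iterable[Sample],
    rewrite: Callable[[str, str], str],  # hypothetical: rewrite the seed CoT
    judge: Callable[[str, str], bool],   # hypothetical: old CoT vs. new CoT
) -> List[Sample]:
    """Specify: rewrite each seed CoT and keep it only if the evolution succeeded."""
    kept = []
    for question, cot in seeds:
        new_cot = rewrite(question, cot)
        if judge(cot, new_cot):
            kept.append((question, new_cot))
    return kept
```

The design keeps only samples that pass every check. A failed judgement or verification discards the sample instead of retrying.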

To run the generation process with CoTGenius, use the three scripts (complicate.sh, diversify.sh, and specify.sh) provided in the generate directory:

cd generate
bash complicate.sh
bash diversify.sh
bash specify.sh

Fine-tune

We fine-tune the Llama 2-Chat 7B and 13B models on our dataset and call the resulting CoT fine-tuned models ChainLM. The fine-tuning code is adapted from Alpaca.

cd fine-tune
bash run.sh
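Alpaca-style supervised fine-tuning formats each sample with a prompt template. It then masks the prompt tokens in the labels, so the loss is computed only on the response, here the CoT. The sketch below illustrates that idea with Alpaca's no-input template. Whether run.sh uses exactly this template, and the `instruction`/`output` naming, are assumptions. `tokenize` stands in for a real tokenizer.

```python
IGNORE_INDEX = -100  # label value that PyTorch's CrossEntropyLoss ignores by default

# Alpaca's no-input prompt template; its use in ChainLM's run.sh is an assumption.
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_example(instruction, output, tokenize, eos_token="</s>"):
    """Tokenize prompt + response and mask the prompt so the loss covers only the CoT."""
    source = PROMPT_NO_INPUT.format(instruction=instruction)
    input_ids = tokenize(source + output + eos_token)
    source_len = len(tokenize(source))
    # Prompt positions get IGNORE_INDEX; response positions keep their token ids.
    labels = [IGNORE_INDEX] * source_len + input_ids[source_len:]
    return {"input_ids": input_ids, "labels": labels}
```

With a real subword tokenizer, tokenizing the prompt on its own can split the boundary token differently than tokenizing the full text does. The mask length is therefore approximate in that case, which Alpaca's code also accepts.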

Evaluation

We evaluate ChainLM on 9 complex reasoning datasets that are independent of the seed dataset and report its performance.

cd evaluate
bash test.sh
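CoT evaluation usually extracts a final answer from the generated rationale and compares it with the reference. The sketch below shows this for numeric answers. The "last number in the output" rule is a hypothetical extraction heuristic, not necessarily what test.sh does. Multiple-choice datasets would need a different extractor, such as one that matches option letters.

```python
import re

_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_answer(cot: str):
    """Hypothetical rule: the last number in the CoT (thousands separators removed)."""
    matches = _NUMBER.findall(cot.replace(",", ""))
    return matches[-1] if matches else None


def accuracy(predictions, references):
    """Fraction of CoT outputs whose extracted answer equals the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(extract_answer(p) == str(r) for p, r in zip(predictions, references))
    return correct / len(references)
```

For example, `accuracy(["... so the answer is 12.", "It equals 1,000.", "I am not sure."], [12, 1000, 5])` gives 2/3.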

Figure: Main experiment results.

CoT Debating

Based on ChainLM, we propose the step-level CoT debating strategy. To evaluate with CoT debating:

cd debate
bash run.sh
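The general shape of step-level debating can be illustrated with a simplified sketch: debate happens per reasoning step rather than over whole answers. This is not the method implemented in the debate directory. The agent interface, the number of revision rounds, the majority-vote aggregation, and the `END` stop marker are all hypothetical. Each agent proposes the next step, revises after seeing its peers' proposals, and the most-supported step is appended to the shared chain.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical agent: (question, agreed steps so far, peers' proposals) -> next step.
Agent = Callable[[str, List[str], List[str]], str]


def debate_step(question: str, steps: List[str], agents: Sequence[Agent], rounds: int = 2) -> str:
    """Agents propose the next step, revise after seeing peers, then vote."""
    proposals = [agent(question, steps, []) for agent in agents]
    for _ in range(rounds - 1):
        # Each agent sees every proposal except its own.
        proposals = [
            agent(question, steps, proposals[:i] + proposals[i + 1:])
            for i, agent in enumerate(agents)
        ]
    # Majority vote; most_common orders equal counts by first occurrence.
    return Counter(proposals).most_common(1)[0][0]


def debate(question: str, agents: Sequence[Agent], max_steps: int = 8, stop: str = "END") -> List[str]:
    """Build the reasoning chain one debated step at a time."""
    steps: List[str] = []
    for _ in range(max_steps):
        step = debate_step(question, steps, agents)
        if step == stop:
            break
        steps.append(step)
    return steps
```

Settling each step before the next one is generated can stop an early mistake from carrying through the rest of the chain. Whole-answer voting cannot do this, because it only compares finished chains.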