An unofficial implementation of Self-Alignment with Instruction Backtranslation.
Humback, the framework proposed in the paper, augments supervised fine-tuning with high-quality, automatically generated instruction data.
🚧 Currently, this repo is under construction and not finished.
- Python==3.11.4
- PyTorch==2.0.1
- Others: requirements.txt
Procedure (2 iters; a code sketch follows the list):

- Prepare seed data and unlabelled data.
- Train the backward model $M_{yx}$ on the reversed seed data.
- Self-augment the seed data via $M_{yx}$.
- Train a forward model $M_{0}$ on the seed data.
- Self-curate the unlabelled data $A_{k}^{(1)}$ via $M_{0}$ (tag quality scores).
- Train a forward model $M_{1}$ on the self-curated unlabelled data $A_{k}^{(1)}$.
- Use $M_{1}$ to self-curate the unlabelled data $A_{k}^{(2)}$.
- Train a forward model $M_{2}$ on the self-curated unlabelled data $A_{k}^{(2)}$.
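A condensed sketch of this loop is given below. It is illustrative only: `train`, `self_augment`, and `score` are placeholder callables supplied by the caller, not functions provided by this repo, and the score-5 filter follows the original paper's curation rule.

```python
from typing import Callable

def humback(
    seed_pairs: list[tuple[str, str]],      # (instruction, response) seed pairs
    unlabelled_texts: list[str],
    train: Callable[[list[tuple[str, str]]], object],        # fine-tunes and returns a model
    self_augment: Callable[[object, list[str]], list[tuple[str, str]]],
    score: Callable[[object, tuple[str, str]], int],          # 1-5 quality score
    n_iters: int = 2,
    keep_score: int = 5,
):
    # Backward model M_yx: learns to generate an instruction from a response.
    m_yx = train([(response, instruction) for (instruction, response) in seed_pairs])

    # Self-augmentation: candidate (instruction, response) pairs from raw texts.
    candidates = self_augment(m_yx, unlabelled_texts)

    model = train(seed_pairs)  # forward model M0, trained on the seed data only
    for _ in range(n_iters):
        # Self-curation: keep only the candidates the current model rates highest.
        curated = [c for c in candidates if score(model, c) == keep_score]
        model = train(curated)  # M1 after the first pass, M2 after the second
    return model
```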
We follow the original paper and use oasst1 to construct the seed data.
The processed data can be found here.
$ bash data/seed/download.sh
$ python data/seed/convert.py
# #data: 3286, #dump: 3200
# Instruction len: 149±266, Response len: 1184±799
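For a rough idea of what the converted seed data looks like, the sketch below reads a JSON-lines dump of (instruction, response) pairs and reprints the length statistics above. The file path and the `instruction`/`response` field names are assumptions and may not match what `data/seed/convert.py` actually writes.

```python
# Minimal sketch: inspect the converted seed data and print length statistics.
# The path and the `instruction` / `response` field names are assumptions.
import json
import statistics

pairs = []
with open("data/seed/seed.jsonl", encoding="utf8") as fin:
    for line in fin:
        item = json.loads(line)
        pairs.append((item["instruction"], item["response"]))

inst_lens = [len(inst) for inst, _ in pairs]
resp_lens = [len(resp) for _, resp in pairs]
print(f"#data: {len(pairs)}")
print(f"Instruction len: {statistics.mean(inst_lens):.0f}±{statistics.stdev(inst_lens):.0f}")
print(f"Response len: {statistics.mean(resp_lens):.0f}±{statistics.stdev(resp_lens):.0f}")
```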
Since ClueWeb22 is not a free open-source dataset, we sample texts from falcon-refinedweb instead.
The processed data can be found here.
$ python data/unlabelled/falcon_refinedweb.py
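A minimal sketch of how such a sample could be drawn with 🤗 `datasets` streaming is shown below. The sample size, output path, output field name, and the assumption that the text column is `content` are all illustrative; the actual logic lives in `data/unlabelled/falcon_refinedweb.py`.

```python
# Minimal sketch: stream falcon-refinedweb and dump a small sample of texts.
# Sample size, output path, and field names are assumptions.
import json
from datasets import load_dataset

ds = load_dataset("tiiuae/falcon-refinedweb", split="train", streaming=True)

with open("data/unlabelled/unlabelled.jsonl", "w", encoding="utf8") as fout:
    for i, item in enumerate(ds):
        if i >= 100_000:  # assumed sample size
            break
        fout.write(json.dumps({"response": item["content"]}, ensure_ascii=False) + "\n")
```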
Item | Value |
---|---|
Foundation Model | meta-llama/Llama-2-7b-hf |
GPUs | 8 * A100 40GB |
Mixed Precision | bf16 |
Gradient Checkpointing | on |
ZeRO-Offload | Stage 2 |
Batch size | 32 |
Steps | 500 |
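The bf16 and ZeRO-Offload settings in the table could be expressed roughly as the DeepSpeed configuration below. This is a sketch under assumptions, not the exact configuration used by `scripts/train_backward_Myx.sh`.

```python
# Minimal sketch of a DeepSpeed config matching the table above
# (bf16, ZeRO Stage 2 with optimizer offload). Not the repo's actual config.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}
# Gradient checkpointing is enabled on the model side, e.g.
# model.gradient_checkpointing_enable() when using 🤗 transformers.
```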
# The first Myx training takes about 30min (on the seed data).
$ bash scripts/train_backward_Myx.sh
The pre-trained $M_{yx}$ can be found here.
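The data preparation specific to this step is reversing the seed pairs, so that $M_{yx}$ is prompted with a response and trained to generate the matching instruction. A minimal sketch of that reversal (paths and field names are assumptions):

```python
# Minimal sketch: build the reversed (response -> instruction) training file
# for the backward model M_yx. Paths and field names are assumptions.
import json

with open("data/seed/seed.jsonl", encoding="utf8") as fin, \
        open("data/seed/seed_reversed.jsonl", "w", encoding="utf8") as fout:
    for line in fin:
        item = json.loads(line)
        reversed_item = {
            "instruction": item["response"],   # the model is prompted with the response
            "response": item["instruction"],   # and trained to generate the instruction
        }
        fout.write(json.dumps(reversed_item, ensure_ascii=False) + "\n")
```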
Self-augmentation: generate candidate (instruction, response) pairs from the unlabelled texts with $M_{yx}$.

$ bash scripts/self_aug.sh
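Conceptually, self-augmentation prompts $M_{yx}$ with each unlabelled text and asks it for a plausible instruction. Below is a minimal sketch using vLLM; the checkpoint path, prompt template, sampling settings, and file paths are assumptions, and the actual logic lives behind `scripts/self_aug.sh`.

```python
# Minimal sketch: generate candidate instructions for unlabelled texts with M_yx.
# Checkpoint path, prompt template, and file paths are assumptions.
import json
from vllm import LLM, SamplingParams

llm = LLM(model="ckpts/backward_Myx")  # hypothetical checkpoint path
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

with open("data/unlabelled/unlabelled.jsonl", encoding="utf8") as fin:
    texts = [json.loads(line)["response"] for line in fin]

# The prompt template below is an assumption, not the one used by the repo.
prompts = [
    f"Below is a response. Write the instruction that it answers.\n\n"
    f"Response:\n{text}\n\nInstruction:"
    for text in texts
]

outputs = llm.generate(prompts, params)
with open("data/unlabelled/augmented.jsonl", "w", encoding="utf8") as fout:
    for text, out in zip(texts, outputs):
        pair = {"instruction": out.outputs[0].text.strip(), "response": text}
        fout.write(json.dumps(pair, ensure_ascii=False) + "\n")
```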
Hyperparameters for training the seed model $M_{0}$ are the same as those of $M_{yx}$.
$ bash scripts/train_seed.sh
The pre-trained $M_{0}$ can be found here.
Self-curation: let $M_{0}$ score the augmented pairs and keep only the high-quality ones.

$ bash scripts/self_curation.sh
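As in the paper, curation asks the current forward model to rate each augmented (instruction, response) pair on a 1-5 quality scale and keeps only the top-rated pairs. The filtering step, assuming the scoring has already produced a `score` field (paths and field name are assumptions):

```python
# Minimal sketch: keep only the augmented pairs rated with the highest quality score.
# Input/output paths and the `score` field name are assumptions.
import json

KEEP_SCORE = 5  # the paper keeps only pairs rated 5 out of 5

with open("data/unlabelled/augmented_scored.jsonl", encoding="utf8") as fin, \
        open("data/unlabelled/curated.jsonl", "w", encoding="utf8") as fout:
    for line in fin:
        item = json.loads(line)
        if item.get("score") == KEEP_SCORE:
            fout.write(line)
```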
Most hyperparameters for training $M_{1}$ (and $M_{2}$) are the same as those of $M_{yx}$, except:
Item | Value |
---|---|
Steps | 1400 |
# change the `--data_path` in `scripts/train_seed.sh`
$ bash scripts/train_seed.sh
Results of the other models are taken from HuggingFaceH4/open_llm_leaderboard.
Model | Average | ARC | HellaSwag | MMLU | TruthfulQA |
---|---|---|---|---|---|
Llama-2-7b | 54.32 | 53.07 | 78.59 | 46.87 | 38.76 |
Llama-2-7b-chat | 56.34 | 52.90 | 78.55 | 48.32 | 45.57 |
Vicuna-7b-v1.3 | 55.62 | 50.43 | 76.92 | 48.14 | 47.01 |
Humback $M_{0}$ | 58.13 | 56.31 | 81.20 | 47.45 | 47.59 |
Humback $M_{1}$ | | | | | |
Humback $M_{2}$ | | | | | |
- Paper: Self-Alignment with Instruction Backtranslation
- Code: FastChat
- Code: vLLM
- Code: stanford_alpaca
- Code: transformers
@misc{li2023selfalignment,
title={Self-Alignment with Instruction Backtranslation},
author={Xian Li and Ping Yu and Chunting Zhou and Timo Schick and Luke Zettlemoyer and Omer Levy and Jason Weston and Mike Lewis},
year={2023},
eprint={2308.06259},
archivePrefix={arXiv},
primaryClass={cs.CL}
}