πŸ‹ Humback

An unofficial implementation of Self-Alignment with Instruction Backtranslation.

Humback is a framework that augments the instruction data used for supervised fine-tuning with high-quality, self-curated samples.

🚧 Currently, this repo is under construction and not finished.

[Figure: Humback framework overview]

🌴 Dependencies

πŸš€ QuickStart

Procedure (2 iterations):

  1. Prepare seed data and unlabelled data.
  2. Train the backward model $M_{yx}$ on the reversed seed data.
  3. Self-augment the unlabelled data via $M_{yx}$ (generate an instruction for each unlabelled text).
  4. Train a forward model $M_{0}$ on the seed data.
  5. Self-curate the augmented data via $M_{0}$ (tag quality scores) to obtain $A_{k}^{(1)}$; see the curation criterion below the list.
  6. Train a forward model $M_{1}$ on the self-curated data $A_{k}^{(1)}$.
  7. Use $M_{1}$ to self-curate the augmented data again, obtaining $A_{k}^{(2)}$.
  8. Train a forward model $M_{2}$ on the self-curated data $A_{k}^{(2)}$.
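
Following the original paper, self-curation keeps only the augmented pairs that the previous forward model rates highly on a 1-5 quality scale (the paper keeps only the top score, $k = 5$):

$$ A_{k}^{(t)} = \{\, (\hat{x}, y) : \hat{x} = M_{yx}(y),\; \mathrm{score}_{M_{t-1}}(\hat{x}, y) \ge k \,\} $$

where $y$ is an unlabelled text and $\hat{x}$ is the instruction generated for it by $M_{yx}$.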

Seed Data Pre-processing

We follow the original paper and use oasst1 to construct the seed data.

The processed data can be found here.

$ bash data/seed/download.sh
$ python data/seed/convert.py
# #data: 3286, #dump: 3200
# Instruction len: 149±266, Response len: 1184±799
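
The backward model $M_{yx}$ learns to map a response back to its instruction, so its training data is simply the seed pairs with the two fields swapped. Below is a minimal sketch of that reversal; the JSONL paths and the `instruction`/`response` field names are assumptions for illustration, not necessarily what `data/seed/convert.py` emits.

import json

def reverse_pairs(in_path: str, out_path: str) -> None:
    """Swap instruction/response so the backward model learns response -> instruction.

    Assumes one JSON object per line with `instruction` and `response` keys;
    the schema actually produced by data/seed/convert.py may differ.
    """
    with open(in_path, encoding="utf-8") as fin, open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            ex = json.loads(line)
            reversed_ex = {
                "instruction": ex["response"],  # model input: the original answer text
                "response": ex["instruction"],  # model target: the original instruction
            }
            fout.write(json.dumps(reversed_ex, ensure_ascii=False) + "\n")

if __name__ == "__main__":
    # Hypothetical file names, used only to make the sketch runnable end-to-end.
    reverse_pairs("data/seed/seed.jsonl", "data/seed/seed_reversed.jsonl")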

Unlabelled Data Pre-processing

Since ClueWeb22 is not freely available, we sample texts from falcon-refinedweb instead.

The processed data can be found here.

$ python data/unlabelled/falcon_refinedweb.py
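
A minimal sketch of what the sampling script might do, using the `datasets` streaming API so the full corpus never has to be downloaded. The sample size, length filter, text field name (`content`), and output path are illustrative assumptions, not necessarily what `data/unlabelled/falcon_refinedweb.py` uses.

import json
from datasets import load_dataset

# Stream falcon-refinedweb instead of downloading the full dump.
stream = load_dataset("tiiuae/falcon-refinedweb", split="train", streaming=True)

num_samples = 500_000  # illustrative target size
written = 0
with open("data/unlabelled/unlabelled.jsonl", "w", encoding="utf-8") as fout:
    for ex in stream:
        text = ex["content"].strip()
        if not (200 <= len(text) <= 4000):  # crude length filter to drop fragments and very long pages
            continue
        fout.write(json.dumps({"response": text}, ensure_ascii=False) + "\n")
        written += 1
        if written >= num_samples:
            break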

Train Backward Model $M_{yx}$

Item                    Value
Foundation Model        meta-llama/Llama-2-7b-hf
GPUs                    8 * A100 40GB
Mixed Precision         bf16
Gradient Checkpointing  on
ZeRO-Offload            Stage 2
Batch size              32
Steps                   500
# The first Myx training takes about 30min (on the seed data).
$ bash scripts/train_backward_Myx.sh

The pre-trained $M_{yx}$ is available on Hugging Face.

Self-Augmentation via $M_{yx}$

$ bash scripts/self_aug.sh
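
Conceptually, self-augmentation feeds each unlabelled text to $M_{yx}$ and asks it to generate the instruction that the text would answer. Below is a minimal sketch with `transformers`; the checkpoint path, prompt template, and generation settings are assumptions, not necessarily what `scripts/self_aug.sh` does.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical path to the trained backward model; point this at the real checkpoint.
model_path = "outputs/backward_Myx"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16, device_map="auto")

def generate_instruction(response_text: str) -> str:
    # Assumed prompt format: the backward model sees the answer and writes the matching instruction.
    prompt = f"Below is a response. Write the instruction it answers.\n\nResponse:\n{response_text}\n\nInstruction:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
    # Keep only the newly generated tokens, i.e. the predicted instruction.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

print(generate_instruction("Gradient checkpointing trades extra compute for lower memory usage by ..."))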

Train Seed Model $M_{0}$

Hyperparameters are the same as for $M_{yx}$.

$ bash scripts/train_seed.sh

The pre-trained $M_{0}$ is available on Hugging Face (upload in progress).

Self-Curation Prompting

$ bash scripts/self_curation.sh
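
Self-curation prompts the current forward model to rate each augmented (instruction, response) pair on a 1-5 quality scale and keeps only the top-rated pairs (the paper keeps score-5 examples). Below is a minimal sketch of the parsing/filtering half of that step; the field names, paths, and threshold are assumptions for illustration, not necessarily what `scripts/self_curation.sh` implements.

import json
import re

SCORE_RE = re.compile(r"\b([1-5])\b")

def parse_score(judgement: str) -> int | None:
    """Pull the first 1-5 rating out of the model's free-form judgement text."""
    match = SCORE_RE.search(judgement)
    return int(match.group(1)) if match else None

def curate(in_path: str, out_path: str, threshold: int = 5) -> None:
    """Keep only pairs whose tagged quality score reaches the threshold.

    Assumes one JSON object per line with `instruction`, `response`, and the raw
    `judgement` text produced by the forward model; the real field names may differ.
    """
    kept = 0
    with open(in_path, encoding="utf-8") as fin, open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            ex = json.loads(line)
            score = parse_score(ex["judgement"])
            if score is not None and score >= threshold:
                ex["score"] = score
                fout.write(json.dumps(ex, ensure_ascii=False) + "\n")
                kept += 1
    print(f"kept {kept} curated pairs -> {out_path}")

if __name__ == "__main__":
    # Hypothetical file names, used only to make the sketch runnable end-to-end.
    curate("outputs/augmented_scored.jsonl", "data/curated/a_k_1.jsonl")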

Train Models $M_{i}$

Most hyperparameters are the same as for $M_{yx}$, except for the number of steps (the original Humback trains for 1600 steps on 512k samples).

Item   Value
Steps  1400
# change the `--data_path` in `scripts/train_seed.sh`
$ bash scripts/train_seed.sh

πŸ“‘ Experimental Results

Results for the other models are taken from HuggingFaceH4/open_llm_leaderboard.

Model             Average  ARC    HellaSwag  MMLU   TruthfulQA
Llama-2-7b        54.32    53.07  78.59      46.87  38.76
Llama-2-7b-chat   56.34    52.90  78.55      48.32  45.57
Vicuna-7b-v1.3    55.62    50.43  76.92      48.14  47.01
Humback $M_{0}$   58.13    56.31  81.20      47.45  47.59
Humback $M_{1}$
Humback $M_{2}$

πŸ’Œ Acknowledgments

πŸ“œ Reference

@misc{li2023selfalignment,
    title={Self-Alignment with Instruction Backtranslation},
    author={Xian Li and Ping Yu and Chunting Zhou and Timo Schick and Luke Zettlemoyer and Omer Levy and Jason Weston and Mike Lewis},
    year={2023},
    eprint={2308.06259},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
