An unofficial implementation of Self-Alignment with Instruction Backtranslation.
Humback, the framework proposed in the paper, augments supervised fine-tuning with high-quality, automatically generated instruction data.
🚧 Currently, this repo is under construction and not finished.
- Python==3.11.4
- PyTorch==2.0.1
- Others: requirements.txt
Procedure (2 iters; a code sketch follows the list):

- Prepare seed data and unlabelled data.
- Train the backward model $M_{yx}$ on the reversed seed data.
- Self-augment the seed data via $M_{yx}$.
- Train a forward model $M_{0}$ on the seed data.
- Self-curate the unlabelled data $A_{k}^{(1)}$ via $M_{0}$ (tag quality scores).
- Train a forward model $M_{1}$ on the self-curated unlabelled data $A_{k}^{(1)}$.
- Use $M_{1}$ to self-curate the unlabelled data $A_{k}^{(2)}$.
- Train a forward model $M_{2}$ on the self-curated unlabelled data $A_{k}^{(2)}$.
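A condensed sketch of this loop is given below. It is illustrative only: `train`, `self_augment`, and `score` are placeholder callables supplied by the caller, not functions provided by this repo, and the score-5 filter follows the original paper's curation rule.

```python
from typing import Callable

def humback(
    seed_pairs: list[tuple[str, str]],      # (instruction, response) seed pairs
    unlabelled_texts: list[str],
    train: Callable[[list[tuple[str, str]]], object],        # fine-tunes and returns a model
    self_augment: Callable[[object, list[str]], list[tuple[str, str]]],
    score: Callable[[object, tuple[str, str]], int],          # 1-5 quality score
    n_iters: int = 2,
    keep_score: int = 5,
):
    # Backward model M_yx: learns to generate an instruction from a response.
    m_yx = train([(response, instruction) for (instruction, response) in seed_pairs])

    # Self-augmentation: candidate (instruction, response) pairs from raw texts.
    candidates = self_augment(m_yx, unlabelled_texts)

    model = train(seed_pairs)  # forward model M0, trained on the seed data only
    for _ in range(n_iters):
        # Self-curation: keep only the candidates the current model rates highest.
        curated = [c for c in candidates if score(model, c) == keep_score]
        model = train(curated)  # M1 after the first pass, M2 after the second
    return model
```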
We follow the original paper and use oasst1 to construct the seed data.
The processed data can be found here.
$ bash data/seed/download.sh
$ python data/seed/convert.py
# #data: 3286, #dump: 3200
# Instruction len: 149±266, Response len: 1184±799
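For a rough idea of what the converted seed data looks like, the sketch below reads a JSON-lines dump of (instruction, response) pairs and reprints the length statistics above. The file path and the `instruction`/`response` field names are assumptions and may not match what `data/seed/convert.py` actually writes.

```python
# Minimal sketch: inspect the converted seed data and print length statistics.
# The path and the `instruction` / `response` field names are assumptions.
import json
import statistics

pairs = []
with open("data/seed/seed.jsonl", encoding="utf8") as fin:
    for line in fin:
        item = json.loads(line)
        pairs.append((item["instruction"], item["response"]))

inst_lens = [len(inst) for inst, _ in pairs]
resp_lens = [len(resp) for _, resp in pairs]
print(f"#data: {len(pairs)}")
print(f"Instruction len: {statistics.mean(inst_lens):.0f}±{statistics.stdev(inst_lens):.0f}")
print(f"Response len: {statistics.mean(resp_lens):.0f}±{statistics.stdev(resp_lens):.0f}")
```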
Since ClueWeb22 is not a free open-source dataset, we sample texts from falcon-refinedweb instead.
The processed data can be found here.
$ python data/unlabelled/falcon_refinedweb.py
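A minimal sketch of how such a sample could be drawn with 🤗 `datasets` streaming is shown below. The sample size, output path, output field name, and the assumption that the text column is `content` are all illustrative; the actual logic lives in `data/unlabelled/falcon_refinedweb.py`.

```python
# Minimal sketch: stream falcon-refinedweb and dump a small sample of texts.
# Sample size, output path, and field names are assumptions.
import json
from datasets import load_dataset

ds = load_dataset("tiiuae/falcon-refinedweb", split="train", streaming=True)

with open("data/unlabelled/unlabelled.jsonl", "w", encoding="utf8") as fout:
    for i, item in enumerate(ds):
        if i >= 100_000:  # assumed sample size
            break
        fout.write(json.dumps({"response": item["content"]}, ensure_ascii=False) + "\n")
```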
Item | Value |
---|---|
Foundation Model | meta-llama/Llama-2-7b-hf |
GPUs | 8 * A100 40GB |
Mixed Precision | bf16 |
Gradient Checkpointing | on |
ZeRO-Offload | Stage 2 |
Batch size | 32 |
Steps | 500 |
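The bf16 and ZeRO-Offload settings in the table could be expressed roughly as the DeepSpeed configuration below. This is a sketch under assumptions, not the exact configuration used by `scripts/train_backward_Myx.sh`.

```python
# Minimal sketch of a DeepSpeed config matching the table above
# (bf16, ZeRO Stage 2 with optimizer offload). Not the repo's actual config.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}
# Gradient checkpointing is enabled on the model side, e.g.
# model.gradient_checkpointing_enable() when using 🤗 transformers.
```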
# The first Myx training takes about 30min (on the seed data).
$ bash scripts/train_backward_Myx.sh
The pre-trained $M_{yx}$ can be found here.
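The data preparation specific to this step is reversing the seed pairs, so that $M_{yx}$ is prompted with a response and trained to generate the matching instruction. A minimal sketch of that reversal (paths and field names are assumptions):

```python
# Minimal sketch: build the reversed (response -> instruction) training file
# for the backward model M_yx. Paths and field names are assumptions.
import json

with open("data/seed/seed.jsonl", encoding="utf8") as fin, \
        open("data/seed/seed_reversed.jsonl", "w", encoding="utf8") as fout:
    for line in fin:
        item = json.loads(line)
        reversed_item = {
            "instruction": item["response"],   # the model is prompted with the response
            "response": item["instruction"],   # and trained to generate the instruction
        }
        fout.write(json.dumps(reversed_item, ensure_ascii=False) + "\n")
```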
Self-augmentation: generate candidate (instruction, response) pairs from the unlabelled texts with $M_{yx}$.

$ bash scripts/self_aug.sh
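Conceptually, self-augmentation prompts $M_{yx}$ with each unlabelled text and asks it for a plausible instruction. Below is a minimal sketch using vLLM; the checkpoint path, prompt template, sampling settings, and file paths are assumptions, and the actual logic lives behind `scripts/self_aug.sh`.

```python
# Minimal sketch: generate candidate instructions for unlabelled texts with M_yx.
# Checkpoint path, prompt template, and file paths are assumptions.
import json
from vllm import LLM, SamplingParams

llm = LLM(model="ckpts/backward_Myx")  # hypothetical checkpoint path
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

with open("data/unlabelled/unlabelled.jsonl", encoding="utf8") as fin:
    texts = [json.loads(line)["response"] for line in fin]

# The prompt template below is an assumption, not the one used by the repo.
prompts = [
    f"Below is a response. Write the instruction that it answers.\n\n"
    f"Response:\n{text}\n\nInstruction:"
    for text in texts
]

outputs = llm.generate(prompts, params)
with open("data/unlabelled/augmented.jsonl", "w", encoding="utf8") as fout:
    for text, out in zip(texts, outputs):
        pair = {"instruction": out.outputs[0].text.strip(), "response": text}
        fout.write(json.dumps(pair, ensure_ascii=False) + "\n")
```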
Hyperparameters for training the seed model $M_{0}$ are the same as those of $M_{yx}$.
$ bash scripts/train_seed.sh
The pre-trained $M_{0}$ can be found here.
Self-curation: let $M_{0}$ score the augmented pairs and keep only the high-quality ones.

$ bash scripts/self_curation.sh
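As in the paper, curation asks the current forward model to rate each augmented (instruction, response) pair on a 1-5 quality scale and keeps only the top-rated pairs. The filtering step, assuming the scoring has already produced a `score` field (paths and field name are assumptions):

```python
# Minimal sketch: keep only the augmented pairs rated with the highest quality score.
# Input/output paths and the `score` field name are assumptions.
import json

KEEP_SCORE = 5  # the paper keeps only pairs rated 5 out of 5

with open("data/unlabelled/augmented_scored.jsonl", encoding="utf8") as fin, \
        open("data/unlabelled/curated.jsonl", "w", encoding="utf8") as fout:
    for line in fin:
        item = json.loads(line)
        if item.get("score") == KEEP_SCORE:
            fout.write(line)
```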
Most hyperparameters for training $M_{1}$ (and $M_{2}$) are the same as those of $M_{yx}$, except:
Item | Value |
---|---|
Steps | 1400 |
# change the `--data_path` in `scripts/train_seed.sh`
$ bash scripts/train_seed.sh
Results of the other models are taken from HuggingFaceH4/open_llm_leaderboard.
Model | Average | ARC | HellaSwag | MMLU | TruthfulQA |
---|---|---|---|---|---|
Llama-2-7b | 54.32 | 53.07 | 78.59 | 46.87 | 38.76 |
Llama-2-7b-chat | 56.34 | 52.90 | 78.55 | 48.32 | 45.57 |
Vicuna-7b-v1.3 | 55.62 | 50.43 | 76.92 | 48.14 | 47.01 |
Humback $M_{0}$ | 58.13 | 56.31 | 81.20 | 47.45 | 47.59 |
Humback $M_{1}$ | | | | | |
Humback $M_{2}$ | | | | | |
- Paper: Self-Alignment with Instruction Backtranslation
- Code: FastChat
- Code: vLLM
- Code: stanford_alpaca
- Code: transformers
@misc{li2023selfalignment,
title={Self-Alignment with Instruction Backtranslation},
author={Xian Li and Ping Yu and Chunting Zhou and Timo Schick and Luke Zettlemoyer and Omer Levy and Jason Weston and Mike Lewis},
year={2023},
eprint={2308.06259},
archivePrefix={arXiv},
primaryClass={cs.CL}
}