yuanyehome/FT-Data-Ranker-7B

FT-Data-Ranker: Large Language Model Fine-Tuning Data Competition, 7B Model Track

Approach Overview

  1. Search over data compositions;
  2. Tune the random seed.
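
The two steps can be pictured with a minimal sketch. It is not the repository's actual search code: the leave-one-out strategy, the function name `search`, and the `evaluate` callback (which stands in for a full train-and-evaluate run) are all assumptions made for illustration.

```python
from statistics import mean


def search(sources, seeds, evaluate):
    """Hypothetical two-stage search: choose which source to drop, then which seed to keep.

    `evaluate(kept_sources, seed)` is an assumed callback that returns a score
    (higher is better) for one training run on the remaining sources.
    """
    scores = {}
    for removed in sources:
        # Leave-one-out: train on every source except `removed`.
        kept = [s for s in sources if s != removed]
        scores[removed] = {seed: evaluate(kept, seed) for seed in seeds}

    # Stage 1: pick the composition with the best mean score across seeds.
    best_removed = max(scores, key=lambda r: mean(scores[r].values()))
    # Stage 2: for that composition, pick the best-scoring seed.
    best_seed = max(scores[best_removed], key=scores[best_removed].get)
    return best_removed, best_seed
```

Averaging over seeds before comparing compositions keeps a single lucky seed from deciding which source is dropped; the seed is only tuned once the composition is fixed.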

Running the Code

Environment Setup

cd /workspace/data-juicer/
python tools/process_data.py --config configs/data_juicer_recipes/alpaca_cot/alpaca-cot-en-refine.yaml
python tools/process_data.py --config configs/data_juicer_recipes/alpaca_cot/alpaca-cot-zh-refine.yaml
  • Custom script modifications:
cd /workspace/
git clone git@github.com:yuanyehome/FT-Data-Ranker-7B.git
cp FT-Data-Ranker-7B/deepspeed_train_7b_lora_custom.sh /workspace/lm-training/train_scripts/deepspeed_train_7b_lora.sh

Generating the Final Data Composition

python run.py --remove_zh_keys Alpaca-CoT/Chinese-medical/chinesemedical.json --remove_en_keys Alpaca-CoT/ConvAI2/persona_train_self_original.json --exp_name remove_zh_medical_en_convai2 --seeds 42 420 4200 20000123 0317 --gpus 0 1 2 3 4 5 6 7
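
The source of `run.py` is not shown here, so the following is only a sketch of how flags with these names could be parsed, assuming `argparse` with integer seeds and GPU ids. The function name `build_parser` and every default are hypothetical. Under that assumption, the seed written as `0317` would be read as the integer 317.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flag names used in the command above."""
    parser = argparse.ArgumentParser(description="Sketch of run.py-style flags")
    # Dataset files to exclude from the Chinese and English mixtures.
    parser.add_argument("--remove_zh_keys", nargs="*", default=[])
    parser.add_argument("--remove_en_keys", nargs="*", default=[])
    # Label for this experiment.
    parser.add_argument("--exp_name", required=True)
    # One or more random seeds and GPU ids, parsed as integers (an assumption).
    parser.add_argument("--seeds", nargs="+", type=int, default=[42])
    parser.add_argument("--gpus", nargs="+", type=int, default=[0])
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.exp_name, args.seeds, args.gpus)
```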

Ablation Results
