The pytorch_model.bin generated by run_pt.py for the 7B model is 13G #379

Closed
huangxd- opened this issue May 18, 2023 · 9 comments

Comments

@huangxd-

Is something misconfigured somewhere?

-rw-r--r--. 1 root root  191 May 18 16:46 all_results.json
drwxr-xr-x. 3 root root  245 May 18 16:45 checkpoint-100
drwxr-xr-x. 3 root root  244 May 18 16:38 checkpoint-50
-rw-r--r--. 1 root root  13G May 18 16:46 pytorch_model.bin
-rw-r--r--. 1 root root   96 May 18 16:45 special_tokens_map.json
-rw-r--r--. 1 root root  747 May 18 16:45 tokenizer_config.json
-rw-r--r--. 1 root root 741K May 18 16:45 tokenizer.model
-rw-r--r--. 1 root root 1.9K May 18 16:46 trainer_state.json
-rw-r--r--. 1 root root 4.9K May 18 16:45 training_args.bin
-rw-r--r--. 1 root root  191 May 18 16:46 train_results.json

run_pt.sh

lr=2e-4
lora_rank=8
lora_alpha=32
lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
modules_to_save="embed_tokens,lm_head"
lora_dropout=0.05

pretrained_model=/home/hxd/vicuna/llama-7b-chinese-alpaca-plus
chinese_tokenizer_path=/home/hxd/vicuna/llama-7b-chinese-alpaca-plus
dataset_dir=/home/hxd/vicuna/Chinese-LLaMA-Alpaca/my_data
data_cache=/home/hxd/vicuna/Chinese-LLaMA-Alpaca/my_data/data_tmp
per_device_train_batch_size=1
per_device_eval_batch_size=1
training_steps=100
gradient_accumulation_steps=1
output_dir=output_dir

#deepspeed_config_file=ds_zero2_no_offload.json
deepspeed_config_file=ds_zero3_offload.json

torchrun --nnodes 1 --nproc_per_node 1 run_clm_pt_with_peft.py \
    --deepspeed ${deepspeed_config_file} \
    --model_name_or_path ${pretrained_model} \
    --tokenizer_name_or_path ${chinese_tokenizer_path} \
    --dataset_dir ${dataset_dir} \
    --data_cache_dir ${data_cache} \
    --validation_split_percentage 0.001 \
    --per_device_train_batch_size ${per_device_train_batch_size} \
    --per_device_eval_batch_size ${per_device_eval_batch_size} \
    --do_train \
    --seed $RANDOM \
    --max_steps ${training_steps} \
    --lr_scheduler_type cosine \
    --learning_rate ${lr} \
    --warmup_ratio 0.05 \
    --weight_decay 0.01 \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy steps \
    --save_total_limit 3 \
    --save_steps 50 \
    --gradient_accumulation_steps ${gradient_accumulation_steps} \
    --preprocessing_num_workers 1 \
    --block_size 128 \
    --output_dir ${output_dir} \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank ${lora_rank} \
    --lora_alpha ${lora_alpha} \
    --trainable ${lora_trainable} \
    --lora_dropout ${lora_dropout} \
    --torch_dtype float16 \
    --ddp_find_unused_parameters False \
    --fp16
#    --gradient_checkpointing
#    --modules_to_save ${modules_to_save}

ds_zero3_offload.json

{
    "fp16": {
       "enabled": true,
       "auto_cast": false,
       "loss_scale": 0,
       "initial_scale_power": 16,
       "loss_scale_window": 1000,
       "hysteresis": 2,
       "min_loss_scale": 1
    },
    "zero_optimization": {
       "stage": 3,
       "offload_optimizer": {
          "device": "cpu",
          "pin_memory": true
       },
       "offload_param": {
          "device": "cpu",
          "pin_memory": true
       },
       "overlap_comm": true,
       "contiguous_gradients": true,
       "reduce_bucket_size": 205520896,
       "stage3_prefetch_bucket_size": 184968807,
       "stage3_param_persistence_threshold": 143360,
       "sub_group_size": 1e9,
       "stage3_max_live_parameters": 1e9,
       "stage3_max_reuse_distance": 1e9,
       "stage3_gather_16bit_weights_on_model_save": true
    },
    "steps_per_print": 100,
    "train_batch_size": 1,
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 1,
    "wall_clock_breakdown": false,
    "optimizer": {
      "type": "Adam",
      "params": {
        "adam_w_mode": true,
        "lr": 2e-4,
        "betas": [
          0.9,
          0.999
        ],
        "eps": 1e-8,
        "weight_decay": 0.01
      }
    }
 }
@airaria (Contributor) commented May 18, 2023

Could you check how large pytorch_model.bin is under checkpoint-100?

@huangxd- (Author)

> Could you check how large pytorch_model.bin is under checkpoint-100?

I checked at the time; it was also 13G.

@airaria (Contributor) commented May 18, 2023

Are you using DS ZeRO-3? Not sure whether it relates to that strategy.
Load pytorch_model.bin and print its keys to see whether there are any LoRA weights.
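
A minimal sketch of such a check (the path comes from the directory listing above; the snippet is illustrative, not from the original reply):

import torch

# Load the checkpoint on CPU and list its keys; LoRA weights would show up
# as keys containing "lora_A" / "lora_B".
state_dict = torch.load("output_dir/pytorch_model.bin", map_location="cpu")
for key, tensor in state_dict.items():
    print(key, tuple(tensor.shape))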

@huangxd- (Author)

> Are you using DS ZeRO-3? Not sure whether it relates to that strategy.
> Load pytorch_model.bin and print its keys to see whether there are any LoRA weights.

[screenshot of the printed state dict keys]

These should count as being there, right?

@airaria (Contributor) commented May 19, 2023

Saving the full model instead of only the LoRA weights may be related to ZeRO-3.
You may need to extract the LoRA weights from the full checkpoint yourself, or simply load this checkpoint into the LoRA-augmented model; that works fine too.
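
A minimal sketch of the extraction route, assuming the adapter tensors can be identified by a "lora_" substring in their key names (verify against the actual keys in your checkpoint first):

import torch

# Load the full 13G checkpoint on CPU.
state_dict = torch.load("output_dir/pytorch_model.bin", map_location="cpu")

# Keep only the adapter tensors (assumed key pattern; confirm before relying on it).
lora_state_dict = {k: v for k, v in state_dict.items() if "lora_" in k}
torch.save(lora_state_dict, "lora_only.bin")
print(f"kept {len(lora_state_dict)} of {len(state_dict)} tensors")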

@huangxd- (Author) commented May 22, 2023

Not sure whether this is due to the peft version, but following
huggingface/peft#286 (comment)
tloen/alpaca-lora#359

import os

from transformers import TrainerCallback, TrainerControl, TrainerState, TrainingArguments
from transformers.trainer_utils import PREFIX_CHECKPOINT_DIR


class SavePeftModelCallback(TrainerCallback):
    def on_save(
        self,
        args: TrainingArguments,
        state: TrainerState,
        control: TrainerControl,
        **kwargs,
    ):
        checkpoint_folder = os.path.join(args.output_dir, f"{PREFIX_CHECKPOINT_DIR}-{state.global_step}")

        # Save only the PEFT adapter (adapter_config.json + adapter_model.bin).
        peft_model_path = os.path.join(checkpoint_folder, "adapter_model")
        kwargs["model"].save_pretrained(peft_model_path)

        # Drop the full-weight checkpoint the Trainer wrote alongside it.
        pytorch_model_path = os.path.join(checkpoint_folder, "pytorch_model.bin")
        if os.path.exists(pytorch_model_path):
            os.remove(pytorch_model_path)
        return control

With model.save_pretrained(), indeed only the LoRA parameters get saved.
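
For completeness, a sketch of wiring this callback into a Trainer; the construction below is illustrative, and model and train_dataset are assumed to already exist:

from transformers import Trainer, TrainingArguments

trainer = Trainer(
    model=model,                      # a PeftModel wrapping the base LLaMA
    args=TrainingArguments(output_dir="output_dir", max_steps=100, fp16=True),
    train_dataset=train_dataset,
    callbacks=[SavePeftModelCallback()],  # writes adapter_model/ at each save step
)
trainer.train()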

@kylin-zhou

> Saving the full model instead of only the LoRA weights may be related to ZeRO-3. You may need to extract the LoRA weights from the full checkpoint yourself, or simply load this checkpoint into the LoRA-augmented model; that works fine too.

I ran into the same problem. What's the way to extract the LoRA weights?

@huangxd- (Author) commented Jun 8, 2023

> Saving the full model instead of only the LoRA weights may be related to ZeRO-3. You may need to extract the LoRA weights from the full checkpoint yourself, or simply load this checkpoint into the LoRA-augmented model; that works fine too.
>
> I ran into the same problem. What's the way to extract the LoRA weights?

Replace trainer.save_model() with model.save_pretrained("tmp_output").

Another approach is to modify the model-saving callback; see
huggingface/peft#286 (comment)
tloen/alpaca-lora#359
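
A sketch of the first suggestion; the exact placement inside run_clm_pt_with_peft.py is an assumption:

train_result = trainer.train()
# trainer.save_model()               # under ZeRO-3 this wrote the full 13G state dict
model.save_pretrained("tmp_output")  # writes only adapter_config.json + adapter_model.bin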

@kylin-zhou

> extract the LoRA weights from the full checkpoint, or load this checkpoint directly into the LoRA-augmented model

Thanks, I'll give it a try. My checkpoints were generated by the Trainer, so I probably need to reload them with peft and then call save_pretrained.
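
A sketch of that reload-then-resave route, assuming the paths and LoRA hyperparameters shown earlier in this thread; key names in a ZeRO-3 full checkpoint may not line up exactly, so inspect the load report before trusting the result:

import torch
from peft import LoraConfig, get_peft_model
from transformers import LlamaForCausalLM

# Rebuild the PEFT-wrapped model; the LoraConfig must match training (r=8, alpha=32, ...).
base = LlamaForCausalLM.from_pretrained(
    "/home/hxd/vicuna/llama-7b-chinese-alpaca-plus", torch_dtype=torch.float16
)
peft_config = LoraConfig(
    r=8, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj", "gate_proj", "down_proj", "up_proj"],
)
model = get_peft_model(base, peft_config)

# Load the full checkpoint non-strictly, then check which keys did not match.
state_dict = torch.load("output_dir/pytorch_model.bin", map_location="cpu")
load_report = model.load_state_dict(state_dict, strict=False)
print("unexpected:", load_report.unexpected_keys[:5])

# Re-save: writes only adapter_config.json + adapter_model.bin,
# small compared with the 13G full checkpoint.
model.save_pretrained("extracted_lora")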
