
Error when pretraining ChatGLM on a single machine with multiple GPUs #4

Closed
3 tasks
zzzhaoguziji opened this issue Jun 9, 2023 · 8 comments
Labels
question Further information is requested

Comments

@zzzhaoguziji

Describe the Question

Please provide a clear and concise description of what the question is.
Training works on a single GPU, but single-machine multi-GPU training fails.
The training command is:
CUDA_VISIBLE_DEVICES=0,1 torchrun --nnodes 1 --nproc_per_node 1 pretraining.py \
    --model_type chatglm \
    --model_name_or_path ../chatglm \
    --train_file_dir ../data/pretrain \
    --validation_file_dir ../data/pretrain \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --do_train \
    --do_eval \
    --use_peft True \
    --seed 42 \
    --fp16 \
    --num_train_epochs 0.5 \
    --learning_rate 2e-4 \
    --warmup_ratio 0.05 \
    --weight_decay 0.01 \
    --logging_strategy steps \
    --logging_steps 10 \
    --eval_steps 50 \
    --evaluation_strategy steps \
    --save_steps 500 \
    --save_strategy steps \
    --save_total_limit 3 \
    --gradient_accumulation_steps 1 \
    --preprocessing_num_workers 1 \
    --block_size 1024 \
    --output_dir outputs-pt-v1 \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --target_modules all \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --torch_dtype float16 \
    --device_map auto \
    --report_to tensorboard \
    --ddp_find_unused_parameters False \
    --gradient_checkpointing True \
    --deepspeed deepspeed_config.json

Describe your attempts

  • I walked through the tutorials
  • I checked the documentation
  • I checked to make sure that this is not a duplicate question
    (Screenshot attached: 微信截图_20230609104145)
zzzhaoguziji added the question (Further information is requested) label on Jun 9, 2023
@shibing624
Owner

The arguments need to be: CUDA_VISIBLE_DEVICES=0,1 torchrun --nnodes 1 --nproc_per_node 2 pretraining.py. In torchrun mode, every GPU loads the full set of model parameters and training is data-parallel; if GPU memory is insufficient, you can enable cpu_offload.
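
For reference, CPU offload is normally turned on inside the DeepSpeed config that the command above already passes via --deepspeed deepspeed_config.json, not as a separate command-line flag. A minimal sketch, assuming ZeRO stage 2 and the HF Trainer's "auto" placeholders (values are not taken from this repo's config):

import json

# Minimal DeepSpeed config with the optimizer state offloaded to CPU RAM.
# Stage, buffer flags and "auto" placeholders are assumptions -- tune for your setup.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "fp16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 2,                      # shard optimizer state and gradients
        "offload_optimizer": {           # this is the "cpu_offload" part
            "device": "cpu",
            "pin_memory": True
        },
        "overlap_comm": True,
        "contiguous_gradients": True
    }
}

# Write it next to pretraining.py, then launch as before with
#   --deepspeed deepspeed_config.json
with open("deepspeed_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)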

@shibing624
Owner

Alternatively: CUDA_VISIBLE_DEVICES=0,1 python pretraining.py. With device_map="auto", the model is automatically split across multiple GPUs when it is loaded.
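
For reference, a minimal sketch of the device_map="auto" loading path (the model path is taken from the command above; the use of AutoModel and torch_dtype are assumptions, and ChatGLM needs trust_remote_code):

from transformers import AutoModel, AutoTokenizer

model_path = "../chatglm"  # checkpoint path from the issue; adjust as needed

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_path,
    trust_remote_code=True,   # ChatGLM ships custom modeling code
    torch_dtype="auto",       # assumption: let transformers pick the checkpoint dtype
    device_map="auto",        # split layers across the GPUs in CUDA_VISIBLE_DEVICES
)
print(model.hf_device_map)    # inspect which layers landed on which device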

@zzzhaoguziji
Author

Thanks a lot, I'll give it another try.

@boxter007

Alternatively: CUDA_VISIBLE_DEVICES=0,1 python pretraining.py. With device_map="auto", the model is automatically split across multiple GPUs when it is loaded.

I tried that with glm6b2 and it still doesn't work; it reports the same error. glm6b works fine, though.

@Alfer-Feng

The glm and glm2 models have different parameters, so the conversion code needs to be adapted. I'm working on this now as well; I'll comment again once I have results. Watching this thread.

@archerbj

The arguments need to be: CUDA_VISIBLE_DEVICES=0,1 torchrun --nnodes 1 --nproc_per_node 2 pretraining.py. In torchrun mode, every GPU loads the full set of model parameters and training is data-parallel; if GPU memory is insufficient, you can enable cpu_offload.

How do I enable cpu_offload?

@chloefresh

Was this solved in the end? Could you share how you fixed it? @zzzhaoguziji @boxter007

@chloefresh

The glm and glm2 models have different parameters, so the conversion code needs to be adapted. I'm working on this now as well; I'll comment again once I have results. Watching this thread.

Any results yet?
