Segmentation fault error run_training.sh #7

Open
arccoxx opened this issue Nov 18, 2020 · 1 comment

arccoxx commented Nov 18, 2020

I recently created an instance that meets all the suggested requirements. When running the default run_training script, I received a segmentation fault thrown by line 27 of run_training.sh, which specifies the '--fp16' flag.

I then modified the setup to exclude apex and received this error:

run_training_2.sh: line 25: 1251 Segmentation fault (core dumped) python ../train_GeDi.py --task_name SST-2 --overwrite_output_dir --do_eval --do_train --logit_scale --data_dir ../data/AG-news --max_seq_length 192 --overwrite_cache --per_gpu_train_batch_size 4 --per_gpu_eval_batch_size 8 --learning_rate $lr --num_train_epochs 1.0 --output_dir ../topic_GeDi_retrained --model_type gpt2 --model_name_or_path gpt2-medium --gen_weight $lambda_ --logging_steps 500 --save_steps 5000000000 --code_0 false --code_1 true

Any thoughts on how to rectify this issue? Many thanks

Aidan

@akhileshgotmare
Collaborator

@arccoxx I can't replicate this error. Which GPU are you using, if any?

There are some Stack Overflow discussions on this, e.g. https://stackoverflow.com/questions/13654449/error-segmentation-fault-core-dumped (see the second most upvoted answer), which suggest that it might be a (CPU) RAM issue. Does reducing the --per_gpu_train_batch_size argument to 1 or 2 help?

There's also a discussion in the PyTorch issues that might be helpful: pytorch/pytorch#926.
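
To narrow down where the crash happens and whether host RAM is a plausible cause, something like the following might help. It is only a minimal sketch using the Python standard library, not part of the GeDi repository: the helper names `enable_segfault_traceback` and `available_ram_gib` are hypothetical, and the RAM check relies on `os.sysconf` names that are typically only available on Linux.

```python
import faulthandler
import os


def enable_segfault_traceback():
    """Ask Python to dump a traceback to stderr on SIGSEGV.

    Returns True if the fault handler is active, False if it could not be
    enabled (for example when stderr has no usable file descriptor).
    """
    try:
        faulthandler.enable()
    except (RuntimeError, OSError, ValueError, AttributeError):
        return False
    return faulthandler.is_enabled()


def available_ram_gib():
    """Return currently available physical RAM in GiB, or None if unknown.

    Assumption: relies on the SC_AVPHYS_PAGES and SC_PAGE_SIZE sysconf
    names, which exist on Linux but not on every platform.
    """
    try:
        pages = os.sysconf("SC_AVPHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (ValueError, OSError, AttributeError):
        return None
    if pages < 0 or page_size <= 0:
        return None
    return pages * page_size / 2**30


if __name__ == "__main__":
    print("fault handler enabled:", enable_segfault_traceback())
    ram = available_ram_gib()
    print("available RAM (GiB):", "unknown" if ram is None else round(ram, 2))
```

The same traceback behaviour can be had without editing any code by launching the script as `python -X faulthandler ../train_GeDi.py ...` or by setting `PYTHONFAULTHANDLER=1` in the environment. If the traceback points into a native extension rather than the training loop, that would suggest an installation or driver mismatch rather than a memory shortage.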
