Please check that this issue hasn't been reported before.
I searched previous Bug Reports and didn't find any similar reports.
Expected Behavior
Execute finetune.py with examples/llama-2/gptq-lora.yml.
Execution does not throw any error and the model trains fine.
Current behaviour
Execution throws an error after a while; the trainer never starts.
Error thrown:
[2023-09-18 11:25:48,695] [INFO] [axolotl.train.train:57] [PID:6348] [RANK:0] loading model and (optionally) peft_config...
[2023-09-18 11:28:06,557] [ERROR] [axolotl.load_model:321] [PID:6348] [RANK:0] Exception raised attempting to load model, retrying with AutoModelForCausalLM
[2023-09-18 11:28:06,557] [ERROR] [axolotl.load_model:324] [PID:6348] [RANK:0] Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU.You can deactivate exllama backend by setting `disable_exllama=True` in the quantization config object
Traceback (most recent call last):
File "/root/axolotl/src/axolotl/utils/models.py", line 272, in load_model
model = AutoModelForCausalLM.from_pretrained(
File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
return model_class.from_pretrained(
File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3287, in from_pretrained
model = quantizer.post_init_model(model)
File "/opt/conda/lib/python3.10/site-packages/optimum/gptq/quantizer.py", line 482, in post_init_model
raise ValueError(
ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU.You can deactivate exllama backend by setting `disable_exllama=True` in the quantization config object
Traceback (most recent call last):
File "/root/axolotl/src/axolotl/utils/models.py", line 272, in load_model
model = AutoModelForCausalLM.from_pretrained(
File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
return model_class.from_pretrained(
File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3287, in from_pretrained
model = quantizer.post_init_model(model)
File "/opt/conda/lib/python3.10/site-packages/optimum/gptq/quantizer.py", line 482, in post_init_model
raise ValueError(
ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU.You can deactivate exllama backend by setting `disable_exllama=True` in the quantization config object
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/axolotl/scripts/finetune.py", line 52, in <module>
fire.Fire(do_cli)
File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/root/axolotl/scripts/finetune.py", line 48, in do_cli
train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
File "/root/axolotl/src/axolotl/train.py", line 58, in train
model, peft_config = load_model(cfg, tokenizer, inference=cli_args.inference)
File "/root/axolotl/src/axolotl/utils/models.py", line 325, in load_model
model = AutoModelForCausalLM.from_pretrained(
File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
return model_class.from_pretrained(
File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3287, in from_pretrained
model = quantizer.post_init_model(model)
File "/opt/conda/lib/python3.10/site-packages/optimum/gptq/quantizer.py", line 482, in post_init_model
raise ValueError(
ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU.You can deactivate exllama backend by setting `disable_exllama=True` in the quantization config object
Traceback (most recent call last):
File "/opt/conda/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 986, in launch_command
simple_launcher(args)
File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 628, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', 'scripts/finetune.py', 'examples/llama-2/gptq-lora.yml']' returned non-zero exit status 1.
As suggested by @NanoCode012, I changed the config.json of the model in order to add "disable_exllama": true in the quantization_config section. This throws a different error:
Traceback (most recent call last):
File "/root/axolotl/scripts/finetune.py", line 52, in <module>
fire.Fire(do_cli)
File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/root/axolotl/scripts/finetune.py", line 48, in do_cli
train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
File "/root/axolotl/src/axolotl/train.py", line 58, in train
model, peft_config = load_model(cfg, tokenizer, inference=cli_args.inference)
File "/root/axolotl/src/axolotl/utils/models.py", line 420, in load_model
log_gpu_memory_usage(LOG, "after adapters", model.device)
File "/root/axolotl/src/axolotl/utils/bench.py", line 37, in log_gpu_memory_usage
usage, cache, misc = gpu_memory_usage_all(device)
File "/root/axolotl/src/axolotl/utils/bench.py", line 13, in gpu_memory_usage_all
usage = torch.cuda.memory_allocated(device) / 1024.0**3
File "/opt/conda/lib/python3.10/site-packages/torch/cuda/memory.py", line 351, in memory_allocated
return memory_stats(device=device).get("allocated_bytes.all.current", 0)
File "/opt/conda/lib/python3.10/site-packages/torch/cuda/memory.py", line 230, in memory_stats
stats = memory_stats_as_nested_dict(device=device)
File "/opt/conda/lib/python3.10/site-packages/torch/cuda/memory.py", line 241, in memory_stats_as_nested_dict
device = _get_device_index(device, optional=True)
File "/opt/conda/lib/python3.10/site-packages/torch/cuda/_utils.py", line 32, in _get_device_index
raise ValueError('Expected a cuda device, but got: {}'.format(device))
ValueError: Expected a cuda device, but got: cpu
Traceback (most recent call last):
File "/opt/conda/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 986, in launch_command
simple_launcher(args)
File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 628, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', 'scripts/finetune.py', 'examples/llama-2/gptq-lora.yml']' returned non-zero exit status 1.
GPU memory should be enough (24GB RTX3090).
Steps to reproduce
1. Clone repository
2. Install dependencies
3. Execute accelerate launch scripts/finetune.py examples/llama-2/gptq-lora.yml
To change the config.json file, I downloaded the model data into a local folder using the following code:
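(A minimal sketch of that step, assuming huggingface_hub's snapshot_download; the repo id and local folder name are placeholders, not necessarily what I actually used.)

```python
# Minimal sketch: pull the model repo into a local folder so its config.json can be edited.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TheBloke/Llama-2-7B-GPTQ",  # placeholder GPTQ model repo
    local_dir="llama-2-7b-gptq",         # placeholder local folder
    local_dir_use_symlinks=False,        # write real files so config.json is editable in place
)
```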
And then added "disable_exllama": true in the quantization_config section of the file.
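(A minimal sketch of that edit, assuming the model was downloaded to the placeholder folder above:)

```python
# Minimal sketch: add "disable_exllama": true to the existing quantization_config
# block of the downloaded config.json (the path is the placeholder folder from above).
import json

config_path = "llama-2-7b-gptq/config.json"
with open(config_path) as f:
    config = json.load(f)

config["quantization_config"]["disable_exllama"] = True

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```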
Possible solution
No response
Which Operating Systems are you using?
Python Version
3.10.11
axolotl branch-commit
main
Acknowledgements