"triu_tril_cuda_template" not implemented for 'BFloat16' #1532

Closed
1 task done
ashmalvayani opened this issue Apr 17, 2024 · 3 comments
Closed
1 task done

"triu_tril_cuda_template" not implemented for 'BFloat16' #1532

ashmalvayani opened this issue Apr 17, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@ashmalvayani
Copy link

Please check that this issue hasn't been reported before.

  • I searched previous Bug Reports and didn't find any similar reports.

### Expected Behavior

I am trying to fine-tune CohereForAI/c4ai-command-r-v01 with the axolotl framework; the YAML config file is included under "Config yaml" below. Training should start normally. Instead, I am getting the following error:

```
Traceback (most recent call last):
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/mnt/beegfs/fahad.khan/axolotl/src/axolotl/cli/train.py", line 59, in <module>
    fire.Fire(do_cli)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/mnt/beegfs/fahad.khan/axolotl/src/axolotl/cli/train.py", line 35, in do_cli
    return do_train(parsed_cfg, parsed_cli_args)
  File "/mnt/beegfs/fahad.khan/axolotl/src/axolotl/cli/train.py", line 55, in do_train
    return train(cfg=cfg, cli_args=cli_args, dataset_meta=dataset_meta)
  File "/mnt/beegfs/fahad.khan/axolotl/src/axolotl/train.py", line 163, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/transformers/trainer.py", line 1780, in train
    return inner_training_loop(
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/transformers/trainer.py", line 2118, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/transformers/trainer.py", line 3036, in training_step
    loss = self.compute_loss(model, inputs)
  File "/mnt/beegfs/fahad.khan/axolotl/src/axolotl/core/trainer_builder.py", line 485, in compute_loss
    return super().compute_loss(model, inputs, return_outputs=return_outputs)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/transformers/trainer.py", line 3059, in compute_loss
    outputs = model(**inputs)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1156, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1110, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])  # type: ignore[index]
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
    return model_forward(*args, **kwargs)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/peft/peft_model.py", line 1129, in forward
    return self.base_model(
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 161, in forward
    return self.model.forward(*args, **kwargs)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/transformers/models/cohere/modeling_cohere.py", line 1099, in forward
    outputs = self.model(
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/transformers/models/cohere/modeling_cohere.py", line 889, in forward
    causal_mask = self._update_causal_mask(attention_mask, inputs_embeds, cache_position)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/transformers/models/cohere/modeling_cohere.py", line 975, in _update_causal_mask
    causal_mask = torch.triu(causal_mask, diagonal=1)
RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'
```

### Current behaviour

Transformers version: 4.39.3
Torch version: 2.0.1+cu117
accelerate version: 0.28.0

### Steps to reproduce

Change the YAML file as below and run the train command with the lora.yaml config file (e.g. `accelerate launch -m axolotl.cli.train lora.yaml`).

### Config yaml

```yaml
base_model: CohereForAI/c4ai-command-r-v01
trust_remote_code: true

load_in_8bit: true
load_in_4bit: false
strict: false

datasets:
    - path: Data_Clean3.json
      ds_type: json
      type: alpaca
dataset_prepared_path: last_run_prepared/cohere-command/3308b18091e3a983103cbeb4cceb82d0
val_set_size: 0.0
output_dir: ./outputs/c4ai_lora

sequence_len: 2048
sample_packing: false
pad_to_sequence_len: false

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

adapter: lora
lora_model_dir:
sample_packing: true
lora_r: 8
lora_alpha: 16
lora_dropout: 0.0
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj

gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16: 
tf32: false

gradient_checkpointing: false  # don't use with fsdp_activation_checkpointing
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch:
saves_per_epoch: 1
debug:
weight_decay: 0.0
deepspeed: 

special_tokens:
  bos_token: "<BOS_TOKEN>"
  eos_token: "<|END_OF_TURN_TOKEN|>"
  pad_token: "<PAD>"
```

### Possible solution

_No response_

### Which Operating Systems are you using?

- [X] Linux
- [ ] macOS
- [ ] Windows

### Python Version

3.10

### axolotl branch-commit

main

### Acknowledgements

- [X] My issue title is concise, descriptive, and in title casing.
- [X] I have searched the existing issues to make sure this bug has not been reported yet.
- [X] I am using the latest version of axolotl.
- [X] I have provided enough information for the maintainers to reproduce and diagnose the issue.
@ashmalvayani added the bug label on Apr 17, 2024
@NanoCode012 (Collaborator)

Hm, I believe this should be raised upstream with transformers, as the issue is in their modeling code.

Here is a similar issue: huggingface/diffusers#3453

The code would need to be updated to avoid torch.triu, since that version of PyTorch does not support it for bf16 on CUDA.
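For illustration, a workaround along those lines could look like the sketch below (hypothetical, not the actual transformers patch): build the mask in a dtype the CUDA triu kernel does support, then cast to bf16. `seq_len` here is just the `sequence_len` from the config above.

```python
import torch

seq_len = 2048  # sequence_len from the config above

# Hypothetical sketch: compute the causal mask in float32, where the CUDA
# triu kernel is implemented on torch 2.0.x, then cast to bfloat16.
min_value = torch.finfo(torch.bfloat16).min
causal_mask = torch.full((seq_len, seq_len), min_value, dtype=torch.float32, device="cuda")
causal_mask = torch.triu(causal_mask, diagonal=1).to(torch.bfloat16)
```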

@NanoCode012 (Collaborator) commented Apr 18, 2024

Found a solution and it turns out to be your comment :) huggingface/transformers#30304 (comment)


Axolotl currently requires torch>2.1, I believe.
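For anyone hitting this, a quick environment check, sketched under the assumption (from the linked comment) that 2.1 is the first release with the bf16 CUDA triu/tril kernels:

```python
import torch
from packaging import version

# Assumption: torch>=2.1 ships bf16 CUDA triu/tril kernels,
# per the linked transformers comment.
if version.parse(torch.__version__.split("+")[0]) < version.parse("2.1"):
    raise RuntimeError(
        f"torch {torch.__version__} lacks bf16 triu on CUDA; upgrade to >=2.1"
    )
```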


Should this be closed then?

@ashmalvayani (Author)

Axolotl works well even with torch<2.1 in 8-bit, but it causes problems with 4-bit. For now, though, I think it should be fine. Closing this issue. Thanks.
