Thanks for the detailed report! This seems to be because we fall back to fp32 instead of bf16, and then the NF4 tensor's get_original_weight returns a bf16 tensor, causing the dtype mismatch.
@rohan-varma Is the fallback because verify_bf16_support() checks the CUDA version via packaging.version.parse(torch.version.cuda).release >= (11, 0), which won't hold for ROCm?
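For reference, here is a minimal sketch of the guard being discussed — an illustration, not the actual torchtune source. The ROCm branch is an assumption based on the MI250 verification mentioned below:

```python
import torch
from packaging import version

def verify_bf16_support_sketch() -> bool:
    # On ROCm wheels, torch.version.cuda is None, so this guard is False
    # and the recipe falls back to fp32.
    cuda_ok = (
        torch.version.cuda is not None
        and version.parse(torch.version.cuda).release >= (11, 0)
    )
    # torch.version.hip is set on ROCm builds; a ROCm-aware check could
    # accept either backend (assumption: bf16 works on MI250, per #803).
    rocm_ok = torch.version.hip is not None
    return (
        torch.cuda.is_available()
        and torch.cuda.is_bf16_supported()
        and (cuda_ok or rocm_ok)
    )
```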
@supernovae Sorry for the late follow-up, but this is awesome! Looks like you've verified support in #803. Closing this out now, but feel free to reopen if issues persist.
The QLoRA recipe is failing on AMD MI250 with:
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::BFloat16
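This error class is hardware-independent; the sketch below (assumed names, not the recipe's code) triggers the same dtype mismatch when one matmul operand stays in fp32 while the other is bf16:

```python
import torch

x = torch.randn(2, 8, dtype=torch.float32)   # activations left in fp32
w = torch.randn(8, 4, dtype=torch.bfloat16)  # weight dequantized to bf16

# Raises a RuntimeError about mat1/mat2 dtype mismatch
# (exact wording varies by device and PyTorch version).
torch.mm(x, w)
```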
Steps to reproduce:
tune lora_finetune_single_device --config recipes/configs/llama2/7B_qlora_single_device.yaml epochs=2 max_steps_per_epoch=4
Error:
Full trace with details:
Environment