
Axolotl supports falcon + qlora #132

Merged 7 commits into main on Jun 8, 2023

Conversation

utensil
Contributor

@utensil utensil commented May 31, 2023

This PR:

To reproduce falcon + qlora:

Disclaimer: the config works, but might not be optimal. Improvements welcome!
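The actual config added by this PR lives at examples/falcon/config-7b-qlora.yml; the snippet below is only a rough sketch of the kind of settings involved, assuming common axolotl option names, and is not the exact contents of that file:

# Sketch only; key names and values are assumptions, not the PR's exact config.
base_model: tiiuae/falcon-7b     # the model seen in the traceback later in this thread
trust_remote_code: true          # falcon shipped custom modelling_RW.py code at the time
load_in_4bit: true               # QLoRA: 4-bit base weights via bitsandbytes
adapter: qlora
lora_r: 16                       # illustrative values, not the ones in the PR
lora_alpha: 32
lora_dropout: 0.05
micro_batch_size: 40             # value quoted later in this review thread
gradient_accumulation_steps: 2
num_epochs: 3
optimizer: paged_adamw_32bit     # discussed below; seems to help survive VRAM spikes
bf16: true                       # needs an Ampere or newer GPU; see the T4 notes later in the thread
tf32: true
gradient_checkpointing: true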

@winglian
Collaborator

Thanks! Would you mind adding an Errata section to the bottom of the README, specifically noting that falcon + qlora + xformers doesn't work? Someone will ultimately attempt that combination, and just having it documented somewhere would be a great help.

@utensil
Contributor Author

utensil commented Jun 1, 2023

OK, I'll test the combination today based on the new xformers patch that landed in the Docker image, and add the section. Hopefully I'll also get an idea of why it doesn't work.

I'll also test flash attention, and change max packed sequence length to empty as suggested by caseus on Discord, to see if it helps with VRAM usage (a sketch of these tweaks follows below).
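For reference, here is a minimal sketch of what those two tweaks might look like in the config; the key names are my assumptions about axolotl's option names, not values taken from this PR:

# Sketch only; key names are assumed, not quoted from the PR.
flash_attention: true        # or xformers_attention: true, for the combination being tested
max_packed_sequence_len:     # left empty per the Discord suggestion, to see if it eases VRAM usage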

@utensil utensil closed this Jun 1, 2023
@utensil utensil reopened this Jun 1, 2023
@utensil
Contributor Author

utensil commented Jun 1, 2023

By an Errata section, do you mean that for each combination axolotl doesn't support yet, there's a short description of why it's unsupported, e.g. the errors and a tracking issue?

Also, it would be nice to link the check marks to example configs too; I was tempted to do so in this PR 😉

Collaborator

@NanoCode012 NanoCode012 left a comment

Just some points I saw

Review comments on examples/falcon/config-7b-qlora.yml (resolved; one thread outdated)
@FarisHijazi
Contributor

I just tried your changes; they don't work.

Here's what I get:

You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
Traceback (most recent call last):
  File "/workspace/axolotl/scripts/finetune.py", line 294, in <module>
    fire.Fire(train)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/workspace/axolotl/scripts/finetune.py", line 281, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 1696, in train
    return inner_training_loop(
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 1973, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2787, in training_step
    loss = self.compute_loss(model, inputs)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2819, in compute_loss
    outputs = model(**inputs)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/peft/peft_model.py", line 663, in forward
    return self.base_model(
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-7b/da8d49a4c7dde3bfc39461e6f2cf7433e2fa44c2/modelling_RW.py", line 753, in forward
    transformer_outputs = self.transformer(
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-7b/da8d49a4c7dde3bfc39461e6f2cf7433e2fa44c2/modelling_RW.py", line 640, in forward
    outputs = torch.utils.checkpoint.checkpoint(
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/root/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-7b/da8d49a4c7dde3bfc39461e6f2cf7433e2fa44c2/modelling_RW.py", line 636, in custom_forward
    return module(*inputs, use_cache=use_cache, output_attentions=output_attentions)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-7b/da8d49a4c7dde3bfc39461e6f2cf7433e2fa44c2/modelling_RW.py", line 385, in forward
    attn_outputs = self.self_attention(
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-7b/da8d49a4c7dde3bfc39461e6f2cf7433e2fa44c2/modelling_RW.py", line 242, in forward
    fused_qkv = self.query_key_value(hidden_states)  # [batch_size, seq_length, 3 x hidden_size]
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/peft/tuners/lora.py", line 487, in forward
    result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (392x4544 and 1x10614784)

micro_batch_size: 40
gradient_accumulation_steps: 2
num_epochs: 3
optimizer: paged_adamw_32bit
Collaborator

Does paged_adamw_32bit converge? I remember seeing some tests where this optimizer was problematic.

Contributor Author

It seems fine in my tests, and it seems to help survive VRAM spikes.

@utensil
Contributor Author

utensil commented Jun 3, 2023

@FarisHijazi Hi, this happened before with outdated deps. In my tests, I updated the deps to master. I'll try to reproduce this in a raw setting and work my way back to determine which dep needs updating ASAP. For now, please check https://github.com/utensil/llm-playground/blob/main/scripts/prepare_qlora.sh .

@utensil
Contributor Author

utensil commented Jun 3, 2023

I've just created a minimal Colab notebook, so anyone can jump straight in and try it out. It uses a free T4 GPU instance.

The notebook works nicely as is.

But I'll break it down here to describe the issues I encountered while creating the notebook, since others might run into similar issues when deviating from it:

ModuleNotFoundError: No module named 'peft'

Fixed by pip install git+https://github.com/huggingface/peft.git

It should have been installed by axolotl but it's still not found.

Other packages in 4. Install QLoRA dependencies are actually already installed and work fine.

UPDATE: Found the root cause of this and raised #151

RuntimeError: self and mat2 must have the same dtype

This error appears if one hits the error above and then runs pip install peft, which installs an old version of peft from before the QLoRA PR landed in peft.

ValueError: Your setup doesn't support bf16/gpu. You need torch>=1.10, using Ampere GPU with cuda>=11.0

This is a T4-specific issue.

Fixed by setting bf16: false. I prefer bf16 whenever possible.

ValueError: --tf32 requires Ampere or a newer GPU arch, cuda>=11 and torch>=1.7

Also a T4-specific issue.

Fixed by setting tf32: false. But I don't know its ramifications.
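Putting the two T4 workarounds together, the overrides look roughly like this (a sketch only; the rest of the config is unchanged):

# T4 is a pre-Ampere GPU, so both Ampere-only features must be turned off.
bf16: false    # bf16 is unsupported on T4
tf32: false    # likewise, tf32 requires Ampere or newer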

@FarisHijazi
Contributor

I think most of these float-type errors can be solved by changing the GPU (Colab Pro, or switching to another provider).
I'll test it out.
Btw, I love your work.

@utensil
Contributor Author

utensil commented Jun 3, 2023

@FarisHijazi Thank you!

The root cause of the issue you encountered has been found and is described in #151. It's easy to fix by running pip install -U git+https://github.com/huggingface/peft.git . (The old peft LoRA layer does a plain F.linear against the bitsandbytes 4-bit weight, which is stored packed in a flattened tensor, hence the odd 1x10614784 shape.)

@FarisHijazi
Contributor

damn..... shape error getting solved by upgrading a dependency...
insane

will try to run it now

@FarisHijazi
Contributor

Getting this error on my machine; will try some other things and update you.

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
Error invalid device ordinal at line 359 in file /home/tim/git/bitsandbytes/csrc/pythonInterface.c
/arrow/cpp/src/arrow/filesystem/s3fs.cc:2598:  arrow::fs::FinalizeS3 was not called even though S3 was initialized.  This could lead to a segmentation fault at exit
Traceback (most recent call last):
  File "/root/miniconda3/envs/py3.9/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/accelerate/commands/launch.py", line 918, in launch_command
    simple_launcher(args)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/accelerate/commands/launch.py", line 580, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/miniconda3/envs/py3.9/bin/python3', 'scripts/finetune.py', 'examples/falcon/config-1b-qlora-utensil.yml']' returned non-zero e

@utensil
Contributor Author

utensil commented Jun 5, 2023

@FarisHijazi Possible fix to that: bitsandbytes-foundation/bitsandbytes#425 (comment)

Can you elaborate on the environment or setup? (Maybe in a separate issue, to better follow up.)

@FarisHijazi
Contributor

Will try to get on that when I finish work.

Btw, I do get a bitsandbytes warning when I finetune the other models, but they do train. For falcon I get a warning and then later an error.
Not sure if that's a problem.
Will post later when I'm on my PC.

@NanoCode012
Collaborator

Hello @utensil, thank you for the extensive work and report.

Since you have tested falcon + qlora working, would you be interested in merging this first? I also saw that you have included an ipynb. Do you want to upload the notebook to the same folder as the config before the merge?

Regarding winglian's initial comment, he might've meant adding a short description for that combination here: https://github.com/OpenAccess-AI-Collective/axolotl#common-errors-

Lastly, could you follow up with a separate issue for your other tasks (multi-GPU etc.)?

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
@utensil
Contributor Author

utensil commented Jun 8, 2023

Hi @NanoCode012, thank you for the review. I'm OK with merging it as is and leaving the other tasks, including the notebook tasks and the errata section (I'm not so sure if they're still there now), to future PRs.

@NanoCode012
Collaborator

Would it be possible to fix the failing pre-commit? You can see the error by pressing Details here, or by running:

pre-commit install  # if you haven't set it up before
pre-commit run --all-files

@NanoCode012 NanoCode012 merged commit c8242de into axolotl-ai-cloud:main Jun 8, 2023
@NanoCode012
Collaborator

Thank you for the amazing work!

@ehartford
Collaborator

ehartford commented Jun 8, 2023 via email

mkeoliya pushed a commit to mkeoliya/axolotl that referenced this pull request Dec 15, 2023