
Axolotl supports falcon + qlora #132

Merged 7 commits into main on Jun 8, 2023

Conversation

utensil
Contributor

@utensil utensil commented May 31, 2023

This PR:

To reproduce falcon + qlora:

Disclaimer: the config works, but might not be optimal. Improvements welcome!
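The actual config added by this PR lives at examples/falcon/config-7b-qlora.yml; the snippet below is only a rough sketch of the kind of settings involved, assuming common axolotl option names, and is not the exact contents of that file:

# Sketch only; key names and values are assumptions, not the PR's exact config.
base_model: tiiuae/falcon-7b     # the model seen in the traceback later in this thread
trust_remote_code: true          # falcon shipped custom modelling_RW.py code at the time
load_in_4bit: true               # QLoRA: 4-bit base weights via bitsandbytes
adapter: qlora
lora_r: 16                       # illustrative values, not the ones in the PR
lora_alpha: 32
lora_dropout: 0.05
micro_batch_size: 40             # value quoted later in this review thread
gradient_accumulation_steps: 2
num_epochs: 3
optimizer: paged_adamw_32bit     # discussed below; seems to help survive VRAM spikes
bf16: true                       # needs an Ampere or newer GPU; see the T4 notes later in the thread
tf32: true
gradient_checkpointing: true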

@winglian
Collaborator

Thanks! Would you mind adding an Errata section to the bottom of the README, specifically noting that falcon + qlora + xformers doesn't work? Someone will ultimately attempt that combination, and just having it documented somewhere would be a great help.

@utensil
Contributor Author

utensil commented Jun 1, 2023

OK, I'll test the combination today based on the new xformers patch that landed in the Docker image, and add the section. Hopefully I'll also get an idea of why it doesn't work.

I'll also test flash attention, and change max packed sequence length to empty as suggested by caseus on Discord, to see if it helps with VRAM usage (a sketch of these tweaks follows below).
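For reference, here is a minimal sketch of what those two tweaks might look like in the config; the key names are my assumptions about axolotl's option names, not values taken from this PR:

# Sketch only; key names are assumed, not quoted from the PR.
flash_attention: true        # or xformers_attention: true, for the combination being tested
max_packed_sequence_len:     # left empty per the Discord suggestion, to see if it eases VRAM usage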

@utensil utensil closed this Jun 1, 2023
@utensil utensil reopened this Jun 1, 2023
@utensil
Contributor Author

utensil commented Jun 1, 2023

By an Errata section, do you mean that for each combination axolotl doesn't support yet, there's a short description of why it's unsupported, e.g. the errors and a tracking issue?

Also, it would be nice to link the check marks to example configs too; I was tempted to do so in this PR 😉

Collaborator

@NanoCode012 NanoCode012 left a comment

Just some points I saw

Review comments on examples/falcon/config-7b-qlora.yml (resolved; one thread outdated)
@FarisHijazi
Contributor

I just tried your changes; they don't work.

Here's what I get:

You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
Traceback (most recent call last):
  File "/workspace/axolotl/scripts/finetune.py", line 294, in <module>
    fire.Fire(train)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/workspace/axolotl/scripts/finetune.py", line 281, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 1696, in train
    return inner_training_loop(
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 1973, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2787, in training_step
    loss = self.compute_loss(model, inputs)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2819, in compute_loss
    outputs = model(**inputs)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/peft/peft_model.py", line 663, in forward
    return self.base_model(
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-7b/da8d49a4c7dde3bfc39461e6f2cf7433e2fa44c2/modelling_RW.py", line 753, in forward
    transformer_outputs = self.transformer(
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-7b/da8d49a4c7dde3bfc39461e6f2cf7433e2fa44c2/modelling_RW.py", line 640, in forward
    outputs = torch.utils.checkpoint.checkpoint(
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/root/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-7b/da8d49a4c7dde3bfc39461e6f2cf7433e2fa44c2/modelling_RW.py", line 636, in custom_forward
    return module(*inputs, use_cache=use_cache, output_attentions=output_attentions)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-7b/da8d49a4c7dde3bfc39461e6f2cf7433e2fa44c2/modelling_RW.py", line 385, in forward
    attn_outputs = self.self_attention(
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-7b/da8d49a4c7dde3bfc39461e6f2cf7433e2fa44c2/modelling_RW.py", line 242, in forward
    fused_qkv = self.query_key_value(hidden_states)  # [batch_size, seq_length, 3 x hidden_size]
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/peft/tuners/lora.py", line 487, in forward
    result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (392x4544 and 1x10614784)

micro_batch_size: 40
gradient_accumulation_steps: 2
num_epochs: 3
optimizer: paged_adamw_32bit
Collaborator

Does paged_adamw_32bit converge? I remember seeing some tests where this optimizer was problematic.

Contributor Author

It seems fine in my tests, and it seems to help survive VRAM spikes.

@utensil
Contributor Author

utensil commented Jun 3, 2023

@FarisHijazi Hi, this happened before with outdated deps. In my tests, I updated the deps to master. I'll try to reproduce this in a raw setting and work my way back to determine which dep needs updating ASAP. For now, please check https://github.com/utensil/llm-playground/blob/main/scripts/prepare_qlora.sh .

@utensil
Contributor Author

utensil commented Jun 3, 2023

I've just created a minimal Colab notebook, so anyone can jump straight in and try it out. It uses a free T4 GPU instance.

The notebook works nicely as is.

But I'll break it down here to describe the issues I encountered while creating the notebook, since others might run into similar issues when deviating from it:

ModuleNotFoundError: No module named 'peft'

Fixed by pip install git+https://github.com/huggingface/peft.git

It should have been installed by axolotl but it's still not found.

Other packages in 4. Install QLoRA dependencies are actually already installed and work fine.

UPDATE: Found the root cause of this and raised #151

RuntimeError: self and mat2 must have the same dtype

This error appears if one hits the error above and then runs pip install peft, which installs an old version of peft from before the QLoRA PR landed in peft.

ValueError: Your setup doesn't support bf16/gpu. You need torch>=1.10, using Ampere GPU with cuda>=11.0

This is a T4-specific issue.

Fixed by setting bf16: false. I prefer bf16 whenever possible.

ValueError: --tf32 requires Ampere or a newer GPU arch, cuda>=11 and torch>=1.7

Also a T4-specific issue.

Fixed by setting tf32: false. But I don't know its ramifications.
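Putting the two T4 workarounds together, the overrides look roughly like this (a sketch only; the rest of the config is unchanged):

# T4 is a pre-Ampere GPU, so both Ampere-only features must be turned off.
bf16: false    # bf16 is unsupported on T4
tf32: false    # likewise, tf32 requires Ampere or newer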

@FarisHijazi
Contributor

I think most of these float-type errors can be solved by changing the GPU (Colab Pro, or switching to another provider).
I'll test it out.
Btw, I love your work.

@utensil
Contributor Author

utensil commented Jun 3, 2023

@FarisHijazi Thank you!

The root cause of the issue you encountered has been found and is described in #151. It's easy to fix by running pip install -U git+https://github.com/huggingface/peft.git . (The old peft LoRA layer does a plain F.linear against the bitsandbytes 4-bit weight, which is stored packed in a flattened tensor, hence the odd 1x10614784 shape.)

@FarisHijazi
Contributor

damn..... shape error getting solved by upgrading a dependency...
insane

will try to run it now

@FarisHijazi
Contributor

Getting this error on my machine; will try some other things and update you.

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
Error invalid device ordinal at line 359 in file /home/tim/git/bitsandbytes/csrc/pythonInterface.c
/arrow/cpp/src/arrow/filesystem/s3fs.cc:2598:  arrow::fs::FinalizeS3 was not called even though S3 was initialized.  This could lead to a segmentation fault at exit
Traceback (most recent call last):
  File "/root/miniconda3/envs/py3.9/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/accelerate/commands/launch.py", line 918, in launch_command
    simple_launcher(args)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/accelerate/commands/launch.py", line 580, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/miniconda3/envs/py3.9/bin/python3', 'scripts/finetune.py', 'examples/falcon/config-1b-qlora-utensil.yml']' returned non-zero e

@utensil
Contributor Author

utensil commented Jun 5, 2023

@FarisHijazi Possible fix to that: bitsandbytes-foundation/bitsandbytes#425 (comment)

Can you elaborate on the environment or setup? (Maybe in a separate issue, to better follow up.)

@FarisHijazi
Contributor

Will try to get on that when I finish work.

Btw, I do get a bitsandbytes warning when I finetune the other models, but they do train. For falcon I get a warning and then later an error.
Not sure if that's a problem.
Will post later when I'm on my PC.

@NanoCode012
Collaborator

Hello @utensil, thank you for the extensive work and report.

Since you have tested falcon + qlora working, would you be interested in merging this first? I also saw that you have included an ipynb. Do you want to upload the notebook to the same folder as the config before the merge?

Regarding winglian's initial comment, he might've meant adding a short description for that combination here: https://github.com/OpenAccess-AI-Collective/axolotl#common-errors-

Lastly, could you follow up with a separate issue for your other tasks (multi-GPU etc.)?

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
@utensil
Contributor Author

utensil commented Jun 8, 2023

Hi @NanoCode012, thank you for the review. I'm OK with merging it as is and leaving the other tasks, including the notebook tasks and the errata section (I'm not so sure if they're still there now), to future PRs.

@NanoCode012
Collaborator

Would it be possible to fix the failing pre-commit? You can see the error by pressing Details here, or by running:

pre-commit install  # if you haven't set it up before
pre-commit run --all-files

@NanoCode012 NanoCode012 merged commit c8242de into axolotl-ai-cloud:main Jun 8, 2023
@NanoCode012
Collaborator

Thank you for the amazing work!

@ehartford
Collaborator

ehartford commented Jun 8, 2023 via email

mkeoliya pushed a commit to mkeoliya/axolotl that referenced this pull request Dec 15, 2023