
Update get_unpad_data patching for multipack #2013

Merged: 4 commits into axolotl-ai-cloud:main on Nov 16, 2024

Conversation

@chiragjn (Contributor) commented Nov 4, 2024

Fixes #1991 (at least attempts to)

The code for checking whether a model has remote code is flawed: it assumes that if someone passes trust_remote_code, the model must have remote code. That causes the get_unpad_data patching to be skipped when someone passes trust_remote_code: True, breaking multipack sample packing with flash attention.

This PR changes that check to rely on the auto_map defined in the model's config.json.

It also removes the patching code for transformers versions older than 4.43.
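
A minimal sketch of what such a check could look like (hypothetical helper, not the exact code in this PR), using the auto_map attribute that transformers exposes on configs loaded from repos that ship custom modeling code:

from transformers import AutoConfig

def model_has_remote_code(model_name_or_path: str, trust_remote_code: bool = False) -> bool:
    """Detect remote code from the model's config.json instead of trusting the user flag."""
    config = AutoConfig.from_pretrained(
        model_name_or_path, trust_remote_code=trust_remote_code
    )
    # auto_map is only populated when the repo ships its own modeling code, so
    # passing trust_remote_code: True for a stock architecture no longer counts as remote.
    auto_map = getattr(config, "auto_map", None) or {}
    return "AutoModelForCausalLM" in auto_map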

How has this been tested?

So far, only manually, but it would be nice to add tests. First, I would like to get some review feedback.

Requesting review from @winglian and @NanoCode012

modeling_arch = importlib.import_module(module_name)
if hasattr(modeling_arch, "_get_unpad_data"):
    modeling_arch._get_unpad_data = get_unpad_data  # pylint: disable=protected-access
Collaborator

Is any handling needed if "_get_unpad_data" is not available? For example, throw an error to say packing is not available?

Contributor Author

I wanted to get an opinion on this; I am not sure myself. Maybe it can be an error for some known model types where we expect it, and a warning for others?

Collaborator

Hm, I think it's good to have a check in case upstream changes a model type we support without our knowledge.

else:
    if model_type in list_of_known_working_types:
        raise ValueError("multipack sample packing is not available for this model")
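
A rough sketch of how that handling could look (hypothetical names: known_working_types stands in for whatever list of known-good architectures axolotl keeps, and the import of get_unpad_data is an assumed location for axolotl's replacement implementation):

import importlib
import logging

from axolotl.monkeypatch.utils import get_unpad_data  # assumed location of the replacement

LOG = logging.getLogger(__name__)

def patch_unpad_data(module_name, model_type, known_working_types):
    # Patch _get_unpad_data when the module still defines it; otherwise fail loudly
    # for model types we expect to support, and only warn for everything else.
    modeling_arch = importlib.import_module(module_name)
    if hasattr(modeling_arch, "_get_unpad_data"):
        modeling_arch._get_unpad_data = get_unpad_data  # pylint: disable=protected-access
    elif model_type in known_working_types:
        raise RuntimeError(
            f"{module_name} no longer defines _get_unpad_data; "
            "multipack sample packing cannot be patched for this model type"
        )
    else:
        LOG.warning(
            "Could not patch _get_unpad_data for %s; sample packing may not work", model_type
        )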

Additional review threads (resolved): src/axolotl/utils/models.py (outdated), src/axolotl/monkeypatch/multipack.py
@winglian (Collaborator)

I think once we address the last 2 suggestions, this should be good to go. We should also make sure we manually run the multigpu tests before merging this PR.

@NanoCode012 (Collaborator) commented Nov 15, 2024

I think we should add a small e2e test for this. Perhaps similar to tests/e2e/test_lora_llama.py, but for full fine-tuning with trust_remote_code on. I'll run a few e2e tests first.
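
For reference, a rough sketch of the config values such a test might exercise (the model and dataset names here are placeholders, not necessarily what the final test uses): a full fine-tune with trust_remote_code, flash attention, and sample packing enabled, so the patching path from this PR is actually hit.

cfg = {
    "base_model": "Qwen/Qwen2.5-0.5B",  # placeholder small model; issue #1991 involved Qwen2.5
    "trust_remote_code": True,          # the flag that previously caused the patch to be skipped
    "flash_attention": True,
    "sample_packing": True,
    "sequence_len": 1024,
    "micro_batch_size": 1,
    "num_epochs": 1,
    "learning_rate": 1e-5,
    "datasets": [{"path": "mhenrichsen/alpaca_2k_test", "type": "alpaca"}],  # placeholder dataset
}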

@NanoCode012 (Collaborator) commented Nov 15, 2024

Manually Tested CI:

  • e2e multi-gpu as-is
  • e2e rebased to main
  • e2e multi-gpu rebased to main

Edit: Having some issues verifying the last two runs, re-running.

Edit2: Double-checked those tests are working.

@winglian merged commit 0c8b1d8 into axolotl-ai-cloud:main on Nov 16, 2024
13 checks passed
bursteratom pushed a commit that referenced this pull request Nov 18, 2024
* Update `get_unpad_data` patching for multipack

* Update src/axolotl/utils/models.py

* Update src/axolotl/utils/models.py

* Add test case

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>

Successfully merging this pull request may close these issues.

RuntimeError: CUDA error: an illegal memory access was encountered [When Running SFT on Qwen2.5]