Refactor func load_model to class ModelLoader #1909
Conversation
@MengqingCao this is on our list to tackle this week to get merged in. We'll need to get this rebased.
Thanks! I will do the rebase work soon.
(force-pushed from 7987524 to 3df89db)
@winglian sorry for a little delay. The rebase is done now, please review it. BTW, I basically copied the original code over to make the review a little easier.
I will fix the problems flagged by lint and the other CI tests ASAP.
* organize common vars into member vars of class `ModelLoader`
* split operations in `load_model` into separate member funcs
* refactor `cfg.load_in_Xbit` to kwarg
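A minimal sketch of what the third bullet amounts to in practice; the signatures here are illustrative, not the PR's exact code:

```python
# Before: quantization flags are read straight off the global cfg inside
# the loader, so call sites cannot override them.
def load_model_before(cfg, tokenizer):
    if cfg.load_in_8bit:
        pass  # ... set up 8-bit quantization config

# After: the flags become explicit keyword arguments, so each call site
# controls quantization directly and the dependency on cfg internals shrinks.
def load_model_after(cfg, tokenizer, *, load_in_8bit=False, load_in_4bit=False):
    if load_in_8bit:
        pass  # ... set up 8-bit quantization config
```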
(force-pushed from 224c73a to a714e99)
Hi @winglian, the code has been updated, please retrigger the CI, thanks!
I'm confused why the test failed on …
@MengqingCao, from the tests, it may be erroring due to the snippet below.
Let me see what should be done in a bit.
tests/utils/test_models.py

```python
import pytest
import torch

from axolotl.utils.models import load_model, load_tokenizer

# Excerpt: a method of the test class (hence `self`); imports added for context.

@pytest.mark.parametrize("embedding_modules", ["embed_tokens", "lm_head"])
@pytest.mark.parametrize(
    "dist_dtype", [torch.bfloat16, torch.float16, torch.float32]
)
@pytest.mark.parametrize("before_kbit_train_or_finetune", [True, False])
def test_convert_embedding_modules_dtype(
    self, embedding_modules, dist_dtype, before_kbit_train_or_finetune
):
    tokenizer = load_tokenizer(self.cfg)
    self.model_loader.model, _ = load_model(self.cfg, tokenizer, inference=False)

    self.model_loader.convert_embedding_modules_dtype(
        embedding_modules, dist_dtype, before_kbit_train_or_finetune
    )
    for name, module in self.model_loader.model.named_modules():
        if (
            "norm" in name
            or (before_kbit_train_or_finetune and name.endswith(".gate"))
            or (
                any(m in name for m in embedding_modules)
                and hasattr(module, "weight")
            )
        ):
            for _, param in module.named_parameters(recurse=False):
                assert param.dtype == dist_dtype
```
let's move this one to its own e2e/ test that runs on a GPU instance. I believe it's OOMing
or let's use a config fixture that uses a much smaller model like a 68M parameter model
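A sketch of what such a fixture might look like; `JackFram/llama-68m` is the tiny checkpoint mentioned later in this thread, and the exact config keys are illustrative:

```python
import pytest

from axolotl.utils.dict import DictDefault


@pytest.fixture
def small_cfg():
    # A ~68M-parameter model keeps the test within the memory limits of a
    # CPU-only CI runner, avoiding the OOM seen with larger checkpoints.
    return DictDefault(
        {
            "base_model": "JackFram/llama-68m",
            "tokenizer_config": "JackFram/llama-68m",
            "sequence_len": 1024,
        }
    )
```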
@winglian @NanoCode012 Thanks for your help. I have moved it to e2e now. Please retrigger the CI to check if this fix works.
@winglian I spent some time fixing the failed UTs and found that `load_cfg` breaks the `caplog`, which causes these UTs to fail. The latest UTs just use `DictDefault` to create `cfg` to fix it. Could you please retrigger the CI again to verify the current code?
That's a nice catch. Been debugging it yesterday and couldn't figure out exactly why it failed when all tests are run together. I suspected `caplog`, but when I tried using `capsys`, it failed too. I re-triggered the CI, and they are passing so far.
It's too hidden to determine the cause, and it fails from the moment it imports …
I tested the llama-68m model on my machine, and it raised an AssertionFailed error. I'll try to fix it tomorrow.
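To make the `caplog` workaround above concrete, a minimal hypothetical test showing the pattern: building `cfg` directly with `DictDefault` instead of going through `load_cfg`:

```python
import logging

from axolotl.utils.dict import DictDefault


def test_warning_is_captured(caplog):
    # Construct cfg directly: load_cfg was observed to reconfigure logging
    # handlers, which silently breaks pytest's caplog capture.
    cfg = DictDefault({"base_model": "JackFram/llama-68m", "sequence_len": 512})
    with caplog.at_level(logging.WARNING):
        logging.getLogger("axolotl").warning("loading %s", cfg.base_model)
    assert "loading" in caplog.text
```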
@winglian @NanoCode012 Thanks a lot for your work! All UTs in … pass now. Since quantized parameters cannot be converted to data types simply via …, I should have used a small-parameter model first so that I could test it locally instead of OOMing...
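A rough illustration of the constraint described above, as a hypothetical helper (not the PR's exact implementation): cast embedding/norm weights to a target dtype while leaving quantized parameters alone, since bitsandbytes-quantized weights cannot simply be cast to a float dtype.

```python
import torch


def convert_embedding_modules_dtype(model, embedding_modules, dist_dtype):
    """Cast embedding and norm weights to dist_dtype, skipping quantized params."""
    for name, module in model.named_modules():
        if "norm" in name or any(m in name for m in embedding_modules):
            for param in module.parameters(recurse=False):
                # bitsandbytes stores k-bit weights as packed integer tensors;
                # casting those to a float dtype would corrupt their storage.
                if param.dtype in (torch.int8, torch.uint8):
                    continue
                param.data = param.data.to(dist_dtype)
```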
Description
part of #1758
This PR refactors the func `load_model` in `src/axolotl/utils/models.py` into a class `ModelLoader`. Different member functions of class `ModelLoader` are separated according to their features, and all the member vars of `ModelLoader` are shared across these funcs. Moreover, this refactoring makes the model-loading pipeline clearer.

TODO:
Main changes are listed here:

* organize common vars into member vars of class `ModelLoader`
* split operations in `load_model` into separate member funcs
* refactor `cfg.load_in_Xbit` to kwarg

The UML of `ModelLoader`:

[UML class diagram of `ModelLoader`; image not reproduced]
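A sketch of the resulting class shape (the method and attribute names here are illustrative, not necessarily the PR's exact ones):

```python
class ModelLoader:
    """Loads a model in separable steps; state shared between the steps
    lives on the instance instead of in locals of one huge function."""

    def __init__(self, cfg, tokenizer, *, inference=False):
        self.cfg = cfg
        self.tokenizer = tokenizer
        self.inference = inference
        self.model = None        # populated by build_model()
        self.model_kwargs = {}   # kwargs accumulated for from_pretrained()

    def load(self):
        # Each step below was previously an inline block of load_model().
        self.set_quantization_config()
        self.build_model()
        self.apply_patches()
        return self.model

    def set_quantization_config(self): ...
    def build_model(self): ...
    def apply_patches(self): ...
```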
Motivation and Context
Why is this change required?

As the models loaded in Axolotl support more and more features, the func `load_model` has become huge. This results in confusion about variable changes when abstracting parts of `load_model` (#1758 (review)). Refactoring `load_model` will optimize the code structure and facilitate stable evolution when introducing more features in the future.

How has this been tested?
I have tested inference with the `open_llama_3b_v2` model, and here is a screenshot of the inference run:

[screenshot of inference output; image not reproduced]

However, I don't have access to an Ampere or newer GPU, thus I cannot run the UTs on my local machine. It would be nice if all UTs could be tested on CI.
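For context, an inference smoke test like this is typically launched through axolotl's CLI; the exact example config path below is an assumption:

```bash
# assumes an example config for open_llama_3b_v2 exists at this path
accelerate launch -m axolotl.cli.inference examples/openllama-3b/config.yml
```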