FIX: Error with OLoRA init when using bnb #2011
Merged
Commits
9f3308a FIX Error with OLoRA init when using bnb (BenjaminBossan)
7649726 Further fixes for 8bit (BenjaminBossan)
3aec8e3 Add unit tests (BenjaminBossan)
7fe1b53 Reviewer feedback: simply bnb param construction (BenjaminBossan)
54f9952 Merge branch 'main' into fix-olora-bnb (BenjaminBossan)
9ad19ca Make style (BenjaminBossan)
aae4ef1 Merge branch 'huggingface:main' into fix-olora-bnb (BenjaminBossan)
db62772 Reviewer feedback: Pass all bnb init params (BenjaminBossan)
Conversations
Why are we quantizing the weights this time?

Normally for bnb weights, the tensors are flattened, e.g. shape [64, 1]. But after dequantizing, the `weight_tensor` that we assign here is not flat anymore, e.g. shape [16, 16]. My reasoning was that we should get back a "correct" tensor, so better to re-initialize it. I tried what happens when I remove this and just do `base_layer.weight.data = weight_tensor`, and curiously this seems to work too and the test passes, even though the shape is now wrong. This makes me wonder if bnb somehow handles this automatically and we should not re-initialize (which could cause its own problems)? Not sure, any suggestions?

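To illustrate the shape mismatch being discussed, here is a minimal standalone sketch (not code from this PR); it assumes a CUDA device and a recent bitsandbytes release, and the exact packed shape depends on the quantization settings:

```python
import torch
import bitsandbytes as bnb

w = torch.randn(16, 16)

# Quantization happens when the Params4bit is moved to a CUDA device.
qweight = bnb.nn.Params4bit(w, requires_grad=False).to("cuda")
print(qweight.shape)  # packed storage, e.g. torch.Size([128, 1])

# Dequantizing restores the original 2D shape recorded in the quant_state.
deq = bnb.functional.dequantize_4bit(qweight.data, qweight.quant_state)
print(deq.shape)  # torch.Size([16, 16])
```
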
Wow, that's really strange indeed. I tried to check the code in bnb and it doesn't look like they handle this. cc @matthewdouglas

I think that's fine as long as you pass the relevant kwargs that you can get from `orig_weight`. However, make sure to not pass the `bnb_quantized` arg for the 4-bit case. Then, with `to(orig_weight.device)`, it should quantize the weights properly.

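A minimal sketch of that suggestion for the 4-bit case. The kwargs mirror attributes that `bnb.nn.Params4bit` exposes in current bitsandbytes releases (`compress_statistics`, `quant_type`), the variable names follow the discussion above, and this is not the PR's literal code; the exact set of arguments may differ across bnb versions:

```python
import torch
import bitsandbytes as bnb

def reinit_4bit(weight_tensor: torch.Tensor, orig_weight: bnb.nn.Params4bit) -> bnb.nn.Params4bit:
    # Rebuild the parameter from the dequantized (and modified) weight.
    # bnb_quantized is deliberately NOT passed: the data is not quantized yet.
    new_param = bnb.nn.Params4bit(
        weight_tensor.to("cpu"),
        requires_grad=orig_weight.requires_grad,
        compress_statistics=orig_weight.compress_statistics,
        quant_type=orig_weight.quant_type,
    )
    # Moving back to the original (CUDA) device performs the actual quantization.
    return new_param.to(orig_weight.device)
```
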
Great, thanks for the additional info. To clarify, is the present code in alignment with what you suggest, or do I need to call `to(orig_weight.device)` too?

Okay, then it's probably better to get Matthew's opinion before merging this.

As far as the shapes go, both 4-bit and 8-bit have some mechanisms in place to track the original shapes, but it's different for each. The `Linear8bitLt` has a `state.SB`, and for 4-bit that information is part of the `quant_state`. The main expectation is that it is all stored in a contiguous row-major format.

That said, it's not really clear to me that `dequantize_module_weight()` is doing all that it would need to do. Maybe it would pass the test here, but I would think the updated weights would not be quantized properly afterwards, so re-initializing it is probably the way to go. You'd want to have `.to(orig_weight.device)` in addition to the other kwargs, as @SunMarc mentioned.

I updated the inits to take into account all arguments. Unfortunately, this may get out of date if bnb is updated, but I think there is no method such as `bnb.create_param_like(tensor)` or similar to offload this work to bnb.

It would be great if you could do a final pass over the change.

Can't we just pass `orig_weight.__dict__` as the kwargs? This is how we did it in transformers.

Hmm, I wonder if that's really more robust. If a new attribute is added that is not an `__init__` argument, this would fail, right?

So no matter what, this code may break if there is some change to the `__init__` code in bnb.

Oh yeah, that's right :/

Okay, I merged as is then. Code is going to eventually break one way or the other :D