-
Notifications
You must be signed in to change notification settings - Fork 234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bias tensors #1259
Bias tensors #1259
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1259
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 4c70671 with merge base 397967f ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
964ae69
to
bbea338
Compare
Thanks for the review/merge on #1255! This PR is now ready for review |
Current tests are run through .github/workflows, with some scripts in .ci (including a script that can be used to ensure that code in Or were you looking for "unit tests" of subcomponents with a Python driver? If you are looking for python-level unit tests, I don't think we have any right now, but that doesn't mean we can't have any. If you want to make a proposal, you might discuss with @byjlw and @Jack-Khuu, and @lessw2020 for distributed inference. |
Hi @mikekg thanks for the pointers! I was looking for unit tests, but I'll dig into the CI workflows too. |
There aren't any unit tests today, the workflows run as shell scripts. It's very conceivable to create a unit test directory and set up a workflow that runs them. I think the most interesting question is what testing framework to use. I don't have any string convictions about that, what do you suggest @Jack-Khuu ? |
Thanks @gabe-l-hart for the PR, taking a deeper look at it now Seconding what @mikekg mentioned, our workflow currently works as shell scripts (plays well since we run the commands on a suite of various machines), but that mostly helps internal CI. I want to add a unittest directory (python backed) that OSS can replicate locally, but haven't got around to it. Might spin up a umbrella laundry list issue so the community can contribute |
Sounds great! I've used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks for adding
if new_key is None: | ||
continue | ||
new_key = weight_map.get(abstract_key, abstract_key) | ||
new_key = new_key.format(layer_num) | ||
else: | ||
new_key = weight_map[key] | ||
new_key = weight_map.get(key, key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be misreading something, but why are we keeping keys with missing entries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably not fully needed now that I understand how the mapping is working better. I added this when trying to get it working since Granite Code had missing entries. I was thinking at the time that unconverted tensor names would be treated in a generic way post-conversion, but I now realize that's not the case. I'll remove this change.
v = final_result[key.replace("wq", "wv")] | ||
k = final_result[wk_key] | ||
v = final_result[wv_key] | ||
print(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yikes! Sloppy, good catch
q = permute(q, config.n_heads) | ||
print(wk_key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print(wk_key) |
Hmmm one thing that we'll need to do is tell people to redownload though if they already have a local checkpoint... we can spin up that message right before we land |
new_key = weight_map[abstract_key] | ||
if new_key is None: | ||
continue | ||
new_key = weight_map.get(abstract_key, abstract_key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto? Do we need the update here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, good catch
8fd6271
to
9513b5f
Compare
Branch: GraniteCodeSupport Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Branch: GraniteCodeSupport Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Branch: GraniteCodeSupport Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
pytorch#1250 Branch: BiasTensors-1250 Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
pytorch#1250 Branch: BiasTensors-1250 Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
9513b5f
to
4c70671
Compare
Whoops thought I hit land last night, will land when CI finishes |
Dependencies
This PR is part of a sequence in support of adding Granite Code. It depends on merging the following PRs:
Issues
Closes #1250
Description
This PR adds support for models which have
bias
tensors for the attention and ffn modules alongside the primary weight tensors.Changes
weight_map
in HF checkpoint conversionwqkv
tensors for bias as well as weights in HF checkpoint conversionTransformerArgs
to allow models to indicate the presence ofattention_bias
andfeed_forward_bias
tensorsAttention
andFeedForward
modules' tensors'bias
arguments based on the config argsTesting
In conjunction with my other changes for Granite Code, I've been able to validate that the results produced with this logic do produce the expected token sequence.
NOTE: If there's any preferred way to include unit tests along with the PR, please let me know and I can get them added! I don't see a familiar unit test structure in the project at this point, so I've been relying on local ad-hoc testing.