
Bias tensors #1259

Merged: 5 commits into pytorch:main on Oct 9, 2024

Conversation

@gabe-l-hart (Contributor) commented Oct 3, 2024

Dependencies

This PR is part of a sequence in support of adding Granite Code. It depends on merging the following PRs:

Issues

Closes #1250

Description

This PR adds support for models that have bias tensors for the attention and ffn modules alongside the primary weight tensors.
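
For example, such a checkpoint (Granite Code uses HF Llama-style naming) contains entries like the following alongside the usual weights (illustrative, not exhaustive):

    model.layers.0.self_attn.q_proj.weight
    model.layers.0.self_attn.q_proj.bias   <-- attention bias
    model.layers.0.mlp.up_proj.weight
    model.layers.0.mlp.up_proj.bias        <-- ffn bias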

Changes

  • Add the bias tensors to the weight_map in HF checkpoint conversion
  • Handle merged wqkv tensors for biases as well as weights in HF checkpoint conversion
    • This includes changes to the permutation logic to support the shapes of the bias tensors. I leveraged the corresponding logic in llama.cpp's converter; see the sketch after this list.
  • Add configs to TransformerArgs to allow models to indicate the presence of attention_bias and feed_forward_bias tensors
  • Populate the bias arguments of the Attention and FeedForward modules' linear layers based on the config args
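
Roughly, the idea looks like the following minimal sketch. This is an illustration, not the actual torchchat code; everything except the attention_bias, feed_forward_bias, TransformerArgs, Attention, and permute names is an assumption:

    from dataclasses import dataclass

    import torch
    import torch.nn as nn

    def permute(w: torch.Tensor, n_heads: int) -> torch.Tensor:
        # llama.cpp-style reordering of HF's interleaved rotary layout.
        # Only dim 0 is reshaped, so the same call handles 2-D weight
        # matrices and 1-D bias vectors.
        return (
            w.view(n_heads, 2, w.shape[0] // n_heads // 2, *w.shape[1:])
            .transpose(1, 2)
            .reshape(w.shape)
        )

    @dataclass
    class TransformerArgs:
        dim: int = 4096
        n_heads: int = 32
        attention_bias: bool = False     # checkpoint has attention biases
        feed_forward_bias: bool = False  # checkpoint has ffn biases

    class Attention(nn.Module):
        def __init__(self, config: TransformerArgs):
            super().__init__()
            # Allocate bias parameters only when the config says the
            # checkpoint provides them, so state_dict loading stays strict.
            self.wq = nn.Linear(config.dim, config.dim, bias=config.attention_bias)
            self.wo = nn.Linear(config.dim, config.dim, bias=config.attention_bias)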

Testing

In conjunction with my other changes for Granite Code, I've been able to validate that this logic produces the expected token sequences.

NOTE: If there's any preferred way to include unit tests along with the PR, please let me know and I can get them added! I don't see a familiar unit test structure in the project at this point, so I've been relying on local ad-hoc testing.


pytorch-bot bot commented Oct 3, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1259

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 4c70671 with merge base 397967f:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Oct 3, 2024
@gabe-l-hart gabe-l-hart force-pushed the BiasTensors-1250 branch 3 times, most recently from 964ae69 to bbea338 on October 4, 2024 20:01
@gabe-l-hart gabe-l-hart marked this pull request as ready for review October 4, 2024 20:02
@gabe-l-hart (Contributor, Author)

Thanks for the review/merge on #1255! This PR is now ready for review.

@mikekg commented Oct 4, 2024

Current tests are run through .github/workflows, with some scripts in .ci (including a script that can be used to ensure that the code in markdown files works).

Or were you looking for "unit tests" of subcomponents with a Python driver? If you are looking for Python-level unit tests, I don't think we have any right now, but that doesn't mean we can't have any. If you want to make a proposal, you might discuss with @byjlw and @Jack-Khuu, and with @lessw2020 for distributed inference.

@gabe-l-hart (Contributor, Author)

Hi @mikekg, thanks for the pointers! I was looking for unit tests, but I'll dig into the CI workflows too.

@mikekg commented Oct 7, 2024

> Hi @mikekg, thanks for the pointers! I was looking for unit tests, but I'll dig into the CI workflows too.

There aren't any unit tests today; the workflows run as shell scripts. It's very conceivable to create a unit test directory and set up a workflow that runs them. I think the most interesting question is what testing framework to use. I don't have any strong convictions about that; what do you suggest, @Jack-Khuu?

@Jack-Khuu (Contributor)

Thanks @gabe-l-hart for the PR; taking a deeper look at it now.

Seconding what @mikekg mentioned: our workflow currently runs as shell scripts (which plays well since we run the commands on a suite of various machines), but that mostly helps internal CI.

I want to add a unittest directory (Python-backed) that OSS contributors can replicate locally, but I haven't gotten around to it. I might spin up an umbrella laundry-list issue so the community can contribute.

@gabe-l-hart (Contributor, Author)

Sounds great! I've used pytest pretty extensively in the past and would be happy to contribute tests in the future if needed.
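
For example, a pytest-style test for the converter's permutation helper might look like this (purely hypothetical; the import path and test are made up, not existing torchchat code):

    import pytest
    import torch

    # Hypothetical import path for the converter helper discussed above.
    from convert_hf_checkpoint import permute

    @pytest.mark.parametrize("shape", [(32, 16), (32,)])
    def test_permute_preserves_shape_and_values(shape):
        # permute must handle 2-D weights and 1-D biases alike: same
        # shape out, same multiset of values, rows reordered.
        t = torch.randn(*shape)
        out = permute(t, n_heads=4)
        assert out.shape == t.shape
        assert sorted(out.flatten().tolist()) == sorted(t.flatten().tolist())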

@Jack-Khuu Jack-Khuu (Contributor) left a comment

Nice, thanks for adding

Comment on lines 139 to 148:

-            new_key = weight_map[abstract_key]
-            if new_key is None:
-                continue
+            new_key = weight_map.get(abstract_key, abstract_key)
             new_key = new_key.format(layer_num)
         else:
-            new_key = weight_map[key]
+            new_key = weight_map.get(key, key)
@Jack-Khuu (Contributor):

I might be misreading something, but why are we keeping keys with missing entries?

@gabe-l-hart (Contributor, Author):

This is probably not fully needed now that I better understand how the mapping works. I added this while trying to get things working, since Granite Code had missing entries. At the time, I thought unconverted tensor names would be handled generically post-conversion, but I now realize that's not the case. I'll remove this change.
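
The behavioral difference at issue, with a toy mapping entry for illustration:

    weight_map = {"model.embed_tokens.weight": "tok_embeddings.weight"}
    key = "model.some_unmapped.weight"

    # weight_map[key]               -> KeyError: unmapped names fail fast
    print(weight_map.get(key, key))  # prints the HF name unchanged,
                                     # silently carrying it forward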

-            v = final_result[key.replace("wq", "wv")]
+            k = final_result[wk_key]
+            v = final_result[wv_key]
+            print(key)
@Jack-Khuu (Contributor):

Suggested change:
-            print(key)
@gabe-l-hart (Contributor, Author):

Yikes! Sloppy, good catch

             q = permute(q, config.n_heads)
             print(wk_key)

@Jack-Khuu (Contributor):

Suggested change:
-            print(wk_key)

@Jack-Khuu (Contributor)

Hmmm, one thing we'll need to do is tell people to redownload if they already have a local checkpoint...

We can spin up that message right before we land.

-            new_key = weight_map[abstract_key]
-            if new_key is None:
-                continue
+            new_key = weight_map.get(abstract_key, abstract_key)

@Jack-Khuu Jack-Khuu (Contributor) commented Oct 7, 2024:

Ditto? Do we need the update here?

@gabe-l-hart (Contributor, Author):

Yep, good catch

Commits:

  • Branch: GraniteCodeSupport; Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> (3 commits)
  • pytorch#1250; Branch: BiasTensors-1250; Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> (2 commits)
@Jack-Khuu (Contributor)

Whoops, thought I hit land last night; will land when CI finishes.

@Jack-Khuu Jack-Khuu merged commit 6a2a2e8 into pytorch:main Oct 9, 2024
52 checks passed
@gabe-l-hart gabe-l-hart deleted the BiasTensors-1250 branch October 9, 2024 12:38
@gabe-l-hart gabe-l-hart mentioned this pull request Oct 31, 2024
Labels
CLA Signed (this label is managed by the Meta Open Source bot)

Development
Successfully merging this pull request may close these issues: Add support for separate bias tensors

4 participants