
Bias tensors #1259

Merged: 5 commits into pytorch:main on Oct 9, 2024

Conversation

@gabe-l-hart (Contributor) commented Oct 3, 2024

Dependencies

This PR is part of a sequence in support of adding Granite Code. It depends on merging the following PRs:

Issues

Closes #1250

Description

This PR adds support for models that have bias tensors for the attention and ffn modules alongside the primary weight tensors.
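
For example, such a checkpoint (Granite Code uses HF Llama-style naming) contains entries like the following alongside the usual weights (illustrative, not exhaustive):

    model.layers.0.self_attn.q_proj.weight
    model.layers.0.self_attn.q_proj.bias   <-- attention bias
    model.layers.0.mlp.up_proj.weight
    model.layers.0.mlp.up_proj.bias        <-- ffn bias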

Changes

  • Add the bias tensors to the weight_map in HF checkpoint conversion
  • Handle merged wqkv tensors for biases as well as weights in HF checkpoint conversion
    • This includes changes to the permutation logic to support the shapes of the bias tensors. I leveraged the corresponding logic in llama.cpp's converter; see the sketch after this list.
  • Add configs to TransformerArgs to allow models to indicate the presence of attention_bias and feed_forward_bias tensors
  • Populate the bias arguments of the Attention and FeedForward modules' linear layers based on the config args
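
Roughly, the idea looks like the following minimal sketch. This is an illustration, not the actual torchchat code; everything except the attention_bias, feed_forward_bias, TransformerArgs, Attention, and permute names is an assumption:

    from dataclasses import dataclass

    import torch
    import torch.nn as nn

    def permute(w: torch.Tensor, n_heads: int) -> torch.Tensor:
        # llama.cpp-style reordering of HF's interleaved rotary layout.
        # Only dim 0 is reshaped, so the same call handles 2-D weight
        # matrices and 1-D bias vectors.
        return (
            w.view(n_heads, 2, w.shape[0] // n_heads // 2, *w.shape[1:])
            .transpose(1, 2)
            .reshape(w.shape)
        )

    @dataclass
    class TransformerArgs:
        dim: int = 4096
        n_heads: int = 32
        attention_bias: bool = False     # checkpoint has attention biases
        feed_forward_bias: bool = False  # checkpoint has ffn biases

    class Attention(nn.Module):
        def __init__(self, config: TransformerArgs):
            super().__init__()
            # Allocate bias parameters only when the config says the
            # checkpoint provides them, so state_dict loading stays strict.
            self.wq = nn.Linear(config.dim, config.dim, bias=config.attention_bias)
            self.wo = nn.Linear(config.dim, config.dim, bias=config.attention_bias)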

Testing

In conjunction with my other changes for Granite Code, I've been able to validate that this logic produces the expected token sequences.

NOTE: If there's any preferred way to include unit tests along with the PR, please let me know and I can get them added! I don't see a familiar unit test structure in the project at this point, so I've been relying on local ad-hoc testing.


pytorch-bot bot commented Oct 3, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1259

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 4c70671 with merge base 397967f:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Oct 3, 2024
@gabe-l-hart gabe-l-hart force-pushed the BiasTensors-1250 branch 3 times, most recently from 964ae69 to bbea338 on October 4, 2024 20:01
@gabe-l-hart gabe-l-hart marked this pull request as ready for review October 4, 2024 20:02
@gabe-l-hart (Contributor, Author)

Thanks for the review/merge on #1255! This PR is now ready for review.

@mikekg commented Oct 4, 2024

Current tests are run through .github/workflows, with some scripts in .ci (including a script that can be used to ensure that the code in markdown files works).

Or were you looking for "unit tests" of subcomponents with a Python driver? If you are looking for Python-level unit tests, I don't think we have any right now, but that doesn't mean we can't have any. If you want to make a proposal, you might discuss with @byjlw and @Jack-Khuu, and with @lessw2020 for distributed inference.

@gabe-l-hart (Contributor, Author)

Hi @mikekg, thanks for the pointers! I was looking for unit tests, but I'll dig into the CI workflows too.

@mikekg commented Oct 7, 2024

> Hi @mikekg, thanks for the pointers! I was looking for unit tests, but I'll dig into the CI workflows too.

There aren't any unit tests today; the workflows run as shell scripts. It's very conceivable to create a unit test directory and set up a workflow that runs them. I think the most interesting question is what testing framework to use. I don't have any strong convictions about that; what do you suggest, @Jack-Khuu?

@Jack-Khuu (Contributor)

Thanks @gabe-l-hart for the PR; taking a deeper look at it now.

Seconding what @mikekg mentioned: our workflow currently runs as shell scripts (which plays well since we run the commands on a suite of various machines), but that mostly helps internal CI.

I want to add a unittest directory (Python-backed) that OSS contributors can replicate locally, but I haven't gotten around to it. I might spin up an umbrella laundry-list issue so the community can contribute.

@gabe-l-hart (Contributor, Author)

Sounds great! I've used pytest pretty extensively in the past and would be happy to contribute tests in the future if needed.
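
For example, a pytest-style test for the converter's permutation helper might look like this (purely hypothetical; the import path and test are made up, not existing torchchat code):

    import pytest
    import torch

    # Hypothetical import path for the converter helper discussed above.
    from convert_hf_checkpoint import permute

    @pytest.mark.parametrize("shape", [(32, 16), (32,)])
    def test_permute_preserves_shape_and_values(shape):
        # permute must handle 2-D weights and 1-D biases alike: same
        # shape out, same multiset of values, rows reordered.
        t = torch.randn(*shape)
        out = permute(t, n_heads=4)
        assert out.shape == t.shape
        assert sorted(out.flatten().tolist()) == sorted(t.flatten().tolist())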

@Jack-Khuu Jack-Khuu (Contributor) left a comment

Nice, thanks for adding

Comment on lines 139 to 148:

-            new_key = weight_map[abstract_key]
-            if new_key is None:
-                continue
+            new_key = weight_map.get(abstract_key, abstract_key)
             new_key = new_key.format(layer_num)
         else:
-            new_key = weight_map[key]
+            new_key = weight_map.get(key, key)
@Jack-Khuu (Contributor):

I might be misreading something, but why are we keeping keys with missing entries?

@gabe-l-hart (Contributor, Author):

This is probably not fully needed now that I better understand how the mapping works. I added this while trying to get things working, since Granite Code had missing entries. At the time, I thought unconverted tensor names would be handled generically post-conversion, but I now realize that's not the case. I'll remove this change.
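
The behavioral difference at issue, with a toy mapping entry for illustration:

    weight_map = {"model.embed_tokens.weight": "tok_embeddings.weight"}
    key = "model.some_unmapped.weight"

    # weight_map[key]               -> KeyError: unmapped names fail fast
    print(weight_map.get(key, key))  # prints the HF name unchanged,
                                     # silently carrying it forward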

-            v = final_result[key.replace("wq", "wv")]
+            k = final_result[wk_key]
+            v = final_result[wv_key]
+            print(key)
@Jack-Khuu (Contributor):

Suggested change:
-            print(key)
@gabe-l-hart (Contributor, Author):

Yikes! Sloppy, good catch

             q = permute(q, config.n_heads)
             print(wk_key)

@Jack-Khuu (Contributor):

Suggested change:
-            print(wk_key)

@Jack-Khuu (Contributor)

Hmmm, one thing we'll need to do is tell people to redownload if they already have a local checkpoint...

We can spin up that message right before we land.

-            new_key = weight_map[abstract_key]
-            if new_key is None:
-                continue
+            new_key = weight_map.get(abstract_key, abstract_key)

@Jack-Khuu Jack-Khuu (Contributor) commented Oct 7, 2024:

Ditto? Do we need the update here?

@gabe-l-hart (Contributor, Author):

Yep, good catch

Commits:

  • Branch: GraniteCodeSupport; Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> (3 commits)
  • pytorch#1250; Branch: BiasTensors-1250; Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> (2 commits)
@Jack-Khuu (Contributor)

Whoops, thought I hit land last night; will land when CI finishes.

@Jack-Khuu Jack-Khuu merged commit 6a2a2e8 into pytorch:main Oct 9, 2024
52 checks passed
@gabe-l-hart gabe-l-hart deleted the BiasTensors-1250 branch October 9, 2024 12:38
@gabe-l-hart gabe-l-hart mentioned this pull request Oct 31, 2024
Labels
CLA Signed (this label is managed by the Meta Open Source bot)

Development
Successfully merging this pull request may close these issues: Add support for separate bias tensors

4 participants