Add BitNet #85
Conversation
@DustinWang1 Hello, thanks for this PR. What are the differences between SeptNet and BitNet?
Ah, I did not mean to add SeptNet to this pull request. Is there any way for you to pull from "init changes"?
@DustinWang1 Feel free to add new commits to delete the undesired parts. I will squash them before merging :-)
Also, it would be better to use isort to rearrange the imported modules in your new commits.
@DustinWang1 Another minor reminder: we have updated the attn layer and Xfmr++ implementations recently, primarily the Cache updates and parameter inits. Please ensure your code aligns with these latest changes.
Thanks for letting me know about isort. There are so many nice utilities out there :0. I've rearranged the imports and synced my changes with the cache update and param inits.
@DustinWang1 Thank you for your quick fix. Could you check out my latest comments again?
Are you talking about the failed check? I'm currently fixing the style errors on my side, but keep in mind that I haven't changed many of the files that flake8 is citing; they were already in the main repo. There is one error where S is not defined: "fla\ops\generalized_delta_rule\iplr\naive.py:43:9: F821 undefined name 'S'". I'm not sure how this part of the code works; could you take a look?
@DustinWang1
https://github.com/DustinWang1/flash-linear-attention/blob/a56e65801f7adbc519edae8c72bfd891a1ddf836/fla/models/bitnet/modeling_bitnet.py#L63-L69
Be careful: once you call rms_norm_linear or swiglu_linear, the F.linear is performed internally, so the quant layers are never actually invoked.
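For illustration, here is a minimal, self-contained sketch of the pitfall described above. It uses plain PyTorch stand-ins (FakeBitLinear, fused_norm_linear are made up for this example) rather than the actual fla helpers, so treat it as an analogy, not the library's API: a fused norm-plus-linear helper only ever sees raw weight tensors, so the quantization in a custom linear layer's forward never runs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FakeBitLinear(nn.Linear):
    """Hypothetical stand-in for a quantized linear layer (not the fla BitLinear)."""

    def forward(self, x):
        # Pretend-quantize the weight; a real BitLinear would do its quantization
        # here. This is exactly the code path that gets skipped.
        w_q = torch.sign(self.weight)
        return F.linear(x, w_q, self.bias)


def fused_norm_linear(x, norm_weight, linear_weight, bias=None, eps=1e-6):
    # Stand-in for a fused rms_norm_linear-style helper: it receives only
    # tensors, so FakeBitLinear.forward (and its quantization) is never called.
    x = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps) * norm_weight
    return F.linear(x, linear_weight, bias)


x = torch.randn(2, 8)
proj = FakeBitLinear(8, 16, bias=False)
norm_w = torch.ones(8)

# Fused path: quantization silently skipped.
y_fused = fused_norm_linear(x, norm_w, proj.weight)
# Module path: quantization applied.
x_normed = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + 1e-6) * norm_w
y_quant = proj(x_normed)
print(torch.allclose(y_fused, y_quant))  # False in general
```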
I replaced rms_norm_linear with a wrapper around layer_norm_linear_quant_fn in modules/fused_bitlinear. For the swiglu function, I added an alternate version in modules/activations.py that uses a functional form of BitLinear. That function, bit_linear, is also just a wrapper around layer_norm_linear_quant_fn with rms_norm set to True. Let me know if this fixes the issue.
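For reference, a hedged sketch of what that bit_linear wrapper might look like. The argument order and keyword names (is_rms_norm in particular) are assumptions rather than taken from the PR diff, so check the real signature of layer_norm_linear_quant_fn in fla/modules/fused_bitlinear.py before reusing this.

```python
# Sketch only: argument order and keyword names below are assumptions.
from fla.modules.fused_bitlinear import layer_norm_linear_quant_fn


def bit_linear(x, weight, bias=None, norm_weight=None, norm_bias=None):
    # RMS-normalize the input and apply the quantized linear in one fused call,
    # mirroring the wrapper described in the comment above.
    return layer_norm_linear_quant_fn(
        x, norm_weight, norm_bias, weight, bias, is_rms_norm=True
    )
```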
Created the BitNet implementation by copying the transformer code and replacing nn.Linear with the fused BitLinear. Added a "bit" version of the attention module to pair with BitNet.
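As an illustration of that substitution, a hedged sketch of the pattern: the q/k/v/o projections that were plain nn.Linear become fused BitLinear layers. The class name FusedBitLinear and its constructor arguments are assumptions based on this description, not copied from the PR diff.

```python
import torch.nn as nn

# Assumed import path and class name; see fla/modules/fused_bitlinear.py
# in the PR branch for the actual definitions.
from fla.modules.fused_bitlinear import FusedBitLinear


class BitAttentionProjections(nn.Module):
    """Illustrative q/k/v/o projection block with quantized linears."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        head_dim = hidden_size // num_heads
        # Each projection that was nn.Linear in the copied transformer code
        # is swapped for the fused, quantized equivalent.
        self.q_proj = FusedBitLinear(hidden_size, num_heads * head_dim, bias=False)
        self.k_proj = FusedBitLinear(hidden_size, num_heads * head_dim, bias=False)
        self.v_proj = FusedBitLinear(hidden_size, num_heads * head_dim, bias=False)
        self.o_proj = FusedBitLinear(num_heads * head_dim, hidden_size, bias=False)
```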