BLOOM Flax #18022
Conversation
…to add_bloom_flax
…bloom_flax # Conflicts: # src/transformers/models/bloom/modeling_flax_bloom.py
Almost there! Just some small refactoring suggestions to clean the code up a bit
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
…nsformers into add_bloom_flax
Thanks for addressing the previous comments @younesbelkada. Two things from this round of review:
- Could we tidy up the `build_alibi_tensor_flax` function to avoid a triple-nested function? Much of the logic can be copied over from PyTorch BLOOM!
- The big question for me is whether we keep `scan` or not - I'm in favour of removing it for Flax BLOOM (see comments below)
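A flat (non-nested) alibi builder could follow the PyTorch BLOOM logic fairly directly. The sketch below is illustrative only, not the code under review; the function name, signature, and shapes are assumptions based on the PyTorch implementation's slope scheme:

```python
import math
import jax.numpy as jnp

def build_alibi_tensor(attention_mask, num_heads, dtype=jnp.float32):
    # Hypothetical flat sketch mirroring PyTorch BLOOM: compute per-head
    # slopes from the closest power of two, then scale cumulative positions.
    batch_size, seq_length = attention_mask.shape
    closest_power_of_2 = 2 ** math.floor(math.log2(num_heads))
    base = 2 ** (-(2 ** -(math.log2(closest_power_of_2) - 3)))
    powers = jnp.arange(1, 1 + closest_power_of_2)
    slopes = base ** powers
    if closest_power_of_2 != num_heads:
        # extra heads get interpolated slopes from the next power of two
        extra_base = 2 ** (-(2 ** -(math.log2(2 * closest_power_of_2) - 3)))
        num_remaining = num_heads - closest_power_of_2
        extra_powers = jnp.arange(1, 1 + 2 * num_remaining, 2)
        slopes = jnp.concatenate([slopes, extra_base ** extra_powers])
    # position index of each attended token, zeroed where the mask is zero
    arange_tensor = ((attention_mask.cumsum(axis=-1) - 1) * attention_mask)[:, None, :]
    alibi = slopes[None, :, None] * arange_tensor
    return alibi.astype(dtype)  # shape (batch_size, num_heads, seq_length)
```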
    all_attentions = () if output_attentions else None
    all_hidden_states = () if output_hidden_states else None

    if self.use_scan:
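For context, the snippet above follows the standard Transformers pattern of accumulating per-layer outputs in tuples only when the caller requests them. A minimal, self-contained sketch of that pattern (a hypothetical helper, not the PR's code):

```python
def run_layers(layers, hidden_states, output_attentions=False, output_hidden_states=False):
    # Tuples are only allocated when the caller asks for the extra outputs;
    # otherwise None signals "not collected" in the returned structure.
    all_attentions = () if output_attentions else None
    all_hidden_states = () if output_hidden_states else None
    for layer in layers:
        if output_hidden_states:
            all_hidden_states += (hidden_states,)
        hidden_states, attn = layer(hidden_states)
        if output_attentions:
            all_attentions += (attn,)
    if output_hidden_states:
        # include the final hidden state as well
        all_hidden_states += (hidden_states,)
    return hidden_states, all_hidden_states, all_attentions
```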
We've currently left `scan` in the modelling code. Part of me thinks we should remove it for Transformers for the following reasons:
- `scan` adds a lot of boilerplate code that isn't very easy to understand
- It is only beneficial for compile times when the model size is large, mostly when training and less so for inference

In the latter case, users will also likely shard the model, meaning they can employ the standalone code in bloom-jax-inference, where we can retain `scan` functionality.

We also found generation time to be slower when using `scan` vs not using it (despite a faster compile time). The generation time will amortise the compile time in any use case of Flax BLOOM.

Given that the philosophy of Transformers is functional, easy-to-understand code that is not necessarily fully optimised, I'm in favour of stripping `scan` from Flax BLOOM and leaving it to bloom-jax-inference to serve users that want to deploy larger variants of the model.
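To illustrate the trade-off being discussed (a toy sketch, not BLOOM's code): scanning over layers requires stacking per-layer parameters on a leading axis and rewriting the layer as a `(carry, x) -> (carry, y)` body, whereas the unrolled loop is plain Python but gets traced once per layer at compile time.

```python
import jax
import jax.numpy as jnp

num_layers, hidden = 4, 8
# per-layer weights stacked on a leading axis, as scan requires
stacked_w = jnp.stack([jnp.eye(hidden) * (i + 1) for i in range(num_layers)])

def layer(hidden_states, w):
    return hidden_states @ w

# Unrolled Python loop: simple to read, but traced num_layers times.
def forward_unrolled(x):
    for i in range(num_layers):
        x = layer(x, stacked_w[i])
    return x

# lax.scan: the layer body is traced once (faster compile for deep models),
# at the cost of the carry/stacked-parameter plumbing.
def forward_scanned(x):
    def body(carry, w):
        return layer(carry, w), None
    x, _ = jax.lax.scan(body, x, stacked_w)
    return x
```

Both functions compute the same result; the difference is purely in compile-time behaviour and code complexity.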
- remove unused code
- refactor a bit
- revert import `torch`
- change build alibi
Should have addressed your new suggestions @sanchit-gandhi! Here I mainly focused on refactoring the `build_alibi` function to match the implementation style of PyTorch -> this way it seems more readable!
I will leave you, @patil-suraj and @patrickvonplaten to decide regarding the scan feature, and I'm happy to remove it once we agree on that!
Can also confirm the slow tests/conversion tests pass ;)
d789a85 to dcdd563
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Should we merge this one? cc @patrickvonplaten @patil-suraj @sanchit-gandhi
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Both #17761 and this PR show a lot of work. Why isn't it merged?
Adding this to my TODOs
What does this PR do?
An attempt at adding a Flax implementation of BLOOM - original PR from @haileyschoelkopf: #17761
TODOs: