[feature] Embedding weight tying (#169) #172

Merged: blefaudeux merged 3 commits into main from weight_tying on Jan 5, 2022

Conversation

@blefaudeux (Contributor) commented Jan 3, 2022

What does this PR do?

Tentative implementation of #169, fairly minor, with a matching unit test update.
cc @erip
See #169 for a reference and more context.
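
As background, embedding weight tying means the output projection reuses the input embedding matrix, so the vocab-sized weight is stored and trained only once. Below is a minimal, illustrative PyTorch sketch; the `TinyLM` class and `tie_weights` flag are made up for illustration and are not the xformers API touched by this PR.

```python
# Illustrative only: the general weight-tying pattern in plain PyTorch,
# independent of how this PR wires it into the xformers model factory.
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab_size: int, d_model: int, tie_weights: bool = True):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.proj = nn.Linear(d_model, vocab_size, bias=False)
        if tie_weights:
            # Share a single Parameter between the input embedding and the
            # output projection: one vocab_size x d_model matrix, stored once.
            self.proj.weight = self.embed.weight

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.proj(self.embed(token_ids))

model = TinyLM(vocab_size=32_000, d_model=512)
assert model.proj.weight is model.embed.weight  # the two layers share one tensor
```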

Before submitting

  • Did you have fun?
    • Make sure you had fun coding 🙃
  • Did you read the contributor guideline?
  • Was this discussed/approved via a GitHub issue? (not needed for typos or doc improvements)
    • N/A
  • Did you make sure to update the docs?
    • N/A
  • Did you write any new necessary tests?
    • N/A
  • Did you update the changelog? (if needed)
    • N/A

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

@facebook-github-bot added the CLA Signed label on Jan 3, 2022 (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed).
@blefaudeux blefaudeux marked this pull request as draft January 3, 2022 22:40
@codecov-commenter

Codecov Report

Merging #172 (77fa504) into main (154b819) will increase coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #172      +/-   ##
==========================================
+ Coverage   90.56%   90.58%   +0.01%     
==========================================
  Files          56       56              
  Lines        2829     2835       +6     
==========================================
+ Hits         2562     2568       +6     
  Misses        267      267              
| Flag   | Coverage Δ                   |
|--------|------------------------------|
| Python | 90.58% <100.00%> (+0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files                    | Coverage Δ                   |
|-----------------------------------|------------------------------|
| xformers/factory/model_factory.py | 97.82% <100.00%> (+0.15%) ⬆️ |

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 70019c4...77fa504. Read the comment docs.

@blefaudeux blefaudeux marked this pull request as ready for review January 4, 2022 01:35
@erip (Contributor) commented Jan 4, 2022

Looks good! Huge thanks, @blefaudeux. I can pull the changes into my project that uses xformers and try them out.

@erip (Contributor) commented Jan 4, 2022

Looks like there's a tiny bit of performance improvement (on my silly CPU machine):

➜  python train.py tie
Epoch 1 step: 1 Loss: 9.666109 Took 5.800530 seconds. bsz (toks): 2438
Epoch 1 step: 2 Loss: 8.875856 Took 15.910649 seconds. bsz (toks): 3595
Epoch 1 step: 3 Loss: 7.488206 Took 15.342067 seconds. bsz (toks): 3866
^C
...
➜  python train.py
Epoch 1 step: 1 Loss: 9.688322 Took 5.884865 seconds. bsz (toks): 2438
Epoch 1 step: 2 Loss: 8.820903 Took 16.065957 seconds. bsz (toks): 3595
Epoch 1 step: 3 Loss: 7.255448 Took 15.518760 seconds. bsz (toks): 3866

I think this is generally the right direction. I could also take a look at memory-utilization comparisons between them. That said, printing out the number of trainable parameters shows that this seems to work well:

Tied: There are 25,588,799 trainable parameters.
Untied: There are 33,359,807 trainable parameters.
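
For reference, a trainable-parameter count like the ones above can be reproduced for any `torch.nn.Module`. This is just a sketch: `count_trainable_parameters` and the toy `mlp` are hypothetical and not the script that produced the numbers above.

```python
import torch.nn as nn

def count_trainable_parameters(model: nn.Module) -> int:
    # Sum element counts over parameters that require gradients; tied weights
    # are a single shared Parameter, so they are counted only once.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Example with an arbitrary module (not the model quoted above):
mlp = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
print(f"There are {count_trainable_parameters(mlp):,} trainable parameters.")
```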

@blefaudeux (Contributor, Author)

> Looks like there's a tiny bit of performance improvement (on my silly CPU machine) […] printing out the number of trainable parameters shows that this seems to work well.

Would you have a small enough task in mind? It could be added to the examples and would be useful for sanity checking and catching perf regressions.

@erip (Contributor) commented Jan 4, 2022

This example is somewhat involved (machine translation), but I could probably make something smaller. If that's of interest, I'm happy to try to contribute something!

@blefaudeux (Contributor, Author)

> This example is somewhat involved (machine translation), but I could probably make something smaller. If that's of interest, I'm happy to try to contribute something!

Just if it's not too much work! There are two examples here if that helps. Also, you'll really need a GPU at some point :D

@dianaml0 (Contributor) left a comment

LGTM!

@blefaudeux blefaudeux merged commit 3422b41 into main Jan 5, 2022
@blefaudeux blefaudeux deleted the weight_tying branch January 18, 2022 04:46
xwhan pushed a commit to xwhan/xformers that referenced this pull request Feb 8, 2022