What pretrained models using SwiGLU are available? #2184
-
What pretrained models using SwiGLU or SiLU are available? Taking a look, I could find DINO2 and EVA2. There is support for SwiGLU in BEIT, but weights don't seem to be available? Are there any results comparing the usefulness of SwiGLU to a normal MLP? What are the relevant papers?
Replies: 1 comment
-
@NightMachinery I think you covered the main models w/ weights. It's possible to activate it on any of the ViT models, though, with an argument override. It does yield an improvement, but it is also more expensive, since it ends up increasing the size of the MLP and especially the memory required for activations. Some perf can be regained if you use a fused impl like the one in xformers, but I don't have that as a dependency. There is a short paper that compares some gated activations: https://arxiv.org/abs/2002.05202
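To make the cost point concrete, here is a rough pure-Python sketch, not the library's implementation. It follows the SwiGLU feed-forward form from the linked paper, `(SiLU(x W) * (x V)) W2`, and counts weights for a plain two-layer MLP versus the gated variant. All function names and the example sizes (768 and 3072) are assumptions picked for illustration.

```python
import math


def silu(z: float) -> float:
    """SiLU / Swish-1 activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def matvec(x, w):
    """Multiply a vector x (length d_in) by a d_in x d_out matrix w."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def swiglu_ffn(x, w_gate, w_value, w_out):
    """Bias-free SwiGLU feed-forward: (SiLU(x W) * (x V)) W2."""
    gate = [silu(g) for g in matvec(x, w_gate)]
    value = matvec(x, w_value)
    hidden = [g * v for g, v in zip(gate, value)]
    return matvec(hidden, w_out)


def mlp_params(dim: int, hidden: int, bias: bool = True) -> int:
    """Weights in a plain MLP: one up projection and one down projection."""
    return 2 * dim * hidden + ((hidden + dim) if bias else 0)


def swiglu_params(dim: int, hidden: int, bias: bool = True) -> int:
    """Weights in a SwiGLU MLP: gate and value up projections plus one down projection."""
    return 3 * dim * hidden + ((2 * hidden + dim) if bias else 0)


if __name__ == "__main__":
    dim, hidden = 768, 3072  # illustrative sizes only
    print("plain MLP params:        ", mlp_params(dim, hidden))
    print("SwiGLU, same hidden:     ", swiglu_params(dim, hidden))
    # Shrinking the hidden width to 2/3 matches the plain MLP's weight count.
    print("SwiGLU, 2/3 hidden:      ", swiglu_params(dim, hidden * 2 // 3, bias=False))
    print("plain MLP, no bias:      ", mlp_params(dim, hidden, bias=False))
```

At the same hidden width the gated block carries three weight matrices instead of two and produces two up-projection outputs per token instead of one, which is where the extra parameters and activation memory come from. The paper keeps the comparison fair by scaling the hidden width to 2/3, which brings the weight count back to that of the plain MLP.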