Changed things as mentioned in issue 6 #7

tatp22 · 2020-06-27T21:59:14Z

This PR makes the code changes discussed in #6.

The notable changes are:

- Every `nn.Linear` instantiation is now replaced by a call to the `get_linear` function, which returns an `nn.Linear` with Xavier initialization. This also changes how the `E` and `F` matrices are initialized.
- The `LinearAttentionHead` module no longer has `w_q`, `w_k`, and `w_v` matrices. Instead, in the `MHAttention` module, each of `to_q`, `to_k`, and `to_v` is now a `ModuleList` holding `nhead` `nn.Linear` layers, one per head, each corresponding to a weight matrix in the original paper.
- Fixed a bug where `**kwargs` was still being passed in the checkpoint function "C2".
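
For readers skimming the PR, the first two changes can be sketched roughly as follows. This is a minimal illustration, not the code in the commits: the `get_linear` signature, the choice of `xavier_normal_` (the description only says "xavier init"), and the `PerHeadProjections` class with its argument names are all assumptions made for the example.

```python
import torch
from torch import nn


def get_linear(input_dim, output_dim, bias=True):
    """Return an nn.Linear with a Xavier-initialized weight.

    The signature and the normal (rather than uniform) variant are assumptions.
    """
    layer = nn.Linear(input_dim, output_dim, bias=bias)
    nn.init.xavier_normal_(layer.weight)
    return layer


class PerHeadProjections(nn.Module):
    """Hypothetical stand-in for the to_q / to_k / to_v ModuleLists in MHAttention."""

    def __init__(self, channels, dim, nhead):
        super().__init__()
        # One projection per head, instead of w_q / w_k / w_v inside each head.
        self.to_q = nn.ModuleList([get_linear(channels, dim, bias=False) for _ in range(nhead)])
        self.to_k = nn.ModuleList([get_linear(channels, dim, bias=False) for _ in range(nhead)])
        self.to_v = nn.ModuleList([get_linear(channels, dim, bias=False) for _ in range(nhead)])

    def forward(self, x):
        # Returns one (q, k, v) triple per head; each head consumes its own triple.
        return [(q(x), k(x), v(x)) for q, k, v in zip(self.to_q, self.to_k, self.to_v)]


proj = PerHeadProjections(channels=16, dim=8, nhead=4)
heads = proj(torch.randn(2, 10, 16))  # batch of 2, sequence length 10
```

Keeping the projections in a `ModuleList` registers every per-head layer as a submodule, so their parameters show up in `parameters()` and move with `.to(device)`.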

tatp22 added 2 commits June 27, 2020 23:48

Added changes as mentioned by pr 6

81ca792

Changed README

6b103b9

tatp22 linked an issue Jun 27, 2020 that may be closed by this pull request

Composed linear layers? #6

Closed

tatp22 merged commit 7c5c3a0 into master Jun 28, 2020

tatp22 deleted the linear_change branch June 28, 2020 00:06
