Changed things as mentioned in issue 6 #7

Merged

tatp22 merged 2 commits into master from linear_change on Jun 28, 2020
Conversation

tatp22 (Owner) commented Jun 27, 2020

Changed some things in the code as mentioned here: #6

In particular, the following changes should be of note:

  • Every nn.Linear instantiation is now replaced by the get_linear function, which returns an nn.Linear with Xavier init. This also affects how the E and F matrices are initialized (a sketch of the helper follows this list).

  • There are no more w_q, w_k, and w_v matrices in the LinearAttentionHead module. Instead, in the MHAttention module, each of to_q, to_k, and to_v is now a ModuleList holding nhead nn.Linear layers, each corresponding to a per-head weight matrix from the original paper (see the second sketch after this list).

  • Fixed a bug where there were still **kwargs in the checkpoint function "C2".
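Below is a minimal sketch of what a get_linear helper as described above might look like. The exact signature, the choice of xavier_normal_ vs. xavier_uniform_, and the bias handling in the repository may differ; this is only meant to illustrate the idea of centralizing the Xavier-initialized nn.Linear construction.

```python
import torch.nn as nn

def get_linear(input_dim, output_dim, bias=True):
    """Return an nn.Linear whose weight is Xavier (Glorot) initialized.

    Sketch of the helper described in the PR; the real implementation
    may differ in signature and initialization details.
    """
    linear = nn.Linear(input_dim, output_dim, bias=bias)
    nn.init.xavier_normal_(linear.weight)
    if bias:
        nn.init.constant_(linear.bias, 0.0)
    return linear

# The E and F projection matrices can then be built the same way, e.g.
# E = get_linear(seq_len, k)  # hypothetical dimensions
```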
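And a sketch of the per-head projection layout, assuming the get_linear helper from the previous sketch is in scope. Class and argument names here are illustrative; the real LinearAttentionHead in the repository is more involved (it also handles the E and F projections), so this only shows where the Q/K/V linears now live.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearAttentionHead(nn.Module):
    """One attention head. It no longer owns w_q/w_k/w_v; it just
    consumes already-projected Q, K, V (simplified here to plain
    scaled dot-product attention)."""
    def forward(self, q, k, v):
        scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(q.size(-1))
        return torch.matmul(F.softmax(scores, dim=-1), v)

class MHAttention(nn.Module):
    """The per-head projections now live here, as ModuleLists of
    nn.Linear layers, one entry per head."""
    def __init__(self, dim, nhead, head_dim):
        super().__init__()
        self.heads = nn.ModuleList([LinearAttentionHead() for _ in range(nhead)])
        self.to_q = nn.ModuleList([get_linear(dim, head_dim) for _ in range(nhead)])
        self.to_k = nn.ModuleList([get_linear(dim, head_dim) for _ in range(nhead)])
        self.to_v = nn.ModuleList([get_linear(dim, head_dim) for _ in range(nhead)])

    def forward(self, x):
        out = [head(self.to_q[i](x), self.to_k[i](x), self.to_v[i](x))
               for i, head in enumerate(self.heads)]
        return torch.cat(out, dim=-1)

# Example: MHAttention(dim=64, nhead=8, head_dim=8)(torch.randn(2, 16, 64))
# returns a tensor of shape (2, 16, 64).
```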

@tatp22 tatp22 linked an issue Jun 27, 2020 that may be closed by this pull request
@tatp22 tatp22 merged commit 7c5c3a0 into master Jun 28, 2020
@tatp22 tatp22 deleted the linear_change branch June 28, 2020 00:06
Successfully merging this pull request may close these issues: Composed linear layers? (#6)