wei value not 100% per row after dropout #30

Open
guyko81 opened this issue Sep 7, 2023 · 1 comment

guyko81 commented Sep 7, 2023

This doesn't make sense to me. After

        wei = q @ k.transpose(-2,-1) * k.shape[-1]**-0.5 # (B, T, hs) @ (B, hs, T) -> (B, T, T)
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float('-inf')) # (B, T, T)
        wei = F.softmax(wei, dim=-1) # (B, T, T)

the values in each row sum to 100%, but after applying dropout

        wei = self.dropout(wei)

the row sums can rise above 100%. Is there a reason for that? Does it cause any issues? I'd expect the overall calculation not to be affected much, since other parts of the network can compensate, but still.
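
For reference, a minimal standalone sketch that reproduces the observation outside the model. The sizes `B`, `T`, `hs`, the dropout probability `p = 0.2`, and the seed are assumptions chosen for illustration, not values from the repository. It relies on the documented behaviour of `torch.nn.Dropout` in training mode: each element is zeroed with probability `p` and the survivors are scaled by `1 / (1 - p)`, so a row that loses few or no entries can sum to more than 1 (up to `1 / (1 - p)`), while the expected row sum stays at 1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical small sizes and dropout rate, chosen only for illustration.
B, T, hs = 1, 4, 8
p = 0.2

q = torch.randn(B, T, hs)
k = torch.randn(B, T, hs)
tril = torch.tril(torch.ones(T, T))

# Same three steps as in the snippet above.
wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5      # (B, T, T)
wei = wei.masked_fill(tril[:T, :T] == 0, float("-inf"))  # causal mask
wei = F.softmax(wei, dim=-1)                             # each row sums to 1
row_sums_before = wei.sum(dim=-1)

dropout = nn.Dropout(p)
dropout.train()            # dropout is only active in training mode
dropped = dropout(wei)
row_sums_after = dropped.sum(dim=-1)

# Entries that survived dropout are the originals scaled by 1 / (1 - p).
kept = dropped != 0

print("row sums before dropout:", row_sums_before)
print("row sums after dropout: ", row_sums_after)
```

Because of the `1 / (1 - p)` rescaling, the rows after dropout are no longer probability distributions during training; the scaling is there so that the expected value of each entry matches its value without dropout.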


fasterinnerlooper commented Feb 4, 2024

I'm running this as a Jupyter Notebook, so there might be some inconsistencies, but I don't appear to be getting this when I intercept wei and check its values. Can you maybe provide some more detail around this?
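
One possible explanation for the two different observations, offered as an assumption since the thread does not confirm how `wei` was inspected: `nn.Dropout` is the identity when the module is in eval mode (after `model.eval()`), so row sums stay at 100% there, while in training mode they generally do not. The sketch below uses a hypothetical 4x4 attention matrix and `p = 0.2` to compare the two modes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical attention weights: each row is a softmax, so it sums to 1.
wei = torch.softmax(torch.randn(4, 4), dim=-1)
dropout = nn.Dropout(0.2)  # assumed dropout rate, for illustration only

dropout.train()            # training mode: random zeroing plus rescaling
train_sums = dropout(wei).sum(dim=-1)

dropout.eval()             # eval mode: dropout passes the input through unchanged
eval_out = dropout(wei)
eval_sums = eval_out.sum(dim=-1)

print("train-mode row sums:", train_sums)
print("eval-mode row sums: ", eval_sums)
```

Checking `self.training` (or the module's `training` attribute) at the point where `wei` is intercepted would show which of the two cases applies.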
