Clarify shape descriptions inside forward method #9

Merged


Andrei-Aksionov
Contributor

I think the shape descriptions in the forward method are somewhat misleading.
The input has shape (B, T, C), and C is then used to describe the shape at every step of the forward pass, when in fact the last dimension there is head_size. C can equal head_size (for example, when the number of heads is 1), but that is not guaranteed.
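
To make the distinction concrete, here is a minimal shape-bookkeeping sketch in plain Python. It is not the repository's code: `matmul_shape` and `head_shapes` are hypothetical helpers, and the sketch assumes a single head whose key, query and value layers map C to head_size, as described above.

```python
def matmul_shape(a, b):
    """Shape of a batched matrix product a @ b for shapes (..., n, k) and (..., k, m)."""
    assert a[:-2] == b[:-2], "batch dimensions must match"
    assert a[-1] == b[-2], "inner dimensions must match"
    return a[:-1] + (b[-1],)


def head_shapes(B, T, C, head_size):
    """Trace tensor shapes through one self-attention head (hypothetical helper)."""
    x = (B, T, C)
    # key, query and value are assumed to be linear maps from C to head_size
    k = x[:-1] + (head_size,)        # (B, T, head_size)
    q = x[:-1] + (head_size,)        # (B, T, head_size)
    v = x[:-1] + (head_size,)        # (B, T, head_size)
    k_t = k[:-2] + (k[-1], k[-2])    # transpose last two dims: (B, head_size, T)
    wei = matmul_shape(q, k_t)       # attention scores: (B, T, T)
    out = matmul_shape(wei, v)       # weighted values: (B, T, head_size)
    return {"x": x, "k": k, "q": q, "v": v, "wei": wei, "out": out}


shapes = head_shapes(B=4, T=8, C=32, head_size=16)
for name, shape in shapes.items():
    print(name, shape)
```

With C=32 and head_size=16, only `x` ends in C; every later tensor ends in either T or head_size.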

In addition, the projection layer should have head_size * num_heads and n_embd as its input and output dimensions, respectively, which makes it more robust. In this code the head size is n_embd // num_heads, but that is not a strict rule: we might not want to shrink each head by a factor of num_heads, in which case the last dimension of the concatenated heads is larger than the embedding size, as shown in this article.
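
The same bookkeeping can be sketched for the multi-head case. Again this is only an illustration in plain Python, not the merged change: `multi_head_shapes` is a hypothetical helper, and it assumes the heads are concatenated along the last dimension before the projection.

```python
def multi_head_shapes(B, T, n_embd, num_heads, head_size):
    """Trace shapes through head concatenation and the output projection (hypothetical helper)."""
    head_out = (B, T, head_size)                  # output of each single head
    concat = head_out[:-1] + (head_size * num_heads,)  # heads joined on the last dim
    # Projection declared with (head_size * num_heads) inputs and n_embd outputs
    proj_in, proj_out = head_size * num_heads, n_embd
    assert concat[-1] == proj_in, "projection input must match concatenated heads"
    out = concat[:-1] + (proj_out,)               # back to (B, T, n_embd)
    return {"concat": concat, "proj": (proj_in, proj_out), "out": out}


# Head size reduced by a factor of num_heads: concatenation happens to equal n_embd
reduced = multi_head_shapes(B=4, T=8, n_embd=32, num_heads=4, head_size=32 // 4)
print(reduced)

# Head size not reduced: concatenation is wider than n_embd
full = multi_head_shapes(B=4, T=8, n_embd=32, num_heads=4, head_size=32)
print(full)
```

In the second call the concatenated width is 128, so a projection declared with n_embd inputs would not accept it, while one declared with head_size * num_heads inputs still maps back to n_embd.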

@karpathy karpathy merged commit 5220142 into karpathy:master Feb 7, 2023
@karpathy
Owner

karpathy commented Feb 7, 2023

you're right, i like it, ty

@Andrei-Aksionov
Contributor Author

This is my very first merged PR to an open source project. Feels nice 🙂

And I want to use my 3 seconds of fame to say a big thank you for the videos that you've made so far, and I hope you are not planning to stop.

huiqiao pushed a commit to huiqiao/char-level-gpt that referenced this pull request Mar 15, 2024
…ription_fix

Clarify shape descriptions inside forward method