Clarify shape descriptions inside forward method #9

Andrei-Aksionov · 2023-02-07T12:16:38Z

I think the shape descriptions in the forward method are somewhat misleading.
The input has shape (B, T, C), and this C is then used to describe the shape at every step of the forward pass. After the key/query/value projections, however, the last dimension is actually head_size. C can be equal to head_size (for example, if the number of heads is 1), but that is not guaranteed.
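To illustrate the point, here is a minimal PyTorch sketch of a single causal self-attention head. The class name `Head` and the constructor arguments `n_embd`, `head_size` and `block_size` are assumptions modeled loosely on the lecture code, not the repository's exact implementation. Once the input has been projected, the shape comments track `head_size` for the last dimension instead of reusing `C`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Head(nn.Module):
    """One head of causal self-attention (illustrative sketch; names are assumptions)."""

    def __init__(self, n_embd, head_size, block_size):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        # lower-triangular mask so each position only attends to earlier positions
        self.register_buffer("tril", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape  # C == n_embd, the input width only
        k = self.key(x)    # (B, T, head_size)
        q = self.query(x)  # (B, T, head_size)
        # scale by sqrt(head_size), the last dimension of k
        wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5  # (B, T, head_size) @ (B, head_size, T) -> (B, T, T)
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float("-inf"))  # (B, T, T)
        wei = F.softmax(wei, dim=-1)  # (B, T, T)
        v = self.value(x)  # (B, T, head_size)
        out = wei @ v      # (B, T, T) @ (B, T, head_size) -> (B, T, head_size)
        return out


torch.manual_seed(0)
head = Head(n_embd=32, head_size=8, block_size=16)
x = torch.randn(4, 10, 32)
print(head(x).shape)  # torch.Size([4, 10, 8])
```

With `n_embd=32` and `head_size=8`, the output's last dimension is 8, not `C` (32).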

In addition, the projection layer should have (head_size * num_heads, n_embd) as its input and output dimensions, respectively. This makes it more robust. In this code the head size equals n_embd // num_heads, but that is not a strict rule: we might not want to shrink each head by a factor of num_heads, in which case the last dimension of the concatenated heads will be larger than the embedding size, as shown in this article.
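A hedged sketch of the proposed projection sizing, again in PyTorch and again with assumed names (`Head`, `MultiHeadAttention`, `proj`, `block_size`); dropout is omitted for brevity. The heads are deliberately wider than `n_embd // num_heads`, so the concatenated output is larger than the embedding size, and `proj` maps it back:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Head(nn.Module):
    """One head of causal self-attention; output last dim is head_size (sketch)."""

    def __init__(self, n_embd, head_size, block_size):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        self.register_buffer("tril", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        T = x.shape[1]
        k, q, v = self.key(x), self.query(x), self.value(x)  # each (B, T, head_size)
        wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5  # (B, T, T)
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float("-inf"))
        return F.softmax(wei, dim=-1) @ v  # (B, T, head_size)


class MultiHeadAttention(nn.Module):
    """num_heads heads in parallel, then a projection back to n_embd (sketch)."""

    def __init__(self, n_embd, num_heads, head_size, block_size):
        super().__init__()
        self.heads = nn.ModuleList(
            [Head(n_embd, head_size, block_size) for _ in range(num_heads)]
        )
        # input size is head_size * num_heads rather than n_embd, so head_size
        # does not have to equal n_embd // num_heads
        self.proj = nn.Linear(head_size * num_heads, n_embd)

    def forward(self, x):
        out = torch.cat([h(x) for h in self.heads], dim=-1)  # (B, T, head_size * num_heads)
        return self.proj(out)  # (B, T, n_embd)


torch.manual_seed(0)
# head_size is deliberately NOT n_embd // num_heads (which would be 8)
mha = MultiHeadAttention(n_embd=32, num_heads=4, head_size=16, block_size=16)
x = torch.randn(2, 10, 32)
print(mha(x).shape)  # torch.Size([2, 10, 32])
```

Sizing `proj` from `head_size * num_heads` keeps the module working when `head_size` is chosen independently of `n_embd`; with `nn.Linear(n_embd, n_embd)` this configuration would fail with a shape mismatch at the projection, because the concatenated heads are 64 wide here.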

karpathy · 2023-02-07T21:32:07Z

you're right, i like it, ty

Andrei-Aksionov · 2023-02-07T21:46:05Z

This is my very first merged PR to an open source project. Feels nice 🙂

And I want to use my 3 seconds of fame to say a big thank you for the videos you've made so far. I hope you are not planning to stop.


Clarify shape descriptions inside forward method

4c8e902

karpathy merged commit 5220142 into karpathy:master Feb 7, 2023

huiqiao pushed a commit to huiqiao/char-level-gpt that referenced this pull request Mar 15, 2024

Merge pull request karpathy#9 from Andrei-Aksionov/feature/shape_desc…

0fee733

…ription_fix Clarify shape descriptions inside forward method
