At around 53:10 of the lecture, Andrej does a matrix multiplication with tensors of size (T, T) and (B, T, C). More precisely: (8, 8) @ (4, 8, 2).
Now, even after looking over PyTorch docs on broadcasting semantics, I'm surprised to see that this works - but sure enough, running the code produces an output of (4, 8, 2).
Can anyone explain how this broadcast works?
```
// align trailing dimensions
8, 8
4, 8, 2
// pad missing dimensions with 1
1, 8, 8
4, 8, 2
// duplicate size-1 dimensions until the shapes match
4, 8, 8
4, 8, 2
// now what???
```
I think the same reasoning also applies to images. Tensors usually have shape [Batch, Channels, Height, Width] (NCHW), and you can treat the color channels as a group of different images, i.e., another batching dimension.
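For instance (a rough sketch; the mask tensor here is purely illustrative), an (H, W) tensor broadcasts elementwise across a whole NCHW batch:

```python
import torch

images = torch.rand(4, 3, 32, 32)  # NCHW batch of images
mask = torch.rand(32, 32)          # a single (H, W) plane

# For elementwise ops, (32, 32) is padded to (1, 1, 32, 32) and then
# broadcast over both the batch and channel dimensions.
out = images * mask
print(out.shape)  # torch.Size([4, 3, 32, 32])
```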
It's using the usual broadcasting rules. If you mean whether it follows the matrix multiplication rule of having one dimension in common, then yes: the dimension in common must be the second from the right. torch.matmul treats every dimension except the last two as a batch dimension, so (8, 8) is viewed as (1, 8, 8), broadcast to (4, 8, 8), and then each (8, 8) slice is matrix-multiplied with the matching (8, 2) slice, producing (4, 8, 2).
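A quick way to verify this (a minimal sketch; the names wei and x just echo the lecture's variables):

```python
import torch

B, T, C = 4, 8, 2
wei = torch.rand(T, T)   # (8, 8)
x = torch.rand(B, T, C)  # (4, 8, 2)

# wei is viewed as (1, 8, 8), broadcast to (4, 8, 8), and the last
# two dims follow ordinary matrix multiplication: (8, 8) @ (8, 2).
out = wei @ x
print(out.shape)  # torch.Size([4, 8, 2])

# Equivalent explicit loop over the batch dimension:
manual = torch.stack([wei @ x[b] for b in range(B)])
print(torch.allclose(out, manual))  # True
```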