Message passing operation #59

Open · raimis wants to merge 18 commits into master

Conversation

raimis (Contributor) commented Jun 7, 2022

raimis self-assigned this on Jun 7, 2022
raimis mentioned this pull request on Aug 2, 2022
raimis marked this pull request as ready for review on Aug 19, 2022
raimis requested a review from peastman on Aug 19, 2022
raimis (Contributor, Author) commented Aug 19, 2022

@peastman could you review?

details.
messages: `torch.Tensor`
Atom pair messages. The shape of the tensor is `(num_pairs, num_features)`.
For efficiency, `num_features` has to be a multiple of 32 and <= 1024.
peastman (Member)

Are those limitations really necessary? It's very common for the number of features not to be a multiple of 32.

raimis (Contributor, Author)

On the contrary, the number of internal features is always a multiple of 32 (e.g. in TorchMD-NET I have seen 64, 128, and 256). GPUs compute in warps of 32 threads, so it is best to match that pattern for computational efficiency.
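For context, a minimal sketch of the launch pattern this constraint reflects; the kernel name, flat tensor layout, and parameters below are illustrative assumptions, not code from this PR. One thread handles one feature, so blockDim.x equals num_features, which must stay within CUDA's 1024-thread block limit and fills whole warps only when it is a multiple of 32.

#include <cstdint>

// Illustrative sketch (not the PR's kernel): one block per atom pair, one thread per feature.
__global__ void accumulateMessages(float* new_states,        // [num_atoms, num_features], flattened
                                    const float* messages,    // [num_pairs, num_features], flattened
                                    const int32_t* neighbors, // [num_pairs] destination atom of each pair
                                    int num_features) {
    const int i_neig = blockIdx.x;   // one block per atom pair
    const int i_feat = threadIdx.x;  // one thread per feature
    const int i_atom = neighbors[i_neig];
    // Consecutive threads touch consecutive features, so the accesses are coalesced.
    atomicAdd(&new_states[i_atom * num_features + i_feat],
              messages[i_neig * num_features + i_feat]);
}

// Launch with blockDim.x == num_features, hence the <= 1024 and multiple-of-32 constraints:
// accumulateMessages<<<num_pairs, num_features>>>(new_states, messages, neighbors, num_features);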

peastman (Member)

I frequently create models that don't satisfy those requirements, including in TorchMD-Net. For example, I've trained models with 48 or 80 features per layer.

raimis (Contributor, Author)

Why?

peastman (Member)

I don't understand your question. Why not? The number of features is a hyperparameter. It's one of many hyperparameters you tune to balance training accuracy, overfitting, speed, and memory use. Why place arbitrary limits on it when there's no need to?

Comment on lines +27 to +28
const int32_t i_feat = threadIdx.x;
atomicAdd(&new_states[i_atom][i_feat], messages[i_neig][i_feat]);
peastman (Member)

You can eliminate the limitations on number of features by just rewriting this as a loop.

for (int32_t i_feat = threadIdx.x; i_feat < num_features; i_feat += blockDim.x)
    atomicAdd(&new_states[i_atom][i_feat], messages[i_neig][i_feat]);
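For reference, a self-contained sketch of the suggested pattern, using the same illustrative names and flat layout as the sketch above (assumptions, not code from this PR). Consecutive threads still read consecutive features in every loop iteration, so the accesses remain coalesced.

#include <cstdint>

// Illustrative sketch of the looped variant: works for any num_features.
__global__ void accumulateMessagesLooped(float* new_states,        // [num_atoms, num_features], flattened
                                          const float* messages,    // [num_pairs, num_features], flattened
                                          const int32_t* neighbors, // [num_pairs] destination atom of each pair
                                          int num_features) {
    const int i_neig = blockIdx.x;
    const int i_atom = neighbors[i_neig];
    // Each block strides over the features; when num_features == blockDim.x
    // the loop body executes exactly once per thread with i_feat == threadIdx.x.
    for (int32_t i_feat = threadIdx.x; i_feat < num_features; i_feat += blockDim.x)
        atomicAdd(&new_states[i_atom * num_features + i_feat],
                  messages[i_neig * num_features + i_feat]);
}

// Launched exactly as before (blockDim.x == num_features) when the old constraints hold,
// or with any fixed block size that is a multiple of 32 (e.g. 128) when they do not:
// accumulateMessagesLooped<<<num_pairs, 128>>>(new_states, messages, neighbors, num_features);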

raimis (Contributor, Author)

Apart from solving a non-existent problem, this would make the memory accesses non-coalesced and reduce speed.

peastman (Member)

It would have no effect on speed at all. If the number of features happens to satisfy your current requirement, the behavior would be identical to what it currently does. The atomicAdd() would be executed once by every thread with i_feat equal to threadIdx.x. The only change would be if the number doesn't satisfy your current requirements, either because it's not a multiple of 32 or it's more than 1024. In that case it would produce correct behavior, unlike the current code. So there's no downside at all.
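A concrete walkthrough of that claim (the feature counts here are illustrative): with num_features = 128 and blockDim.x = 128, thread 37 enters the loop with i_feat = 37, performs its single atomicAdd, advances to i_feat = 165, fails the i_feat < 128 test and exits, so every thread still executes exactly one call, as in the unrolled code. With num_features = 2048, which exceeds the current <= 1024 limit, a block of 1024 threads simply has each thread handle two features in two iterations.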

raimis (Contributor, Author)

It will reduce the number of threads by the number of features. The reduced parallelism would result in reduced speed.

peastman (Member)

I'm not suggesting any change to the number of threads. The only thing I'm suggesting is wrapping the atomicAdd() in a loop as shown above. If num_features happens to match your current restrictions, nothing will change. Every thread will still call it exactly once.
