Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nearest neighbor operation #58

Merged
merged 22 commits into from
Jun 14, 2022
Merged

Nearest neighbor operation #58

merged 22 commits into from
Jun 14, 2022

Conversation

raimis
Copy link
Contributor

@raimis raimis commented May 18, 2022

@raimis raimis self-assigned this May 18, 2022
@raimis raimis changed the title Move code from TorchMD-NET Nearest neighbor operation May 18, 2022
@raimis raimis marked this pull request as ready for review June 7, 2022 13:08
@raimis raimis requested a review from peastman June 7, 2022 13:09
@raimis
Copy link
Contributor Author

raimis commented Jun 7, 2022

@peastman could you review?

The PBC (torchmd/torchmd-net#92) will be add with a separate PR.

raimis pushed a commit to raimis/NNPOps that referenced this pull request Jun 7, 2022
@raimis raimis mentioned this pull request Jun 7, 2022
4 tasks
Copy link
Member

@peastman peastman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! My comments at torchmd/torchmd-net#61 (comment) still apply. For very small molecules, this should be as fast as anything. Once you get up to several hundred atoms, there are ways it could be made faster. Having each thread loop over multiple atoms or having multiple threads work together could reduce the number of row/column computations, position loads, and atomic adds.

Comment on lines 9 to 11
If `max_num_neighbors == -1` (default), all the atom pairs are returned,
(i.e. `num_atoms * (num_atoms + 1) / 2` pairs). If `max_num_neighbors > 0`,
only `num_atoms * max_num_neighbors` pairs are returned.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit ambiguous. Is it saying that if max_num_neighbors is -1 then the cutoff is ignored and all pairs are returned regardless of distance? Or is it simply returning the result in a different shape tensor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I improved the text and added a few example.

-------
neighbors: `torch.Tensor`
Atom pair indices. The shape of the tensor is `(2, num_pairs)`.
If an atom pair is separated by a larger distance than the cutoff,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is num_pairs? Does it depend on the data? Or is it always the same regardless of positions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defined num_pairs

template<> __device__ __forceinline__ float sqrt_(float x) { return ::sqrtf(x); };
template<> __device__ __forceinline__ double sqrt_(double x) { return ::sqrt(x); };

// Support pre-Kepler GPUs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's serious dedication to maintaining old hardware! :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until we support CUDA 11, we have to support these cards.

@raimis
Copy link
Contributor Author

raimis commented Jun 14, 2022

My comments at torchmd/torchmd-net#61 (comment) still apply. For very small molecules, this should be as fast as anything. Once you get up to several hundred atoms, there are ways it could be made faster. Having each thread loop over multiple atoms or having multiple threads work together could reduce the number of row/column computations, position loads, and atomic adds.

For molecules of 100-200 atoms, the speeds improve with more parallelization exposed, even it means some duplicate calculations. For larger molecules, maybe it is possible to improve a bit as you suggest, but the scaling of the current algorithm is quadratic. So it won't go far.

I thing, a better approach is two write another neighbor kernel with the cell list (some preliminary work torchmd/torchmd-net#68), which will be more efficient for large molecule and leave this one optimized for small ones.

@raimis
Copy link
Contributor Author

raimis commented Jun 14, 2022

@peastman any more comments?

@peastman
Copy link
Member

Looks good.

@raimis raimis merged commit 8b8ba43 into openmm:master Jun 14, 2022
@raimis raimis mentioned this pull request Dec 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants