[Feature Request] Compute prod_env_mat OP in parallel in the dimension of the frame #2618
Summary
Currently, the `prod_env_mat` OP and its kernel are parallelized only in the dimension of the atoms, not in the dimension of the frame. This is not a problem when the training batch size is small or when running MD simulations, but it degrades performance when the training batch size is large, or during inference (`dp test` and `dp model-devi`), on modern GPUs that have a large memory.

In #2600 and #2601, I refactored `prod_force` and `prod_force_grad`. A similar change should be applied to `prod_env_mat`.

Detailed Description
The current code is:
deepmd-kit/source/op/prod_env_mat_multi_device.cc
Lines 1150 to 1151 in 92ca097
This loop over frames should be avoided, at least on GPUs.
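To illustrate the idea, here is a minimal CPU-side sketch, not the actual deepmd-kit code. All names (`per_atom_work`, `per_frame_launch`, `flattened_launch`) are hypothetical, and each atom is reduced to a single scalar for brevity. The first function mimics one launch per frame; the second uses one flattened index space of size `nframes * nloc`, where each index recovers its own frame and atom the way a GPU thread could.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-atom work of the real kernel.
// Assumption: one scalar per atom, stored frame-major.
inline double per_atom_work(const std::vector<double>& coord,
                            std::size_t frame,
                            std::size_t atom,
                            std::size_t nloc) {
  return 2.0 * coord[frame * nloc + atom];
}

// Pattern to avoid on GPUs: a serial outer loop over frames, where only
// the inner atom loop would be parallelized (one launch per frame).
std::vector<double> per_frame_launch(const std::vector<double>& coord,
                                     std::size_t nframes,
                                     std::size_t nloc) {
  std::vector<double> out(nframes * nloc);
  for (std::size_t ff = 0; ff < nframes; ++ff) {  // serial over frames
    for (std::size_t ii = 0; ii < nloc; ++ii) {   // stands in for one launch
      out[ff * nloc + ii] = per_atom_work(coord, ff, ii, nloc);
    }
  }
  return out;
}

// Alternative: a single flattened index space over frames and atoms.
// Each index decodes its (frame, atom) pair, so one launch covers all frames.
std::vector<double> flattened_launch(const std::vector<double>& coord,
                                     std::size_t nframes,
                                     std::size_t nloc) {
  std::vector<double> out(nframes * nloc);
  for (std::size_t idx = 0; idx < nframes * nloc; ++idx) {
    const std::size_t ff = idx / nloc;  // frame index
    const std::size_t ii = idx % nloc;  // atom index within the frame
    out[idx] = per_atom_work(coord, ff, ii, nloc);
  }
  return out;
}
```

Both functions produce the same result; the difference is that the flattened version exposes `nframes * nloc` independent work items at once, which is what would let a GPU stay busy with large batches.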
Further Information, Files, and Links
No response