Store coordinates together with the points in a cell list #52
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
I realized that the
FullGridCellList
(at least with theVector{Vector}
backend) is algorithmically almost identical to the method of CellListMap.jl. But CellListMap.jl is slightly faster on a single thread or on multiple threads when modified to use Polyester.jl for threading.The main difference is that @lmiq is storing the coordinates together with the point indices in the cell lists. This avoids unordered access of the big coordinate array to get the coordinates of the neighbor.
I implemented a similar data structure and made it configurable, as our goal is to have a playground to try out methods.
We now get very similar performance to CellListMap.jl.
Here is a plot showing the speedup against CellListMap.jl on a single thread (Threadripper 3990X):
On 128 threads, we're still slightly slower:
Here is a plot showing the speedup from using
PointWithCoordinates
on different architectures.We see the largest speedups (14-15% for a WCSPH interaction on 128 threads!) on the CPU. The Nvidia H100 is also benefiting from this data structure. The RTX 3090 is only getting 0.5-1% faster. For some reason, the AMD Instinct MI210 doesn't like this data structure at all and is performing 2x slower.