Signed Distance fails when processing multiple STL files sequentially with shared memory #220

Closed
gzagaris opened this issue Apr 8, 2020 · 1 comment · Fixed by #245
Labels: bug, memory, mpi, Quest, User Request

gzagaris commented Apr 8, 2020

Andy Cook has an application that:

  • Reads in multiple STL files and calls signed distance on them one at a time
  • In the case of AMR, the STL file would be read multiple times, e.g., for each level

The STL files are read and processed sequentially, i.e., signed_distance_finalize() is called before proceeding to the next file or level.

This works when MPI shared memory is OFF. When shared memory is enabled, signed_distance_init() for the next file returns a failure status.

gzagaris added the bug, Quest, and User Request labels Apr 8, 2020
gzagaris self-assigned this Apr 8, 2020
gzagaris added the mpi label May 5, 2020

gzagaris commented May 6, 2020

This turned out to be just a matter of setting the pointer to the shared_buffer associated with the shared MPI window to nullptr in finalize().

Note: we do call MPI_Win_free, which deallocates the buffer, but we don't explicitly set the pointer used in the code to nullptr, so subsequent calls throw a SLIC_ERROR.

I'll push a fix shortly.
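
For readers who want to see the failure mode in isolation, here is a minimal sketch of the pattern described above. It is not Axom's code: the namespace, function names, and the static array standing in for the MPI shared window are all hypothetical, and no MPI calls are made. It only assumes that init() refuses to proceed when the cached buffer pointer is non-null, as the comment above implies.

```cpp
// Hypothetical stand-ins only; none of these names are Axom internals.
namespace sketch
{
// Backing storage that plays the role of the on-node shared window.
static int s_window_storage[16];
static bool s_window_open = false;

// Cached pointer the rest of the code uses to reach the shared buffer.
static int* s_shared_buffer = nullptr;

// Stand-in for allocating the shared window and querying its base pointer.
int* allocate_window()
{
  s_window_open = true;
  return s_window_storage;
}

// Stand-in for MPI_Win_free(): it releases the window, but it knows nothing
// about other pointers in the code that still refer to the buffer.
void free_window() { s_window_open = false; }

// Returns 0 on success, -1 if a previous buffer still appears to be live.
int init()
{
  if(s_shared_buffer != nullptr)
  {
    return -1;  // assumed check; the real code reports an error here
  }
  s_shared_buffer = allocate_window();
  return 0;
}

// With reset_pointer == false this mimics the behavior before the fix.
void finalize(bool reset_pointer)
{
  free_window();
  if(reset_pointer)
  {
    s_shared_buffer = nullptr;  // the one-line fix
  }
}
}  // namespace sketch

// Runs init/finalize/init and returns the status of the second init().
int run_twice(bool reset_pointer)
{
  sketch::s_shared_buffer = nullptr;  // start from a clean state
  sketch::s_window_open = false;

  if(sketch::init() != 0)
  {
    return -2;  // first init should never fail in this sketch
  }
  sketch::finalize(reset_pointer);

  const int second_status = sketch::init();
  if(second_status == 0)
  {
    sketch::finalize(true);
  }
  return second_status;
}
```

Calling run_twice(false) reproduces the reported symptom (the second init() returns -1), while run_twice(true) returns 0 because finalize() clears the cached pointer.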

gzagaris added a commit that referenced this issue May 6, 2020
This commit adds a regression test which calls quest's
signed distance query twice, using MPI shared memory.
This exposes a bug in finalize(), which doesn't clear
the shared buffer appropriately; consequently,
subsequent calls to quest::initialize() fail.

Github Issue: #220
gzagaris added a commit that referenced this issue May 6, 2020
When enabling MPI-3 shared memory for the signed distance query,
signed_distance_finalize() deallocates the on-node shared buffer
by MPI_Win_free(). However, the value of the pointer used to
access that buffer internally in the code was not being properly
set to null. Consequently, subsequent calls to quest::initialize()
would fail.

This resolves #220.