-
Notifications
You must be signed in to change notification settings - Fork 286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DofMap::reinit_send_list(), Overlapping Coupling Test #2088
Conversation
Annnnddd I realize now that the reason it the case was failing was that the FEMContext will try and build every variable defined on the element, not just the ones I asked for coupling with. I'll tinker with this some more later this week to make the testing more precise to see if I can figure why my solutions are crap in the application and reflect that in this test. Then can talk about merging. |
Note that this will require libMesh/libmesh#2088 to get the DofMap::reinit_send_list() function.
Note that this will require libMesh/libmesh#2088 to get the DofMap::reinit_send_list() function.
src/base/dof_map.C
Outdated
{ | ||
this->clear_send_list(); | ||
this->add_neighbors_to_send_list(mesh); | ||
this->add_constraints_to_send_list(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I finally got a working case. Most of it was user error, but this part was incomplete. If I changed this to scatter_constraints(mesh)
it worked for my case (the failure manifest as an incorrect solve for the Newton step. Not sure how we're going to unit test that...). But I think that's a really special case and I think the correct thing to do here call process_constraints()
. However, that assumes the user hasn't added any new constraints and they've just changed the ghosting. I will certainly document that, but do we want to do something else, e.g. have a reinit_constraints_and_send_list()
method that would call reinit_constraints()
instead of just process_constraints
? Thoughts @roystgnr? Once we settle this, I'll clean this up for merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I said the wrong thing. The first alternative I tried was
this->scatter_constraints(mesh); this->add_constraints_to_send_list();
i.e. added the scatter_constraints(mesh)
call. Then, I replaced both of those with this->process_constraints(mesh)
. Both work for my case, but I think the latter is more complete. My case only needs to send the constraints to other processors, not gather them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the failure manifest as an incorrect solve for the Newton step. Not sure how we're going to unit test that...
Test the residual magnitude after enough steps that we should be within tolerance? Quadratic vs non-quadratic convergence should be different enough that we can be loose with tolerances.
do we want to do something else, e.g. have a reinit_constraints_and_send_list() method that would call reinit_constraints() instead of just process_constraints?
That might not be a bad idea, but I'd want to see a use case first. Usually people who have constraints turning on and off on the fly are going to have inequality constraints (which will have to be handled via the formulation) rather than equality constraints (which can go into our current infrastructure), right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might not be a bad idea, but I'd want to see a use case first.
Heh, that's a good point. I'm not sure I can come up with one. I'll stick with just this for now and document it and if the need ever arises, we can add that API for it.
832dc13
to
d8effdc
Compare
OK, this is ready for formal review, I think. I added some documentation to the The unit testing is necessary, but not sufficient. It won't capture the error I was seeing when we didn't call |
include/base/ghosting_functor.h
Outdated
@@ -130,6 +130,20 @@ class Elem; | |||
* (e.g. subdomain-restricted variables on a neighboring subdomain) | |||
* will be unaffected. | |||
* | |||
* Typical usage of the GhostingFunctor would be to add the appropriate | |||
* functor before EquationSystems::init() is called. However, in some |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually misleading - algebraic and coupling functors should be added before EquationSystems::init(), but geometric ghosting functors are ideally added before the Mesh is filled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
geometric ghosting functors are ideally added before the Mesh is filled.
Meaning before prepare_for_use()
? Sorry, I'm only messing with the algebraic/coupling functors so I'm less sure about the geometry functors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the right meaning, but I'm not sure of a better way to say it than "filled", since libMesh mesh generation idioms typically do prepare_for_use() implicitly.
include/base/dof_map.h
Outdated
|
||
/** | ||
* Clears the \p _send_list vector and then rebuilds it. This may be needed | ||
* for special situations for example when an algebraic coupling functor cannot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as I'm making you change comments: comma after "situations"
I pushed two comment commits that I'll squash in once someone takes a look. Good to merge after that? |
Note that this will require libMesh/libmesh#2088 to get the DofMap::reinit_send_list() function.
I added a DistributedMesh sweep to be paranoid; 👍 once that's done, thanks! |
Well, yay for paranoia. |
OK, it was because I didn't use the replicated mesh header, just As an aside, I do plan to come back to the unit in the nearish future to update to support |
fcd4d56
to
7e985ce
Compare
I cannot locally reproduce the segfault the distributed mesh test is showing. |
Oops, I lied (missed the mpiexec -np 2 part). I can reproduce. |
Hmmm. This doesn't look like me anymore. Here's the output when running the devel unit test: [02:18:02][pbauman@fry:/fry1/data/users/pbauman/research/libmesh/build/clear-send-list/tests]$ mpiexec -np 2 ./unit_tests-devel --option-with-dashes --option_with_underscores 3 Will run the following tests: All Tests .....Constraint loop detected Constraint loop detected [0] ../../pbauman/src/base/dof_map_constraints.C, line 3482, compiled Apr 5 2019 at 14:07:33 [1] ../../pbauman/src/base/dof_map_constraints.C, line 3482, compiled Apr 5 2019 at 14:07:33 ....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................Assertion `ghost_objects_from_proc[p] <= objects_on_proc[p]' failed. ghost_objects_from_proc[p] = 5 objects_on_proc[p] = 0 [1] ../../pbauman/src/mesh/distributed_mesh.C, line 1054, compiled Apr 5 2019 at 13:44:14 Assertion `(comm).verify(std::string("./include/libmesh/parallel_sync.h").size())' failed. Assertion `(this->comm()).verify(std::string("../../pbauman/src/mesh/mesh_base.C").size())' failed. [0] ./include/libmesh/parallel_sync.h, line 245, compiled Apr 5 2019 at 13:44:14 E.[1] ../../pbauman/src/mesh/mesh_base.C, line 160, compiled Apr 5 2019 at 14:07:44 Assertion `this->comm().verify(unpartitioned_objects)' failed. Assertion `this->comm().verify(unpartitioned_objects)' failed. [0] ../../pbauman/src/mesh/distributed_mesh.C, line 1051, compiled Apr 5 2019 at 13:44:14 E.[1] ../../pbauman/src/mesh/distributed_mesh.C, line 1051, compiled Apr 5 2019 at 13:44:14 Assertion `this->comm().verify(unpartitioned_objects)' failed. Assertion `this->comm().verify(unpartitioned_objects)' failed. [0] ../../pbauman/src/mesh/distributed_mesh.C, line 1051, compiled Apr 5 2019 at 13:44:14 E.[1] ../../pbauman/src/mesh/distributed_mesh.C, line 1051, compiled Apr 5 2019 at 13:44:14 Assertion `this->comm().verify(unpartitioned_objects)' failed. Assertion `this->comm().verify(unpartitioned_objects)' failed. [0] ../../pbauman/src/mesh/distributed_mesh.C, line 1051, compiled Apr 5 2019 at 13:44:14 E.[1] ../../pbauman/src/mesh/distributed_mesh.C, line 1051, compiled Apr 5 2019 at 13:44:14 Assertion `this->comm().verify(unpartitioned_objects)' failed. Assertion `this->comm().verify(unpartitioned_objects)' failed. [0] ../../pbauman/src/mesh/distributed_mesh.C, line 1051, compiled Apr 5 2019 at 13:44:14 E.[1] ../../pbauman/src/mesh/distributed_mesh.C, line 1051, compiled Apr 5 2019 at 13:44:14 In CheckpointIO::write, directory 'checkpoint_splitter.cpa/2' already exists, overwriting contents.../../pbauman/src/mesh/checkpoint_io.C, line 123, compiled Apr 5 2019 at 13:44:13 *** Assertion `unique_id == old_node->unique_id()' failed. unique_id = 2 old_node->unique_id() = 20 [0] ../../pbauman/src/mesh/checkpoint_io.C, line 1003, compiled Apr 5 2019 at 13:44:13 E.Assertion `(this->comm()).verify(std::string("../../pbauman/src/mesh/mesh_base.C").size())' failed. Assertion `(mesh.comm()).verify(std::string("../../pbauman/src/mesh/mesh_communication.C").size())' failed. [0] ../../pbauman/src/mesh/mesh_base.C, line 160, compiled Apr 5 2019 at 14:07:44 E.[1] ../../pbauman/src/mesh/mesh_communication.C, line 1101, compiled Apr 5 2019 at 14:07:47 ........Warning: This MeshOutput subclass only supports meshes which have been serialized! Warning: This MeshOutput subclass only supports meshes which have been serialized! .....................Warning! Parallel xda/xdr is not yet implemented. Writing a serialized file instead. ........................................................................................................................................................................................................ERROR: The library has been built without Space Filling Curve support. Using a linear partitioner instead! .....................................................................................................................................................................................ERROR: Unsupported PETSC Solver: ERROR: Unsupported PETSC Solver: JACOBI Continuing with PETSC defaults JACOBI Continuing with PETSC defaults ........... !!!FAILURES!!! Test Results: Run: 1173 Failures: 0 Errors: 7 1) test: BoundaryRefinedMeshTest::testMesh (E) setUp() failed - uncaught exception of type libMesh::LogicError - 2) test: BoundaryMeshTest::testMesh (E) setUp() failed - uncaught exception of type libMesh::LogicError - 3) test: BoundaryInfoTest::testMesh (E) uncaught exception of type libMesh::LogicError - 4) test: BoundaryInfoTest::testShellFaceConstraints (E) uncaught exception of type libMesh::LogicError - 5) test: BoundaryInfoTest::testEdgeBoundaryConditions (E) uncaught exception of type libMesh::LogicError - 6) test: CheckpointIOTest::testAsciiDistRepSplitter (E) uncaught exception of type libMesh::LogicError - 7) test: CheckpointIOTest::testBinaryDistRepSplitter (E) uncaught exception of type libMesh::LogicError - |
I'm trying with master now to see if it's there as well. |
I've got at least one case fixed; I'll put in a new PR with that and see what else we hit. |
OK, just confirmed I'm seeing this in the current master. |
Add an API hook for manually clearing the send_list and use this API instead of internally manually calling _send_list.clear().
This is an API hook to reinitialize the send list. This may be necessary for example when a user can only add an algebraic or coupling GhostingFunctor after the DofMap has been initialized. Such scenarios arise then the GhostingFunctor requires the evaluation of solution fields in order to determine ghosting/coupling.
This is going to evolve into a full blown unit test of a specific type of GhostingFunctor. For now, just adding the coupling functor that will be the basis of the test.
This includes things like building up the simple mesh for testing and initialing the test system. Subclasses that actually do the testing will leverage all this. fixup common infrastructure/build mesh
This is sanity checking the OverlappingCouplingFunctor assumptions about the mesh and that we're doing what we think we're doing.
We'll use this special partitioner to ensure that the elements in subdomain one are on a different processor than the elements of subdomain two so that we necessarily force ghosting when doing algebraic or coupling ghosting between the two subdomains.
This test exercises algebraic coupling by trying to build a FEMContext for the part that should have been communicated. If the algebraic ghosting didn't work correctly, this test will error because it will not be able to evaluate the ghosted solution dofs.
This is the test for the correct sparsity pattern. The test for correctness at this point is seeing if PETSc throws an extra malloc error or not. Would be nice to make this a richer test.
Added some documentation on what additional steps would be needed if the user can't attach an algebraic or coupling GhostingFunctor to the DofMap until after the system has been initialized.
7e985ce
to
1c93a57
Compare
OK, rebased on master after #2097 was merged. Once everything passes, I'll go ahead and merge. |
Note that this will require libMesh/libmesh#2088 to get the DofMap::reinit_send_list() function.
DO NOT MERGE!
This is exposing a failing use case as a unit test to help decipher what's going on.
The first two commits are additions to the DofMap API to facilitate the following use case of the GhostingFunctor. (The physical context for the following is fluid-structure interaction.)
Suppose I have two subdomains that are not topologically connected, but overlap each other. I have different, subdomain-restricted variables declared on each subdomain. The variables on the subdomains are coupled. So I want to use the GhostingFunctor interface to express the algebraic ghosting and the coupling ghosting. However, the ghosting that I want to generate depends on the solution (think deformed solid). So I have to be able to evaluate the solution when the GhostingFunctor is going to be called. Which means that I can't attach the GhostingFunctor until after the System has been initialized. So the idea is to add the DofMap API so that I can attach the GhostingFunctor after the system is initialized, call DofMap::reinit_send_list(), and then reinitialize any ghosted vectors and the sparsity pattern (if doing coupling ghosting).
In my application, I make the following observations:
So this WIP explicitly shows issue 1. (I'm hoping if we fix issue 1, issue 2 will also magically get fixed). It fails on the simple 3 element mesh I construct by hand in the test and on 2 (or more) processors. Everything passes on one processor.
The unit test is a little involved (I need the GhostingFunctor mimicing my use case, a Partitioner to make sure it will always trip the algebraic ghosting error by making sure the subdomains always sit on different processors (which gets its own unit testing) and then the actual testing that triggers failures), but it does mimic my use case pretty closely. I'll enrich it a bit more and can clean it up, but wanted to get this put up now so that it at least exposes the failing case.