
DofMap::reinit_send_list(), Overlapping Coupling Test #2088

Merged
merged 13 commits into libMesh:master on Apr 7, 2019

Conversation

pbauman
Member

@pbauman pbauman commented Mar 26, 2019

DO NOT MERGE!

This is exposing a failing use case as a unit test to help decipher what's going on.

The first two commits are additions to the DofMap API to facilitate the following use case of the GhostingFunctor. (The physical context for the following is fluid-structure interaction.)

Suppose I have two subdomains that are not topologically connected, but overlap each other. I have different, subdomain-restricted variables declared on each subdomain. The variables on the subdomains are coupled. So I want to use the GhostingFunctor interface to express the algebraic ghosting and the coupling ghosting. However, the ghosting that I want to generate depends on the solution (think deformed solid). So I have to be able to evaluate the solution when the GhostingFunctor is going to be called. Which means that I can't attach the GhostingFunctor until after the System has been initialized. So the idea is to add the DofMap API so that I can attach the GhostingFunctor after the system is initialized, call DofMap::reinit_send_list(), and then reinitialize any ghosted vectors and the sparsity pattern (if doing coupling ghosting).

In my application, I make the following observations:

  1. If I pass an explicit CouplingMatrix, the algebraic ghosting appears to be incorrect.
  2. If I pass a null CouplingMatrix, the algebraic ghosting does not throw an error, but I get incorrect results after a linear solve for certain processor counts.

So this WIP explicitly shows issue 1. (I'm hoping if we fix issue 1, issue 2 will also magically get fixed). It fails on the simple 3 element mesh I construct by hand in the test and on 2 (or more) processors. Everything passes on one processor.

The unit test is a little involved (I need the GhostingFunctor mimicking my use case, a Partitioner to make sure it will always trip the algebraic ghosting error by forcing the subdomains onto different processors (which gets its own unit testing), and then the actual testing that triggers the failures), but it does mimic my use case pretty closely. I'll enrich it a bit more and can clean it up, but wanted to get this put up now so that it at least exposes the failing case.

@pbauman
Member Author

pbauman commented Mar 26, 2019

Annnnddd I realize now that the reason the case was failing was that the FEMContext will try to build every variable defined on the element, not just the ones I asked for coupling with. I'll tinker with this some more later this week to make the testing more precise and see if I can figure out why my solutions are crap in the application, and reflect that in this test. Then we can talk about merging.

pbauman added a commit to pbauman/grins that referenced this pull request Apr 2, 2019
Note that this will require libMesh/libmesh#2088 to get the
DofMap::reinit_send_list() function.
pbauman added a commit to pbauman/grins that referenced this pull request Apr 3, 2019
Note that this will require libMesh/libmesh#2088 to get the
DofMap::reinit_send_list() function.
{
this->clear_send_list();
this->add_neighbors_to_send_list(mesh);
this->add_constraints_to_send_list();
Member Author

OK, I finally got a working case. Most of it was user error, but this part was incomplete. If I changed this to scatter_constraints(mesh) it worked for my case (the failure manifested as an incorrect solve for the Newton step; not sure how we're going to unit test that...). But I think that's a really special case, and I think the correct thing to do here is call process_constraints(). However, that assumes the user hasn't added any new constraints and has just changed the ghosting. I will certainly document that, but do we want to do something else, e.g. have a reinit_constraints_and_send_list() method that would call reinit_constraints() instead of just process_constraints()? Thoughts @roystgnr? Once we settle this, I'll clean this up for merging.

Member Author

Oops, I said the wrong thing. The first alternative I tried was

this->scatter_constraints(mesh);
this->add_constraints_to_send_list();

i.e. added the scatter_constraints(mesh) call. Then, I replaced both of those with this->process_constraints(mesh). Both work for my case, but I think the latter is more complete. My case only needs to send the constraints to other processors, not gather them.

Member

> the failure manifest as an incorrect solve for the Newton step. Not sure how we're going to unit test that...

Test the residual magnitude after enough steps that we should be within tolerance? Quadratic vs non-quadratic convergence should be different enough that we can be loose with tolerances.

> do we want to do something else, e.g. have a reinit_constraints_and_send_list() method that would call reinit_constraints() instead of just process_constraints?

That might not be a bad idea, but I'd want to see a use case first. Usually people who have constraints turning on and off on the fly are going to have inequality constraints (which will have to be handled via the formulation) rather than equality constraints (which can go into our current infrastructure), right?

Member Author

> That might not be a bad idea, but I'd want to see a use case first.

Heh, that's a good point. I'm not sure I can come up with one. I'll stick with just this for now and document it and if the need ever arises, we can add that API for it.

@pbauman
Member Author

pbauman commented Apr 4, 2019

OK, this is ready for formal review, I think. I added some documentation to the GhostingFunctor in the last commit to try and spell out what needs to be done in use cases like this one. Please also check that the documentation for DofMap::reinit_send_list is adequate.

The unit testing is necessary, but not sufficient. It won't capture the error I was seeing when we didn't call DofMap::process_constraints(), but I'm hoping that a future GRINS test will capture that. This unit test does catch incomplete algebraic ghosting and incomplete sparsity pattern updates.

@@ -130,6 +130,20 @@ class Elem;
* (e.g. subdomain-restricted variables on a neighboring subdomain)
* will be unaffected.
*
* Typical usage of the GhostingFunctor would be to add the appropriate
* functor before EquationSystems::init() is called. However, in some
Member

This is actually misleading - algebraic and coupling functors should be added before EquationSystems::init(), but geometric ghosting functors are ideally added before the Mesh is filled.

Member Author

> geometric ghosting functors are ideally added before the Mesh is filled.

Meaning before prepare_for_use()? Sorry, I'm only messing with the algebraic/coupling functors so I'm less sure about the geometry functors.

Member

That's the right meaning, but I'm not sure of a better way to say it than "filled", since libMesh mesh generation idioms typically do prepare_for_use() implicitly.


/**
* Clears the \p _send_list vector and then rebuilds it. This may be needed
* for special situations for example when an algebraic coupling functor cannot
Member

As long as I'm making you change comments: comma after "situations"

@pbauman
Member Author

pbauman commented Apr 5, 2019

I pushed two comment commits that I'll squash in once someone takes a look. Good to merge after that?

pbauman added a commit to pbauman/grins that referenced this pull request Apr 5, 2019
Note that this will require libMesh/libmesh#2088 to get the
DofMap::reinit_send_list() function.
@roystgnr
Member

roystgnr commented Apr 5, 2019

I added a DistributedMesh sweep to be paranoid; 👍 once that's done, thanks!

@roystgnr
Member

roystgnr commented Apr 5, 2019

Well, yay for paranoia.

@pbauman
Member Author

pbauman commented Apr 5, 2019

OK, it was because I didn't use the replicated mesh header, just mesh.h. I'll squash that fix in (after testing with --enable-parmesh locally), squash in those documentation commits, and merge once all the CIVET builds are happy.

As an aside, I do plan to come back to the unit test in the nearish future to update it to support DistributedMesh (i.e. figure out how to tinker with geometric ghosting) once we have our use case fully up and running on ReplicatedMesh.

@pbauman
Member Author

pbauman commented Apr 5, 2019

I cannot locally reproduce the segfault the distributed mesh test is showing.

@pbauman
Member Author

pbauman commented Apr 5, 2019

Oops, I lied (missed the mpiexec -np 2 part). I can reproduce.

@pbauman
Member Author

pbauman commented Apr 5, 2019

Hmmm. This doesn't look like it's coming from my changes anymore. Here's the output when running the devel unit test:

[02:18:02][pbauman@fry:/fry1/data/users/pbauman/research/libmesh/build/clear-send-list/tests]$ mpiexec -np 2 ./unit_tests-devel --option-with-dashes --option_with_underscores 3
Will run the following tests:
All Tests
.....Constraint loop detected
Constraint loop detected
[0] ../../pbauman/src/base/dof_map_constraints.C, line 3482, compiled Apr  5 2019 at 14:07:33
[1] ../../pbauman/src/base/dof_map_constraints.C, line 3482, compiled Apr  5 2019 at 14:07:33
....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................Assertion `ghost_objects_from_proc[p] <= objects_on_proc[p]' failed.
ghost_objects_from_proc[p] = 5
objects_on_proc[p] = 0


[1] ../../pbauman/src/mesh/distributed_mesh.C, line 1054, compiled Apr  5 2019 at 13:44:14
Assertion `(comm).verify(std::string("./include/libmesh/parallel_sync.h").size())' failed.

Assertion `(this->comm()).verify(std::string("../../pbauman/src/mesh/mesh_base.C").size())' failed.

[0] ./include/libmesh/parallel_sync.h, line 245, compiled Apr  5 2019 at 13:44:14
E.[1] ../../pbauman/src/mesh/mesh_base.C, line 160, compiled Apr  5 2019 at 14:07:44
Assertion `this->comm().verify(unpartitioned_objects)' failed.

Assertion `this->comm().verify(unpartitioned_objects)' failed.

[0] ../../pbauman/src/mesh/distributed_mesh.C, line 1051, compiled Apr  5 2019 at 13:44:14
E.[1] ../../pbauman/src/mesh/distributed_mesh.C, line 1051, compiled Apr  5 2019 at 13:44:14
Assertion `this->comm().verify(unpartitioned_objects)' failed.

Assertion `this->comm().verify(unpartitioned_objects)' failed.

[0] ../../pbauman/src/mesh/distributed_mesh.C, line 1051, compiled Apr  5 2019 at 13:44:14
E.[1] ../../pbauman/src/mesh/distributed_mesh.C, line 1051, compiled Apr  5 2019 at 13:44:14
Assertion `this->comm().verify(unpartitioned_objects)' failed.

Assertion `this->comm().verify(unpartitioned_objects)' failed.

[0] ../../pbauman/src/mesh/distributed_mesh.C, line 1051, compiled Apr  5 2019 at 13:44:14
E.[1] ../../pbauman/src/mesh/distributed_mesh.C, line 1051, compiled Apr  5 2019 at 13:44:14
Assertion `this->comm().verify(unpartitioned_objects)' failed.

Assertion `this->comm().verify(unpartitioned_objects)' failed.

[0] ../../pbauman/src/mesh/distributed_mesh.C, line 1051, compiled Apr  5 2019 at 13:44:14
E.[1] ../../pbauman/src/mesh/distributed_mesh.C, line 1051, compiled Apr  5 2019 at 13:44:14
In CheckpointIO::write, directory 'checkpoint_splitter.cpa/2' already exists, overwriting contents.../../pbauman/src/mesh/checkpoint_io.C, line 123, compiled Apr  5 2019 at 13:44:13 ***
Assertion `unique_id == old_node->unique_id()' failed.
unique_id = 2
old_node->unique_id() = 20


[0] ../../pbauman/src/mesh/checkpoint_io.C, line 1003, compiled Apr  5 2019 at 13:44:13
E.Assertion `(this->comm()).verify(std::string("../../pbauman/src/mesh/mesh_base.C").size())' failed.

Assertion `(mesh.comm()).verify(std::string("../../pbauman/src/mesh/mesh_communication.C").size())' failed.

[0] ../../pbauman/src/mesh/mesh_base.C, line 160, compiled Apr  5 2019 at 14:07:44
E.[1] ../../pbauman/src/mesh/mesh_communication.C, line 1101, compiled Apr  5 2019 at 14:07:47
........Warning:  This MeshOutput subclass only supports meshes which have been serialized!
Warning:  This MeshOutput subclass only supports meshes which have been serialized!
.....................Warning!  Parallel xda/xdr is not yet implemented.
Writing a serialized file instead.
........................................................................................................................................................................................................ERROR: The library has been built without
Space Filling Curve support.  Using a linear
partitioner instead!
.....................................................................................................................................................................................ERROR:  Unsupported PETSC Solver: ERROR:  Unsupported PETSC Solver: JACOBI
Continuing with PETSC defaults
JACOBI
Continuing with PETSC defaults
...........


!!!FAILURES!!!
Test Results:
Run:  1173   Failures: 0   Errors: 7


1) test: BoundaryRefinedMeshTest::testMesh (E) 
setUp() failed
- uncaught exception of type libMesh::LogicError
- 


2) test: BoundaryMeshTest::testMesh (E) 
setUp() failed
- uncaught exception of type libMesh::LogicError
- 


3) test: BoundaryInfoTest::testMesh (E) 
uncaught exception of type libMesh::LogicError
- 


4) test: BoundaryInfoTest::testShellFaceConstraints (E) 
uncaught exception of type libMesh::LogicError
- 


5) test: BoundaryInfoTest::testEdgeBoundaryConditions (E) 
uncaught exception of type libMesh::LogicError
- 


6) test: CheckpointIOTest::testAsciiDistRepSplitter (E) 
uncaught exception of type libMesh::LogicError
- 


7) test: CheckpointIOTest::testBinaryDistRepSplitter (E) 
uncaught exception of type libMesh::LogicError
- 

@pbauman
Member Author

pbauman commented Apr 5, 2019

I'm trying with master now to see if it's there as well.

@roystgnr
Member

roystgnr commented Apr 5, 2019

Yeah, I'm pretty sure it'll be there. I think this is what I'm seeing in #1938 too. At least one of the bugs appears to be an effect of making the changes in #2061 without changing some of the internal skip_repartitioning() uses to skip_noncritical_repartitioning().

@roystgnr
Member

roystgnr commented Apr 5, 2019

I've got at least one case fixed; I'll put in a new PR with that and see what else we hit.

@pbauman
Member Author

pbauman commented Apr 5, 2019

OK, just confirmed I'm seeing this in the current master.

Add an API hook for manually clearing the send_list and use this
API instead of internally manually calling _send_list.clear().
This is an API hook to reinitialize the send list. This may be
necessary for example when a user can only add an algebraic or
coupling GhostingFunctor after the DofMap has been initialized.
Such scenarios arise when the GhostingFunctor requires the evaluation
of solution fields in order to determine ghosting/coupling.
This is going to evolve into a full blown unit test of a specific
type of GhostingFunctor. For now, just adding the coupling functor
that will be the basis of the test.
This includes things like building up the simple mesh
for testing and initializing the test system. Subclasses
that actually do the testing will leverage all this.

fixup common infrastructure/build mesh
This is sanity checking the OverlappingCouplingFunctor assumptions
about the mesh and that we're doing what we think we're doing.
We'll use this special partitioner to ensure that the elements
in subdomain one are on a different processor than the elements
of subdomain two so that we necessarily force ghosting when
doing algebraic or coupling ghosting between the two subdomains.
This test exercises algebraic coupling by trying to build a FEMContext
for the part that should have been communicated. If the algebraic
ghosting didn't work correctly, this test will error because it will
not be able to evaluate the ghosted solution dofs.
This is the test for the correct sparsity pattern. The test for
correctness at this point is seeing if PETSc throws an extra
malloc error or not. Would be nice to make this a richer test.
Added some documentation on what additional steps would be needed
if the user can't attach an algebraic or coupling GhostingFunctor
to the DofMap until after the system has been initialized.
@pbauman
Member Author

pbauman commented Apr 6, 2019

OK, rebased on master after #2097 was merged. Once everything passes, I'll go ahead and merge.

@pbauman pbauman merged commit 943d5d0 into libMesh:master Apr 7, 2019
@pbauman pbauman deleted the overlapping-coupling-test branch April 7, 2019 13:58
pbauman added a commit to pbauman/grins that referenced this pull request Apr 16, 2022
Note that this will require libMesh/libmesh#2088 to get the
DofMap::reinit_send_list() function.