Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2d, 5th order Nédélec (first kind) and Raviart-Thomas FEs #3884

Open
wants to merge 20 commits into
base: devel
Choose a base branch
from

Conversation

nmnobre
Copy link
Member

@nmnobre nmnobre commented Jun 20, 2024

Hi Roy,

The good:

ND1_convergence_lu

The bad:

46% slowdown in compute_shape_functions() and 40% in init_shape_functions() when running vector fe ex3 with 1st order Nédélec elements.
for i in {1..5}; do ./example-opt element_type=TRI6 -pc_type lu grid_size=150 | grep "| FE " -A2; done

BEFORE (196a0a4):

| FE                                                                                                              |
|   compute_shape_functions()        181200     0.0584      0.000000    0.0584      0.000000    5.14     5.14     |
|   init_shape_functions()           181200     0.0567      0.000000    0.0567      0.000000    4.98     4.98     |
| FE                                                                                                              |
|   compute_shape_functions()        181200     0.0599      0.000000    0.0599      0.000000    5.26     5.26     |
|   init_shape_functions()           181200     0.0570      0.000000    0.0570      0.000000    5.01     5.01     |
| FE                                                                                                              |
|   compute_shape_functions()        181200     0.0591      0.000000    0.0591      0.000000    5.15     5.15     |
|   init_shape_functions()           181200     0.0567      0.000000    0.0567      0.000000    4.95     4.95     |
| FE                                                                                                              |
|   compute_shape_functions()        181200     0.0611      0.000000    0.0611      0.000000    5.32     5.32     |
|   init_shape_functions()           181200     0.0566      0.000000    0.0566      0.000000    4.93     4.93     |
| FE                                                                                                              |
|   compute_shape_functions()        181200     0.0598      0.000000    0.0598      0.000000    5.24     5.24     |
|   init_shape_functions()           181200     0.0567      0.000000    0.0567      0.000000    4.98     4.98     |

AFTER (c659c2b):

| FE                                                                                                              |
|   compute_shape_functions()        181200     0.0865      0.000000    0.0865      0.000000    7.31     7.31     |
|   init_shape_functions()           181200     0.0809      0.000000    0.0809      0.000000    6.84     6.84     |
| FE                                                                                                              |
|   compute_shape_functions()        181200     0.0861      0.000000    0.0861      0.000000    7.19     7.19     |
|   init_shape_functions()           181200     0.0795      0.000000    0.0795      0.000000    6.64     6.64     |
| FE                                                                                                              |
|   compute_shape_functions()        181200     0.0867      0.000000    0.0867      0.000000    7.36     7.36     |
|   init_shape_functions()           181200     0.0798      0.000000    0.0798      0.000000    6.77     6.77     |
| FE                                                                                                              |
|   compute_shape_functions()        181200     0.0862      0.000000    0.0862      0.000000    7.24     7.24     |
|   init_shape_functions()           181200     0.0794      0.000000    0.0794      0.000000    6.67     6.67     |
| FE                                                                                                              |
|   compute_shape_functions()        181200     0.0853      0.000000    0.0853      0.000000    7.18     7.18     |
|   init_shape_functions()           181200     0.0801      0.000000    0.0801      0.000000    6.75     6.75     |

The ugly:

No support for non-conforming interfaces and the shape functions are still explicitly hard-coded. On the bright side, all other code should support arbitrary order Nédélec elements. In practice, this means that, in the future, adding support for 3rd order shape functions would literally 'only' require writing those functions and their derivatives.

@moosebuild
Copy link

moosebuild commented Jun 20, 2024

Job Coverage on 9b88378 wanted to post the following:

Coverage

b76c44 #3884 9b8837
Total Total +/- New
Rate 62.84% 61.35% -1.49% 8.02%
Hits 69469 69517 +48 243
Misses 41082 43797 +2715 2786

Diff coverage report

Full coverage report

Warnings

  • New new line coverage rate 8.02% is less than the suggested 90.0%

This comment will be updated on new commits.

@roystgnr
Copy link
Member

Is this ready to review? I've been busy, and currently the recent force pushes make me cautious, but if I don't get to it in between vacation prep work tomorrow then I'm definitely not getting to it until I get back.

@nmnobre
Copy link
Member Author

nmnobre commented Jun 28, 2024

Is this ready to review? I've been busy, and currently the recent force pushes make me cautious, but if I don't get to it in between vacation prep work tomorrow then I'm definitely not getting to it until I get back.

There are a couple things I want to do:

  1. Try using Elem::node_id instead of Elem::point to recover some of the performance loss but, wickedly, this isn't working for quads (yet), even though it is for tris.
    EDIT: Ah, of course, it's caching in FE::reinit()! In that function we avoid reinitialising the shape functions even if shapes_need_reinit() is true by looking at the geometrical similarity between consecutive elements, and so we must also check orientation based on geometrical features. In this way, the same geometry equates to the same orientation. This is obviously not true if we compute orientation based on node ids, which only depend on how we happen to count nodes. So now the question is if caching is a worthwhile optimisation in the general case, i.e. everything which is not a structured quad/hex grid? Are these common in libMesh/MOOSE use cases?

  2. Add support for 2d, 2nd order RT at the cost of the ND1 definitions, which is working for tris for both 1st and 2nd orders, and for quads but (still) only for 1st order.
    EDIT: Turns out I was just not using the correct space for the scalar variable in ex6, it's looking good now:
    RT_convergence

I know you've been quite busy so I just kept on working here. If you review this, I'll simply open another PR when I get somewhere with these two items, otherwise I'm happy to continue adding here. I'm happy either way, enjoy your vacation! :)

Okay Roy, I'm happy with this now. We can always transfer the discussion around 1) to #3874. Ready for review when you are. Enjoy your vacation! :)

@nmnobre nmnobre changed the title 2d, 2nd order Nédélec FEs 2d, 2nd order Nédélec (first kind) and Raviart-Thomas FEs Jun 28, 2024
@roystgnr
Copy link
Member

This is obviously not true if we compute orientation based on node ids, which only depend on how we happen to count nodes.

And on how we happen to recount nodes. I'd love to have an option someday to compute based on Node::unique_id(), which we preserve through hell or high water, but "if we just coarsened away some elements, let's renumber every id() to get rid of the gaps" is a default libMesh setting.

So now the question is if caching is a worthwhile optimisation in the general case, i.e. everything which is not a structured quad/hex grid?

In the general case, no, and it might be worthwhile to add an option to FE to disable caching entirely upon user request.

In a more-general-than-structured-grids case, yes. There are a lot of FEM users in general who use immersed boundary methods or who use grid generation schemes based on an overset structured grid or grids. In the former case you don't have a structured grid topology but you have lots of identical geometry that benefits from caching; in the latter case you have a lot of weirdly shaped elements but they're around a core of identical quads or hexes. I don't know if anyone is currently doing these with libMesh, but they're ideas that seem to keep resurfacing, usually with good reasons.

Are [structured grids] common in libMesh/MOOSE use cases?

Yes, surprisingly; you can go quite far with geometries captured via phase field methods ... perhaps not on a structured grid but at least on an adaptively refined grid whose coarse level is structured. I did a lot of that in my PhD work (which is where the caching work came from), and I was just at a little conference this spring with a lot of MOOSE users doing that for subsurface modeling.

@nmnobre
Copy link
Member Author

nmnobre commented Jul 10, 2024

You make good points, so let's keep things as they are. :)

Actually... I wanted to convince myself that what I said here,

In practice, this means that, in the future, adding support for 3rd order shape functions would literally 'only' require writing those functions and their derivatives.

is in fact true, so I went ahead and implemented the 3rd order case... :P

ND1_convergence_lu
RT_convergence

As a matter of fact, I may or may not have written a little script that generates the (code for the) shape functions (and their derivatives) for arbitrary order, https://github.com/farscape-project/libMeshGen. It uses symfem and it's only 2d for now, but the objective is to generalise to 3d of course. So, long story short, we can add any order we want in a matter of minutes. :)

I think for our use cases 3rd order suffices, so I'd probably leave it there for now? Also, vector ex6 requires L2_LAGRANGE elements of an order below the RT order, meaning that we can't go to 4th order RT in that example anyway.

What's your take? What order is good enough? I can then add those commits to this PR and then we can finally review/merge. :)

@roystgnr
Copy link
Member

Oh that's beautiful.

I'm of the general opinion that we should add as many orders as floating point arithmetic lets us get away with; in FEInterface I set unlimited = 11 because that was the point where even in 1D I found double-precision error swamping discretization error in p-refined meshes. :-D

But that was a cap for elements where we have code that handles arbitrary p. What degree we want to hand-code is another question, and I'm not sure we have a settled answer. With fe_hierarchic_shape_1D.C we wrote hand-coded shapes up to p=7 and only go to default: for 8 and up; with fe_hermite_shape_1D.C the default: is 6 and up, and I'm sure the explanation wasn't "Hermite is relatively more efficient in the parameterized case" so much as "Roy is lazier than Ben and John".

If you can't get a parametric solution out of symfem (it does get tricky outside 1D...) then we don't need to go up far. H(curl) stuff is where I've seen hp-adaptivity people go wild with p, but that could be because EM problems are inherently smoother in most domains or because these were academic researchers who tend to solve problems on simpler domains. I don't think I've personally done anything serious past p=4, and sometimes I feel like half of our MOOSE users would use p=-1 if that was an option, so even p=3 is a great start.

Does vector ex6 really need L2_LAGRANGE? We're using MONOMIAL for p there now, and that goes up indefinitely. If we need a bi/tri linear space that goes up indefinitely then we've got L2_HIERARCHIC available too.

@nmnobre
Copy link
Member Author

nmnobre commented Jul 11, 2024

and sometimes I feel like half of our MOOSE users would use p=-1 if that was an option

😆

Does vector ex6 really need L2_LAGRANGE? We're using MONOMIAL for p there now

Probably not... MONOMIAL is enough for triangles because it spans the same space as L2_LAGRANGE, but that's not true for quads, thus why I switched to L2_LAGRANGE... Even then, I spent most part of yesterday's afternoon trying to figure out why my convergence plots weren't showing the right convergence rate... Turns out I was using QUAD8s, not QUAD9s, so I was silently falling into the serendipity case...

I didn't even think of L2_HIERARCHIC, I'll give that a try tomorrow. If it spans the same space as L2_LAGRANGE, I don't see why it won't work.

@nmnobre
Copy link
Member Author

nmnobre commented Jul 13, 2024

Good shout, L2_HIERARCHIC works just fine! I didn't try to understand why, but I can't use L2_HIERARCHIC for CONSTANT order for some reason... I mean it runs, but the errors are higher, and the error on the scalar field doesn't converge.
EDIT: it's by design, only SIDE_HIERARCHIC covers the CONSTANT case:

libmesh_assert_greater (o, 0);

I didn't add these higher orders to either run.sh nor to either .html, there's only so much we can add before it becomes too crowded, but let me know what you think.

Other than that, I'm struggling to converge the high-order, highly-refined configurations (the situation improved slightly with the latest commit), though I think I've got enough evidence the implementation is correct:

ND1_convergence_lu
RT_convergence

EDIT: I've rerun the 2nd order case on QUADs in single precision and marked both $\sqrt{\epsilon_\mathrm{single}}$ and $\sqrt{\epsilon_\mathrm{double}}$ with red lines. I'm pretty convinced the issues are due to FP precision limits, though I'm not quite sure I understand why we saturate at the sqrt, maybe related to the l2 norm, but not sure...
ND1_convergence_lu_prec

@nmnobre nmnobre changed the title 2d, 2nd order Nédélec (first kind) and Raviart-Thomas FEs 2d, 5th order Nédélec (first kind) and Raviart-Thomas FEs Jul 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants