
Profiling #97

Open
hiptmair opened this issue Jan 17, 2019 · 3 comments
Labels: Profiling and Testing (Runtime performance and testing)

Comments

@hiptmair (Collaborator)

It would be important to profile the example implementing a linear finite element solver for a full-featured elliptic boundary value problem (examples/ell_bvp_linfe) in order to identify performance bottlenecks in LehrFEM++. This example is currently in the lagr_fe_demo branch, but will be merged into master soon.

hiptmair added the Profiling and Testing label Jan 17, 2019
@craffael (Owner) commented Mar 7, 2019

I've profiled ell_bvp_linfe as you suggested, on Windows on my laptop. Unfortunately there is no easy way to share the full result with you, so I've extracted the function calls made from main() into the following Excel file: https://www.dropbox.com/s/b4ifubddsf3i1el/Report20190307-2341_CallTreeSummary.xlsx?dl=0

As you can see, roughly:

  • 29.89% of the time is spent generating the mesh hierarchies
  • 17.72% is spent assembling the matrices
  • 14.53% is spent solving the linear systems
  • 10% is spent computing the error against the exact solution (H1 seminorm)
  • 8.57% is spent constructing the FESpaceLagrangeO1; I think this is mostly about assigning dofs to entities.
  • 3.85% is spent computing the error against the exact solution (L2 norm)

@craffael (Owner) commented Mar 7, 2019

Looking at it bottom-up, i.e. at which functions the most time is spent in exclusively (excluding calls to child functions), we get the following: https://www.dropbox.com/s/qijwj9y3907725o/Report20190307-2341_FunctionSummary.xlsx?dl=0

Here we can see that

  • 15% of the time is spent in RTDynamicCast, the Windows runtime implementation of dynamic_cast. I assume most of this is overhead introduced by ForwardIterator/RandomAccessIterator.
  • 10% is spent in RtlpLowFragHeapAllocFromContext, i.e. heap allocation. Further analysis shows that about 3 of these 10 percentage points are allocations related to ForwardIterator/RandomAccessIterator.
  • 9.18% is spent in RtlFreeHeap, which frees heap memory.

@hiptmair (Collaborator, Author) commented Mar 8, 2019

Thanks a lot for these figures.

  • Of course, refinement is expensive, because it also accommodates local refinement. This is acceptable, because the overall complexity of refining a single mesh is still O(N), where N is the number of cells of the mesh.
  • In the medium term the iterator issue should be resolved via "ranges based on pointer arrays", after the end of the term.
  • I am surprised how efficient the linear solver is!
