
Profiling #97

Open
hiptmair opened this issue Jan 17, 2019 · 3 comments
Labels: Profiling and Testing (Runtime performance and testing)

Comments

@hiptmair (Collaborator)

It would be important to profile the example implementing a linear finite element solver for a full-featured elliptic boundary value problem (examples/ell_bvp_linfe) in order to identify performance bottlenecks in LehrFEM++. This example is currently in the lagr_fe_demo branch, but will be merged into master soon.

hiptmair added the Profiling and Testing label Jan 17, 2019
@craffael (Owner) commented Mar 7, 2019

I've profiled ell_bvp_linfe as you suggested, on Windows on my laptop. Unfortunately there is no easy way to share the full result with you, so I've extracted the function calls made from main() into the following Excel file: https://www.dropbox.com/s/b4ifubddsf3i1el/Report20190307-2341_CallTreeSummary.xlsx?dl=0

As you can see, roughly:

  • 29.89% of the time is spent generating the mesh hierarchies
  • 17.72% is spent assembling the matrices
  • 14.53% is spent solving the linear systems
  • 10% is spent computing the error against the exact solution (H1 seminorm)
  • 8.57% is spent constructing the FESpaceLagrangeO1; I think this is mostly about assigning dofs to entities.
  • 3.85% is spent computing the error against the exact solution (L2 norm)

@craffael (Owner) commented Mar 7, 2019

Looking at it bottom-up, i.e. at which functions the most time is spent in exclusively (excluding calls to child functions), we get the following: https://www.dropbox.com/s/qijwj9y3907725o/Report20190307-2341_FunctionSummary.xlsx?dl=0

Here we can see that

  • 15% of the time is spent in RTDynamicCast, the Windows runtime implementation of dynamic_cast. I assume most of this is overhead introduced by ForwardIterator/RandomAccessIterator.
  • 10% is spent in RtlpLowFragHeapAllocFromContext, i.e. heap allocation. Further analysis shows that about 3 of these 10 percentage points are allocations related to ForwardIterator/RandomAccessIterator.
  • 9.18% is spent in RtlFreeHeap, which frees heap memory.

@hiptmair (Collaborator, Author) commented Mar 8, 2019

Thanks a lot for these figures.

  • Of course, refinement is expensive, because it also accommodates local refinement. This is acceptable, because the overall complexity of refining a single mesh is still O(N), where N is the number of cells of the mesh.
  • In the medium term the iterator issue should be resolved via "ranges based on pointer arrays", after the end of the term.
  • I am surprised how efficient the linear solver is!
