Over the last few years, the clock speed per processor core has stagnated.
This stagnation has led to the design of high performance computing machines with larger core counts.
As a result, numerical algorithms must expose more concurrency in order to take advantage of this style of architecture.
Perhaps the largest bottleneck in concurrency for time-dependent simulations is traditional time integration methods.
Traditional time integration methods solve for the time domain sequentially, and as the spatial grid is refined a proportionally larger number of time steps must be taken to maintain accuracy and stability constraints.
While solving the time domain sequentially with a traditional time integration method is an optimal algorithm of order
The goal of this project is to make use of the XBraid library from Lawrence Livermore National Laboratory to solve the time domain in parallel using multigrid-reduction-in-time techniques. The XBraid library is implemented in C and aims to be a non-intrusive method to implement parallel time marching methods into existing codes.
In order to use the XBraid library, several data structures and functions must be implemented and provided to the XBraid solver struct. The two required data structures are the app and vector structures. In general, the app struct contains the time-independent data and the vector struct contains the time-dependent data. For this initial example, the time-independent data includes the mesh, which is fixed for all time steps, and the time-dependent data is the solution state vector. The functions tell XBraid how to perform operations on the data type used by your solver; in this case, deal.ii uses the Vector data type. These operations include how to initialize the data at a given time, how to sum the data, and how to pack and unpack linear buffers for transmission to other processors via MPI. The XBraid documentation should be read for a full list of the functions that must be implemented and the details of what each function should do. Typically, a function is called with the app struct, one or more vector structs, and a status struct that contains information on the current state of the XBraid simulation (the current multigrid iteration, the level the function is being called from, the time and timestep number, etc.).
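The split between the two structs and the operations on them can be illustrated with a small standalone sketch. It does not include the XBraid or deal.ii headers: `AppSketch`, `VectorSketch`, `init_vector`, `sum_vectors`, `pack_vector`, and `unpack_vector` are hypothetical names chosen for illustration, and `std::vector<double>` stands in for deal.ii's Vector type. The real callbacks take XBraid's own argument lists, which are described in the XBraid documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Stand-in for deal.ii's Vector<double>; an assumption made so that the
// sketch compiles without deal.ii.
using StateVector = std::vector<double>;

// Hypothetical app struct: time-independent data shared by all time steps.
struct AppSketch
{
  std::size_t n_dofs;  // number of unknowns on the fixed mesh
  double      t_final; // end of the simulated time interval
};

// Hypothetical vector struct: wraps the time-dependent solution state.
struct VectorSketch
{
  StateVector data;
};

// Initialize the state at time t (here: a zero initial condition).
VectorSketch init_vector(const AppSketch &app, double /*t*/)
{
  return VectorSketch{StateVector(app.n_dofs, 0.0)};
}

// Sum operation: y = alpha * x + beta * y.
void sum_vectors(double alpha, const VectorSketch &x,
                 double beta, VectorSketch &y)
{
  assert(x.data.size() == y.data.size());
  for (std::size_t i = 0; i < y.data.size(); ++i)
    y.data[i] = alpha * x.data[i] + beta * y.data[i];
}

// Pack the state into a flat byte buffer, e.g. for an MPI send.
std::vector<char> pack_vector(const VectorSketch &u)
{
  std::vector<char> buffer(u.data.size() * sizeof(double));
  if (!buffer.empty())
    std::memcpy(buffer.data(), u.data.data(), buffer.size());
  return buffer;
}

// Rebuild the state from a flat byte buffer received from another rank.
VectorSketch unpack_vector(const std::vector<char> &buffer)
{
  VectorSketch u{StateVector(buffer.size() / sizeof(double))};
  if (!buffer.empty())
    std::memcpy(u.data.data(), buffer.data(), buffer.size());
  return u;
}
```

Keeping the mesh-sized information in the app struct and only the solution values in the vector struct means that XBraid can create, combine, and ship many copies of the state without duplicating the time-independent data.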
Perhaps the most important function is the step function. This function tells XBraid how to advance the solution forward in time from the initial time to the final time given in the status struct. The step itself is performed with a traditional time integration method such as the fourth-order explicit Runge-Kutta method.
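As a rough illustration of what a step function does, the sketch below advances a state vector from a start time to a stop time with one classical fourth-order explicit Runge-Kutta step. It is a standalone example and not the XBraid callback itself: `step_rk4`, `axpy`, and `RhsFunction` are hypothetical names, `std::vector<double>` stands in for deal.ii's Vector type, and the two times are passed in directly, whereas an XBraid step function would read them from the status struct.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using StateVector = std::vector<double>;

// Right-hand side of the semi-discrete system du/dt = rhs(t, u).
using RhsFunction = std::function<StateVector(double, const StateVector &)>;

// Helper: returns u + scale * k.
StateVector axpy(const StateVector &u, double scale, const StateVector &k)
{
  StateVector result(u);
  for (std::size_t i = 0; i < result.size(); ++i)
    result[i] += scale * k[i];
  return result;
}

// Advance u in place from t_start to t_stop with a single RK4 step.
void step_rk4(const RhsFunction &rhs, double t_start, double t_stop,
              StateVector &u)
{
  const double dt = t_stop - t_start;

  const StateVector k1 = rhs(t_start, u);
  const StateVector k2 = rhs(t_start + 0.5 * dt, axpy(u, 0.5 * dt, k1));
  const StateVector k3 = rhs(t_start + 0.5 * dt, axpy(u, 0.5 * dt, k2));
  const StateVector k4 = rhs(t_stop, axpy(u, dt, k3));

  for (std::size_t i = 0; i < u.size(); ++i)
    u[i] += dt / 6.0 * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i]);
}
```

Because the step size is computed from the two times rather than stored, the same function can be reused on every multigrid level, where the distance between consecutive time points differs.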
The solver used in this example is based on the heat equation solver from the Step-26 Tutorial. The HeatEquation class becomes member data of XBraid's app struct, and XBraid's vector struct becomes a wrapper for deal.ii's Vector data type. The HeatEquation class cannot simply be used as is, though, because it contains both time-dependent and time-independent member data. To simplify the problem, all adaptive mesh refinement functionality is removed from the solver. In principle, XBraid is capable of working with adaptive mesh refinement, and it also contains support for time refinement (which is likewise not used, for simplicity). The time-dependent solution state vectors are also removed from the HeatEquation member data; that time-dependent data is instead provided at each timestep by XBraid via the vector struct.
In the default mode, this code solves the heat equation with,
$$\frac{\partial u}{\partial t} - \Delta u = f(\boldsymbol{x},t), \qquad \forall\boldsymbol{x}\in\Omega,t\in\left( 0,T \right),$$
with initial conditions,
$$u(\boldsymbol{x},0) = u_0(\boldsymbol{x}), \qquad \forall \boldsymbol{x}\in\Omega,$$
where $u_0(\boldsymbol{x})=0$ , and boundary conditions,
$$u(\boldsymbol{x},t) = g(\boldsymbol{x},t), \qquad \forall \boldsymbol{x}\in\partial\Omega,t\in\left( 0,T \right),$$
where,
$$g(\boldsymbol{x},t) = 0,$$
and forcing function,
$$
f(\mathbf x, t) =
\left\{
\begin{array}{ll}
\chi_1(\mathbf x) & \text{if (x>0.5) and (y>-0.5)}
\\
\chi_2(\mathbf x)
& \text{if (x>-0.5) and (y>0.5)}
\\
0 & \text{otherwise}
\end{array}
\right.
$$
with,
$$
\chi_1(\mathbf x) = \exp\left(-0.5\frac{(t-0.125)^2}{0.005}\right)
$$
and,
$$
\chi_2(\mathbf x) = \exp\left(-0.5\frac{(t-0.375)^2}{0.005}\right)
$$
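The piecewise definition above can be written as a short function. This is a minimal sketch and not the code used in pitdealii: `gaussian_pulse` and `forcing` are hypothetical names, and the cases are checked in the order in which they are listed, so the first case is assumed to win in the region where both conditions hold.

```cpp
#include <cassert>
#include <cmath>

// Gaussian pulse in time, exp(-0.5 * (t - center)^2 / 0.005).
double gaussian_pulse(double t, double center)
{
  return std::exp(-0.5 * (t - center) * (t - center) / 0.005);
}

// Forcing function f(x, t) for a point (x, y) at time t.
double forcing(double x, double y, double t)
{
  if (x > 0.5 && y > -0.5)
    return gaussian_pulse(t, 0.125); // chi_1, centered at t = 0.125
  if (x > -0.5 && y > 0.5)
    return gaussian_pulse(t, 0.375); // chi_2, centered at t = 0.375
  return 0.0;
}
```

Each pulse reaches its maximum value of one at its center time and decays quickly away from it, so the source is effectively switched on in one region at a time.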
The forcing function is a Gaussian pulse in time that is centered around 0.125 time units for
The method of manufactured solutions is used to test the correctness of the implementation.
In the method of manufactured solutions, we create a solution
The manufactured solution is run on progressively more refined grids and the solution generated by the finite element method is compared to the exact solution
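One common way to turn such a comparison into a convergence rate is the observed order of accuracy, computed from the errors on two successive grids. The sketch below uses that standard definition; it is an assumption that this matches the slope reported for this code, and `observed_order` and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Observed order of accuracy from the errors on two successive grids.
// refinement_ratio is the coarse mesh size divided by the fine mesh size
// (2 when every cell is halved in each direction).
double observed_order(double error_coarse, double error_fine,
                      double refinement_ratio)
{
  return std::log(error_coarse / error_fine) / std::log(refinement_ratio);
}
```

With this definition, an error that drops by a factor of four each time the mesh size is halved corresponds to an order of two.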
\begin{table}\label{tbl:errorMFG}
\begin{center}
\begin{tabular}{|c|c|c|c|c|c|} \hline
cycle & # cells & # dofs &
\begin{table}\label{tbl:convergenceRate}
\begin{center}
\begin{tabular}{|c|c|c|c|c|c|} \hline
cycle & # cells & # dofs & Slope
The entry point of the code is in pitdealii.cc, which sets up XBraid for a simulation. The XBraid setup involves initializing the app struct and configuring XBraid for the desired number of timesteps, number of iterations, etc. The functions implemented for XBraid's use are declared in BraidFuncs.hh and defined in BraidFuncs.cc. The HeatEquation class and all deal.ii functionality are declared in HeatEquation.hh and defined in HeatEquationImplem.hh. Since HeatEquation is a class template, its definition file HeatEquationImplem.hh is included at the bottom of HeatEquation.hh. Lastly, various helper functions and variables, such as the current processor id and the output stream, are declared in Utilities.hh and defined in Utilities.cc.
This directory contains tests to be built and run with CMake. These tests verify the correct implementation of the various functions.
This directory contains some further documentation on the code. There is a doxygen config file in the doxygen/doc/ folder; you can generate the doxygen documentation with doxygen pitdealii.doxygen. In the design folder there is a LaTeX file that has some further documentation on the algorithm implemented in this software.
To compile, you need deal.ii and XBraid to be installed with development headers somewhere on your system. An implementation of MPI such as OpenMPI, with development headers, must also be installed. The source code for deal.ii is available at https://dealii.org/ and the source code for XBraid is available at https://computation.llnl.gov/projects/parallel-time-integration-multigrid. See the documentation of each package for compilation and installation instructions.
Depending on where they are installed, pitdealii may need help finding these libraries. To find deal.ii, pitdealii first looks in typical deal.ii install directories, followed by one directory up (../), two directories up (../../), and lastly in the environment variable DEAL_II_DIR. In contrast, there are currently no default locations that are searched for XBraid, so the environment variable BRAID_DIR must be specified. For MPI, pitdealii looks in standard installation folders only; for that reason I recommend you install MPI with your package manager.
A typical build of pitdealii may look like:
mkdir build
cd build
BRAID_DIR=/path/to/braid/ cmake ../
make
There is currently no option to install pitdealii anywhere. The binaries are generated in the bin folder, and tests are placed into the test folder. Options that can be passed to CMake for pitdealii include:
- CMAKE_BUILD_TYPE=Debug/Release
- DO_MFG=ON/OFF
- USE_MPI=ON/OFF
The build type specifies whether to compile with debugging symbols and assertions (Debug) or with optimizations (Release).
The option for manufactured solutions (DO_MFG) switches from solving the "standard" heat equation to solving a known solution heat equation so that the correctness of the code can be tested.
Lastly, the MPI option is only used to specify where to write output information when using the pout() function from the Utilities.hh file. If USE_MPI is set to ON, then every processor writes to its own file called pout.<#> where <#> is the processor number. If USE_MPI is set to OFF, then every processor writes to stdout.
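The behavior described for the output stream can be sketched as follows. This is not the implementation in Utilities.cc: `pout_file_name` and `select_output` are hypothetical helpers, and the processor number is passed in as a plain integer so that the example compiles without MPI.

```cpp
#include <cassert>
#include <fstream>
#include <iostream>
#include <string>

// Name of the per-processor output file, e.g. "pout.3" for processor 3.
std::string pout_file_name(int rank)
{
  return "pout." + std::to_string(rank);
}

// Return the stream that output should go to: stdout when use_mpi is false,
// otherwise the per-processor file, which is opened on first use.
std::ostream &select_output(bool use_mpi, int rank, std::ofstream &file)
{
  if (!use_mpi)
    return std::cout;
  if (!file.is_open())
    file.open(pout_file_name(rank));
  return file;
}
```

Writing to one file per processor keeps the output of concurrent processes from interleaving, which is presumably the motivation for the USE_MPI=ON behavior.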
Once pitdealii has been compiled, the program can be run by calling the binary generated in ./build/bin/. The tests can be run by calling ctest from inside the build/ directory. Unless the output path has been changed in the source code (it is currently hardcoded), the output files will be placed in the folder the command was called from.