class: center, middle, inverse
Florian Rathgeber1, Lawrence Mitchell1, David Ham1,2, Michael Lange3, Andrew McRae2, Fabio Luporini1, Gheorghe-Teodor Bercea1, Paul Kelly1
Slides: https://kynan.github.io/EuroSciPy2014
.footnote[1 Department of Computing, Imperial College London
2 Department of Mathematics, Imperial College London
3 Department of Earth Science & Engineering, Imperial College London]
???
Good morning! After this excellent keynote I hope I can keep your attention for another 25min before the recaffeination. My name is Florian and it is my pleasure to introduce Firedrake, a high-level, portable finite element computation framework implemented, of course, in Python. Firedrake is a fairly young project and has been developed by a group at Imperial over the past year. It builds on work I presented at SciPy in Austin last year and a number of packages in the SciPy ecosystem.
background-image:url(images/fem.svg)
???
- solving PDEs using FEM, not going into details
- focus on assembly: matrix + vector (see diagram)
- evaluating problem-/PDE-specific integral for each element of the mesh
- nice property: local and independent -> parallelisation
We're interested in solving partial differential equations, PDEs for short, using the finite element method, which is popular in science and engineering. I'm not going to assume everyone is familiar with the method and won't go into any mathematical detail. This talk is really about some of the advantages we get by solving this problem in Python and how we can do that without sacrificing performance.
So, just a very high-level overview of the FEM: we partition the domain where we want to solve the PDE into an unstructured mesh of cells and numerically evaluate a problem- or PDE-specific integral for each cell. We then accumulate those contributions into a large sparse linear system of equations, which we can solve with an iterative solver.
Let me just highlight a property which makes the FEM very well suited to computation: these integral evaluations are local, independent and uniform, which means we can execute them in parallel.
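To make the "local and independent" point concrete, here is a toy pure-Python sketch (my own illustration, not Firedrake code): assembling the global stiffness matrix for a 1D Laplace operator with P1 elements from identical 2x2 element matrices, with overlapping contributions accumulated at shared vertices.

```python
# Toy illustration (not Firedrake code): assemble the global stiffness
# matrix for -u'' on a uniform 1D mesh with P1 (piecewise linear) elements.
# Each cell contributes a 2x2 local matrix; contributions overlap at
# shared vertices, which is the "reduction" step of FEM assembly.

def assemble_stiffness(num_cells, length=1.0):
    h = length / num_cells
    # local element matrix for the Laplace operator on one cell
    local = [[1.0 / h, -1.0 / h],
             [-1.0 / h, 1.0 / h]]
    # global sparse matrix stored as a dict: (row, col) -> value
    A = {}
    for cell in range(num_cells):
        dofs = (cell, cell + 1)  # the two vertices of this cell
        for i in range(2):
            for j in range(2):
                key = (dofs[i], dofs[j])
                A[key] = A.get(key, 0.0) + local[i][j]
    return A

A = assemble_stiffness(4)
# interior rows show the familiar [-1, 2, -1] / h stencil
print(A[(2, 2)], A[(2, 1)], A[(2, 3)])  # 8.0 -4.0 -4.0
```

The cell loop body touches only its own two degrees of freedom, which is exactly why the evaluations can run in parallel; only the accumulation into `A` needs care.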
$$\int_\Omega \nabla v \cdot \nabla u - \lambda v u ~dV = \int_\Omega v f ~dV$$
from firedrake import *
# Read a mesh and define a function space
mesh = Mesh('filename')
V = FunctionSpace(mesh, "Lagrange", 1)
# Define forcing function for right-hand side
f = Expression("- (lmbda + 2*(n**2)*pi**2) * sin(X[0]*pi*n) * sin(X[1]*pi*n)",
lmbda=1, n=8)
# Set up the Finite-element weak forms
*u = TrialFunction(V)
*v = TestFunction(V)
*lmbda = 1
*a = (dot(grad(v), grad(u)) - lmbda * v * u) * dx
*L = v * f * dx
# Solve the resulting finite-element equation
p = Function(V)
solve(a == L, p)
Unified Form Language (UFL) from the FEniCS project to describe weak form of PDE
???
Now how do we use Firedrake to solve such a PDE?
As a mathematician you write down the equation in what is called the weak form. Firedrake uses the Unified Form Language from the FEniCS project to express this weak form in Python as almost literal transcription (the highlighted bit).
The entire code the user needs to write to solve this PDE is what you see on the slide:
- read geometry (mesh) from file
- define the PDE
- solve it
Anyone who knows the FEniCS project will recognise this: Firedrake intentionally implements the same API.
The solve is of course where the magic happens and we'll get to that.
???
So what is Firedrake and how does it fit in the SciPy ecosystem?
SciPy is, at least to some extent, about building high-level abstractions for scientific computing, which are computationally efficient but at the same time easy to use (think NumPy for array operations).
--
Firedrake is an automated system for the portable solution of partial differential equations using the finite element method (FEM).
.source[— firedrakeproject.org]
???
Firedrake is one such abstraction for the portable solution of partial differential equations using the finite element method.
--
Two-layer abstraction for FEM computations from high-level descriptions:
- Firedrake: a portable finite-element computation framework
  drives FE computations from a high-level problem specification
- PyOP2: a high-level interface to unstructured-mesh-based methods
  efficiently executes kernels over an unstructured mesh in parallel
???
In fact it is actually two abstractions layered on top of each other, with Firedrake as a portable finite-element computation framework on top for solving problems at a very high level. The lower layer that does all the computational heavy lifting is PyOP2, a framework for unstructured mesh computations, whose role is to efficiently execute kernels over an unstructured mesh in parallel. That works because FEM assembly can be expressed as a local independent kernel as I mentioned earlier.
???
This diagram shows an overview of the Firedrake / PyOP2 toolchain:
- Decoupling of Firedrake (FEM) and PyOP2 (parallelisation) layers
- UFL to describe finite-element discretisation
- FE weak forms compiled into kernels
- FE assembly as PyOP2 parallel loops
- PETSc for linear/nonlinear solvers and mesh building / distribution
- Platform-specific runtime code generation and JIT compilation
- Revisit this diagram later
???
Separating the problem into two abstraction layers gives us a real separation of concerns: contributors can work only in their area of expertise without having to be concerned, or even familiar, with the other layer.
This is not just theory, it is how we actually work in our research group! We have mathematicians in the group who are deeply familiar with the FEM and work on e.g. adding support for further discretisations, but have little expertise in computer architecture or parallel programming
- and don't need to! On the other hand we have compiler experts who reason about optimising kernels as abstract loop nests without needing to know or care where those kernels come from.
class: center, middle
???
Let's start at the bottom layer and look at parallel computations on unstructured meshes with PyOP2.
- Independent local operations for each element of the mesh described by a kernel.
- Reductions aggregate contributions from local operations to produce final result.
- Domain-specific language embedded in Python for data parallel computations
- Efficiently executes kernels in parallel over unstructured meshes or graphs
- Portable programmes for different architectures without code change
- Efficiency through runtime code generation and just-in-time (JIT) compilation
???
FEM is an example of a class of scientific computations on unstructured meshes characterised by independent local operations that need to be performed for every entity of the mesh and can be described by a computational kernel. Some operations are followed by a reduction which aggregates contributions from these local operations. In FEM that is the assembly of the sparse linear system.
PyOP2 is a domain-specific language embedded in Python which abstracts this concept and allows writing portable programmes which efficiently run on different platforms without any changes to the source. How do we do this using Python, which you might think is slow?
Portability and efficiency is achieved through generating platform- and problem-specific code at runtime, just-in-time compiling it and executing it natively for the chosen architecture.
Why at runtime? No need for analysis / parsing of code (the hard part), only synthesis / generation of code (the easy part). All the information is available: just execute the code, reason about and introspect the (Python) objects!
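As a toy analogy of this synthesis-at-runtime idea (PyOP2 actually generates and compiles C; here I generate Python for brevity), one can build source text specialised to runtime parameters and compile it on the fly:

```python
# Toy analogy for runtime code synthesis (PyOP2 generates and JIT-compiles
# C; here we generate Python for simplicity). Because all problem
# parameters are known at runtime, the generated code can be specialised
# to them, e.g. by baking in a loop bound.

def generate_axpy(n):
    # specialise the vector length into the generated source
    src = "def axpy(a, x, y):\n"
    src += "    return [a * x[i] + y[i] for i in range({})]\n".format(n)
    namespace = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["axpy"]

axpy3 = generate_axpy(3)
print(axpy3(2.0, [1, 2, 3], [10, 10, 10]))  # [12.0, 14.0, 16.0]
```

The real toolchain does the same thing one level down: it writes C source, shells out to a compiler, and loads the shared object via ctypes.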
.pull-left[
- `Set`s – mesh entities and data DOFs
- `Map`s – define connectivity between entities in different `Set`s
- `Dat`s – defined on `Set`s (hold data, completely abstracted vector)
- `Global`s – not associated to a `Set` (reduction variables, parameters)
- `Const`s – global, read-only data ]
.pull-right[
- Executed in parallel on a `Set` through a parallel loop
- Read / write / increment data accessed via `Map`s
- `Sparsity` patterns defined by `Map`s
- `Mat` – matrix data on sparsities
- Kernels compute local matrix – PyOP2 handles global assembly ]
???
PyOP2 uses a number of simple primitives to describe unstructured meshes and data defined on them:
- Set: abstractly defines class of entities, only know how big it is
- Map: defines connectivity between elements of sets (mesh topology)
- conceptually a "data parallel pointer"
- lookup table: which entities of the target Set are associated with an entity of the source Set
- Dat: abstracted array, defined on a Set, contains actual values
- Data can live in CPU or GPU memory, partitioned for distributed parallel
- PyOP2 manages storage, layout, access, transfer, halo exchange
- "data parallel data"
- mostly opaque
- use like a vector: linear algebra operations
- Kernels / parallel loops:
- kernels: define operations/computations to be performed independently for every entity of a given Set.
- executed over iteration Set (or subset) in parallel (parallel loop)
- sequential semantics, local view of the data, defined by access descriptors
- indirect data access via Maps (one level of indirection): abstracted from kernel
- Linear algebra:
- Parallel loops can assemble matrices from local assembly kernel
- Matrix defined on sparsity, which are defined by pairs of Maps
- PyOP2 takes care of global assembly (reduction)
- take home
- PyOP2 objects are bare/simple objects with powerful semantics
- tools to express higher-level objects/constructs e.g. FEM
???
Now that we've introduced the concepts, how does it work?
PyOP2 architecture shown in this diagram:
- Provides unified API to the user (which may be another program, e.g. Firedrake)
- Declare data types, execute parallel loops ("smart" dynamic construct)
- PyOP2 runtime schedules computation efficiently (colouring avoids data races)
- kernels executed over the mesh in native code
- code generated and JIT compiled at runtime for different backends
- 4 backends, backend-specific JIT compilation tool chain
- CPU JIT: shell out to compiler to compile generated kernel + marshalling code
- use ctypes to load the compiled shared object
- 3rd party libraries (CPU: PETSc, GPU: Cusp) for sparse matrices, linear solvers
.left-column[
Kernels:
- "local view" of the data
- sequential semantics
Parallel loop:
- use access descriptors to generate marshalling code
- pass "right data" to kernel for each iteration set element ] .right-column[
void midpoint(double p[2], double *coords[2]) {
p[0] = (coords[0][0] + coords[1][0] + coords[2][0]) / 3.0;
p[1] = (coords[0][1] + coords[1][1] + coords[2][1]) / 3.0;
}
from pyop2 import op2
op2.init()
vertices = op2.Set(num_vertices)
cells = op2.Set(num_cells)
cell2vertex = op2.Map(cells, vertices, 3, [...])
coordinates = op2.Dat(vertices ** 2, [...], dtype=float)
midpoints = op2.Dat(cells ** 2, dtype=float)
midpoint = op2.Kernel(kernel_code, "midpoint")
*op2.par_loop(midpoint, cells,
* midpoints(op2.WRITE),
* coordinates(op2.READ, cell2vertex))
Kernels as abstract syntax tree (AST), C string or Python function (not currently compiled!) ]
???
Simple example programme: compute midpoint of all cells in mesh
- kernel: typically generated (by FFC in case of Firedrake or e.g. SymPy), specified as AST or C string ("back door")
- can perform backend specific optimisations on AST
- kernel: local view of the data, sequential semantics
- parallel loop arguments: kernel, iteration set + access descriptors
- access descriptors match kernel parameters, define local view for the kernel
// Kernel provided by the user
static inline void midpoint(double p[2], double *coords[2]) {
p[0] = (coords[0][0] + coords[1][0] + coords[2][0]) / 3.0;
p[1] = (coords[0][1] + coords[1][1] + coords[2][1]) / 3.0;
}
// Generated marshaling code executing the sequential loop
void wrap_midpoint(int start, int end,
double *arg0_0, double *arg1_0, int *arg1_0_map0_0) {
double *arg1_0_vec[3];
for ( int n = start; n < end; n++ ) {
int i = n;
arg1_0_vec[0] = arg1_0 + (arg1_0_map0_0[i * 3 + 0])* 2;
arg1_0_vec[1] = arg1_0 + (arg1_0_map0_0[i * 3 + 1])* 2;
arg1_0_vec[2] = arg1_0 + (arg1_0_map0_0[i * 3 + 2])* 2;
* midpoint(arg0_0 + i * 2, arg1_0_vec); // call user kernel (inline)
}
}
// Kernel provided by the user
static inline void midpoint(double p[2], double *coords[2]) {
p[0] = (coords[0][0] + coords[1][0] + coords[2][0]) / 3.0;
p[1] = (coords[0][1] + coords[1][1] + coords[2][1]) / 3.0;
}
// Generated marshaling code executing the parallel loop
void wrap_midpoint(int boffset, int nblocks,
int *blkmap, int *offset, int *nelems,
double *arg0_0, double *arg1_0, int *arg1_0_map0_0) {
#pragma omp parallel shared(boffset, nblocks, nelems, blkmap)
  {
int tid = omp_get_thread_num();
double *arg1_0_vec[3];
#pragma omp for schedule(static)
for ( int __b = boffset; __b < boffset + nblocks; __b++ ) {
int bid = blkmap[__b];
int nelem = nelems[bid];
int efirst = offset[bid];
for (int n = efirst; n < efirst+ nelem; n++ ) {
int i = n;
arg1_0_vec[0] = arg1_0 + (arg1_0_map0_0[i * 3 + 0])* 2;
arg1_0_vec[1] = arg1_0 + (arg1_0_map0_0[i * 3 + 1])* 2;
arg1_0_vec[2] = arg1_0 + (arg1_0_map0_0[i * 3 + 2])* 2;
* midpoint(arg0_0 + i * 2, arg1_0_vec); // call user kernel (inline)
}
}
}
}
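Stripped of blocking and OpenMP pragmas, the pattern both generated wrappers implement is a gather through the map followed by a kernel call on the local view. A pure-Python sketch of the same computation:

```python
# Pure-Python sketch of what the generated marshalling code does for the
# midpoint example: for each cell, gather vertex coordinates through the
# cell-to-vertex map, then hand the kernel a local view of the data.

def midpoint_kernel(p, coords):
    # same arithmetic as the C kernel: centroid of a triangle
    p[0] = (coords[0][0] + coords[1][0] + coords[2][0]) / 3.0
    p[1] = (coords[0][1] + coords[1][1] + coords[2][1]) / 3.0

def wrap_midpoint(midpoints, coordinates, cell2vertex):
    for cell, vertices in enumerate(cell2vertex):
        # indirect gather: local view of the three vertex coordinates
        local_coords = [coordinates[v] for v in vertices]
        midpoint_kernel(midpoints[cell], local_coords)

# one triangle with vertices (0,0), (1,0), (0,1)
coordinates = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
cell2vertex = [(0, 1, 2)]
midpoints = [[0.0, 0.0]]
wrap_midpoint(midpoints, coordinates, cell2vertex)
print(midpoints)  # both coordinates of the centroid are ~1/3
```

The kernel itself never sees the map: indirection is entirely the wrapper's job, which is what lets PyOP2 swap in different parallel execution strategies without touching kernels.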
class: center, middle
???
How can Firedrake make use of that for FE computations?
.left70[
(diagram: Firedrake's `Function` / `FunctionSpace` / `Mesh` built on PyOP2 data structures)
]
.right30[
- `Function`: field defined on a set of degrees of freedom (DoFs), data stored as PyOP2 `Dat`
- `FunctionSpace`: characterized by a family and degree of FE basis functions; defines DoFs for function and relationship to mesh entities
- `Mesh`: defines abstract topology by sets of entities and maps between them (PyOP2 data structures) ]
???
Firedrake uses the same concepts of Functions defined on FunctionSpaces defined on a Mesh as DOLFIN; however, all of these are implemented in terms of the PyOP2 data structures I showed earlier:
- Function's data stored as PyOP2 Dat (Firedrake does not directly touch field data!)
- Dat defined on the DOFs DataSet of the FunctionSpace (defined on the nodes Set)
- nodes Set related to Sets of cells, exterior/interior facets of the Mesh via Maps
Solving the Helmholtz equation in Python using Firedrake:
$$\int_\Omega \nabla v \cdot \nabla u - \lambda v u ~dV = \int_\Omega v f ~dV$$
from firedrake import *
# Read a mesh and define a function space
mesh = Mesh('filename')
V = FunctionSpace(mesh, "Lagrange", 1)
# Define forcing function for right-hand side
f = Expression("- (lmbda + 2*(n**2)*pi**2) * sin(X[0]*pi*n) * sin(X[1]*pi*n)",
lmbda=1, n=8)
# Set up the Finite-element weak forms
u = TrialFunction(V)
v = TestFunction(V)
lmbda = 1
a = (dot(grad(v), grad(u)) - lmbda * v * u) * dx
L = v * f * dx
# Solve the resulting finite-element equation
p = Function(V)
*solve(a == L, p)
???
Now that we've discussed the concepts, let's look at how to drive FE computations in Firedrake
This slide shows the code for solving the Helmholtz equation and should be familiar to any DOLFIN user modulo the import from Firedrake.
- Firedrake always solves nonlinear problems in residual form `F(u; v) = 0`
- Transform linear problem into residual form: `J = a`, `F = ufl.action(J, u) - L`
  - Jacobian known to be `a`
  - Always solved in a single Newton (nonlinear) iteration
- Use Newton-like methods from PETSc SNES (optimised C library)
- PETSc SNES requires two callbacks to evaluate residual and Jacobian:
  - implemented as Python functions (supported by petsc4py)
  - evaluate residual by assembling residual form: `assemble(F, tensor=F_tensor)`
  - evaluate Jacobian by assembling Jacobian form: `assemble(J, tensor=J_tensor, bcs=bcs)`
  - `assemble` invokes PyOP2 with kernels generated from `F` and `J`
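The single-Newton-iteration claim can be checked with a toy sketch (plain Python standing in for PETSc SNES): for a linear residual F(u) = Au - b the Jacobian is the constant matrix A, so one Newton update from any initial guess reaches the solution.

```python
# Sketch (plain Python, standing in for PETSc SNES): for the linear
# residual F(u) = A*u - b the Jacobian is the constant matrix A, so a
# single Newton step u1 = u0 - J^{-1} F(u0) yields the exact solution.

def solve2x2(M, rhs):
    # direct solve of a 2x2 system via Cramer's rule
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(rhs[0] * M[1][1] - rhs[1] * M[0][1]) / det,
            (M[0][0] * rhs[1] - M[1][0] * rhs[0]) / det]

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]

def residual(u):
    return [A[0][0] * u[0] + A[0][1] * u[1] - b[0],
            A[1][0] * u[0] + A[1][1] * u[1] - b[1]]

u0 = [0.0, 0.0]                      # arbitrary initial guess
du = solve2x2(A, residual(u0))       # solve J * du = F(u0), with J = A
u1 = [u0[0] - du[0], u0[1] - du[1]]  # one Newton update
print(residual(u1))  # ~[0.0, 0.0]: converged in one iteration
```

This is why solving the linear problem in residual form costs nothing extra: the Newton loop terminates after its first iteration.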
???
- What happens when the final solve is called on the previous slide?
- Unified solving interface even behind the scenes
- If Jacobian not provided by the user, Firedrake uses automatic differentiation: `J = ufl.derivative(F, u)`
- Kernels generated by FFC and executed as PyOP2 parallel loops
  - Firedrake builds PyOP2 parallel loop call, using FFC-generated kernel
  - iterate over cells (for cell integrals) or facets (interior/exterior facet integrals)
  - output tensor depends on rank of the form (PyOP2 `Mat`, `Dat` or `Global`)
  - input arguments: coordinate field and any coefficients present in the form
- No computationally expensive computations (inner loops) in pure Python
- Call optimised libraries where possible (PETSc)
- Expensive library code implemented in Cython (sparsity builder)
- Kernel application over the mesh in natively generated code
- Python is not just glue!
- Firedrake
  - Assembled operators
  - Function spaces cached on meshes
  - FFC-generated kernel code
- PyOP2
  - `Map`s cached on `Set`s
  - Sparsity patterns
  - JIT-compiled code
- Only data isn't cached (`Function` / `Dat`)
class: center, middle
.left70[
from firedrake import *
mesh = Mesh("wave_tank.msh")
V = FunctionSpace(mesh, 'Lagrange', 1)
p = Function(V, name="p")
phi = Function(V, name="phi")
u = TrialFunction(V)
v = TestFunction(V)
p_in = Constant(0.0)
# Boundary condition for y=0
bc = DirichletBC(V, p_in, 1)
T = 10.
dt = 0.001
t = 0
while t <= T:
    p_in.assign(sin(2*pi*5*t))
    phi -= dt / 2 * p
    p += (assemble(dt * inner(grad(v), grad(phi))*dx) /
          assemble(v*dx))
    bc.apply(p)
    phi -= dt / 2 * p
    t += dt
]
.right30[
2nd order PDE:
$$\frac{\partial^2\phi}{\partial t^2} - \nabla^2 \phi = 0$$
Formulation with 1st order time derivatives:
$$\frac{\partial\phi}{\partial t} = - p$$
$$\frac{\partial p}{\partial t} + \nabla^2 \phi = 0$$
]
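A toy check of this scheme (my own sketch, not part of the talk's code): for a single Fourier mode, the Laplacian term reduces to k²φ and the half-step updates in the listing become a Störmer-Verlet integrator for a harmonic oscillator, whose energy stays bounded.

```python
# Toy check (plain Python) of the half-step update pattern from the wave
# example: for a single Fourier mode, the assembled Laplacian term reduces
# to k^2 * phi, so the scheme is a Stormer-Verlet integrator for the
# oscillator phi'' = -k^2 * phi (with phi' = -p). Energy stays bounded.

k = 2.0
dt, T = 0.001, 1.0
phi, p = 1.0, 0.0

def energy(phi, p):
    return 0.5 * p * p + 0.5 * (k * phi) ** 2

e0 = energy(phi, p)
t = 0.0
while t <= T:
    phi -= dt / 2 * p        # half-step phi
    p += dt * k * k * phi    # full step p (the assembled Laplacian term)
    phi -= dt / 2 * p        # half-step phi
    t += dt

print(abs(energy(phi, p) - e0) < 1e-3 * e0)  # True: energy nearly conserved
```

The near-conservation of energy over many steps is the reason this symplectic splitting is a good choice for the wave equation.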
- PyOP2: https://github.com/OP2/PyOP2
- PyOP2: A High-Level Framework for Performance-Portable Simulations on Unstructured Meshes. Florian Rathgeber, Graham R. Markall, Lawrence Mitchell, Nicolas Loriant, David A. Ham, Carlo Bertolli, Paul H. J. Kelly. WOLFHPC 2012
- Performance-Portable Finite Element Assembly Using PyOP2 and FEniCS. Graham R. Markall, Florian Rathgeber, Lawrence Mitchell, Nicolas Loriant, Carlo Bertolli, David A. Ham, Paul H. J. Kelly. ISC 2013
- Firedrake: https://github.com/firedrakeproject/firedrake
- Cross-Loop Optimization of Arithmetic Intensity for Finite Element Local Assembly. Fabio Luporini, Ana Lucia Varbanescu, Florian Rathgeber, Gheorghe-Teodor Bercea, J. Ramanujam, David A. Ham, Paul H. J. Kelly. ACM Transactions on Architecture and Code Optimization, January 2015
- UFL: https://github.com/firedrakeproject/ufl
- FFC: https://github.com/firedrakeproject/ffc
This talk is available at https://kynan.github.io/EuroSciPy2014 (source)
Contact: Florian Rathgeber, @frathgeber
Slides created with remark