class: center, middle, inverse
Florian Rathgeber1, Lawrence Mitchell1, David Ham1,2, Michael Lange3, Andrew McRae2, Fabio Luporini1, Gheorghe-Teodor Bercea1, Paul Kelly1
Slides: https://kynan.github.io/EuroSciPy2014
.footnote[1 Department of Computing, Imperial College London
2 Department of Mathematics, Imperial College London
3 Department of Earth Science & Engineering, Imperial College London]
???
Good morning! After this excellent keynote I hope I can keep your attention for another 25min before the recaffeination. My name is Florian and it is my pleasure to introduce Firedrake, a high-level, portable finite element computation framework implemented, of course, in Python. Firedrake is a fairly young project and has been developed by a group at Imperial over the past year. It builds on work I presented at SciPy in Austin last year and a number of packages in the SciPy ecosystem.
background-image:url(images/fem.svg)
???
- solving PDEs using FEM, not going into details
- focus on assembly: matrix + vector (see diagram)
- evaluating problem-/PDE-specific integral for each element of the mesh
- nice property: local and independent -> parallelisation
We're interested in solving partial differential equations, PDEs for short, using the finite element method, which is popular in science and engineering. I'm not going to assume everyone is familiar with the method and won't go into any mathematical detail. This talk is really about some of the advantages we get by solving this problem in Python and how we can do that without sacrificing performance.
So, just a very high-level overview of the FEM: we partition the domain where we want to solve the PDE into an unstructured mesh of cells and numerically evaluate a problem- or PDE-specific integral for each cell. We then accumulate those contributions into a large sparse linear system of equations, which we can solve with an iterative solver.
Let me just highlight a property which makes the FEM very well suited to computation: these integral evaluations are local, independent and uniform, which means we can execute them in parallel.
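To make the "local and independent" point concrete, here is a toy pure-Python sketch (my own illustration, not Firedrake code): assembling the global stiffness matrix for a 1D Laplace operator with P1 elements from identical 2x2 element matrices, with overlapping contributions accumulated at shared vertices.

```python
# Toy illustration (not Firedrake code): assemble the global stiffness
# matrix for -u'' on a uniform 1D mesh with P1 (piecewise linear) elements.
# Each cell contributes a 2x2 local matrix; contributions overlap at
# shared vertices, which is the "reduction" step of FEM assembly.

def assemble_stiffness(num_cells, length=1.0):
    h = length / num_cells
    # local element matrix for the Laplace operator on one cell
    local = [[1.0 / h, -1.0 / h],
             [-1.0 / h, 1.0 / h]]
    # global sparse matrix stored as a dict: (row, col) -> value
    A = {}
    for cell in range(num_cells):
        dofs = (cell, cell + 1)  # the two vertices of this cell
        for i in range(2):
            for j in range(2):
                key = (dofs[i], dofs[j])
                A[key] = A.get(key, 0.0) + local[i][j]
    return A

A = assemble_stiffness(4)
# interior rows show the familiar [-1, 2, -1] / h stencil
print(A[(2, 2)], A[(2, 1)], A[(2, 3)])  # 8.0 -4.0 -4.0
```

The cell loop body touches only its own two degrees of freedom, which is exactly why the evaluations can run in parallel; only the accumulation into `A` needs care.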
$$\int_\Omega \nabla v \cdot \nabla u - \lambda v u ~dV = \int_\Omega v f ~dV$$
from firedrake import *
# Read a mesh and define a function space
mesh = Mesh('filename')
V = FunctionSpace(mesh, "Lagrange", 1)
# Define forcing function for right-hand side
f = Expression("- (lmbda + 2*(n**2)*pi**2) * sin(X[0]*pi*n) * sin(X[1]*pi*n)",
lmbda=1, n=8)
# Set up the Finite-element weak forms
*u = TrialFunction(V)
*v = TestFunction(V)
*lmbda = 1
*a = (dot(grad(v), grad(u)) - lmbda * v * u) * dx
*L = v * f * dx
# Solve the resulting finite-element equation
p = Function(V)
solve(a == L, p)
Unified Form Language (UFL) from the FEniCS project to describe weak form of PDE
???
Now how do we use Firedrake to solve such a PDE?
As a mathematician you write down the equation in what is called the weak form. Firedrake uses the Unified Form Language from the FEniCS project to express this weak form in Python as almost literal transcription (the highlighted bit).
The entire code the user needs to write to solve this PDE is what you see on the slide:
- read geometry (mesh) from file
- define the PDE
- solve it
Anyone who knows the FEniCS project will recognise this: Firedrake intentionally implements the same API.
The solve is of course where the magic happens and we'll get to that.
???
So what is Firedrake and how does it fit in the SciPy ecosystem?
SciPy is, at least to some extent, about building high-level abstractions for scientific computing, which are computationally efficient but at the same time easy to use (think NumPy for array operations).
--
Firedrake is an automated system for the portable solution of partial differential equations using the finite element method (FEM).
.source[— firedrakeproject.org]
???
Firedrake is one such abstraction for the portable solution of partial differential equations using the finite element method.
--
Two-layer abstraction for FEM computations from high-level descriptions:
- Firedrake: a portable finite-element computation framework
  drives FE computations from a high-level problem specification
- PyOP2: a high-level interface to unstructured-mesh-based methods
  efficiently executes kernels over an unstructured mesh in parallel
???
In fact it is actually two abstractions layered on top of each other, with Firedrake as a portable finite-element computation framework on top for solving problems at a very high level. The lower layer that does all the computational heavy lifting is PyOP2, a framework for unstructured mesh computations, whose role is to efficiently execute kernels over an unstructured mesh in parallel. That works because FEM assembly can be expressed as a local independent kernel as I mentioned earlier.
???
This diagram shows an overview of the Firedrake / PyOP2 toolchain:
- Decoupling of Firedrake (FEM) and PyOP2 (parallelisation) layers
- UFL to describe finite-element discretisation
- FE weak forms compiled into kernels
- FE assembly as PyOP2 parallel loops
- PETSc for linear/nonlinear solvers and mesh building / distribution
- Platform-specific runtime code generation and JIT compilation
- Revisit this diagram later
???
Separating the problem into two abstraction layers gives us a real separation of concerns: contributors can work only in their area of expertise without having to be concerned, or even familiar, with the other layer.
This is not just theory, it is how we actually work in our research group! We have mathematicians in the group who are deeply familiar with the FEM and work on e.g. adding support for further discretisations, but have little expertise in computer architecture or parallel programming
- and don't need to! On the other hand we have compiler experts who reason about optimising kernels as abstract loop nests without needing to know or care where those kernels come from.
class: center, middle
???
Let's start at the bottom layer and look at parallel computations on unstructured meshes with PyOP2.
- Independent local operations for each element of the mesh described by a kernel.
- Reductions aggregate contributions from local operations to produce final result.
- Domain-specific language embedded in Python for data parallel computations
- Efficiently executes kernels in parallel over unstructured meshes or graphs
- Portable programmes for different architectures without code change
- Efficiency through runtime code generation and just-in-time (JIT) compilation
???
FEM is an example of a class of scientific computations on unstructured meshes characterised by independent local operations that need to be performed for every entity of the mesh and can be described by a computational kernel. Some operations are followed by a reduction which aggregates contributions from these local operations. In FEM that is the assembly of the sparse linear system.
PyOP2 is a domain-specific language embedded in Python which abstracts this concept and allows writing portable programmes which efficiently run on different platforms without any changes to the source. How do we do this using Python, which you might think is slow?
Portability and efficiency is achieved through generating platform- and problem-specific code at runtime, just-in-time compiling it and executing it natively for the chosen architecture.
Why at runtime? No need for analysis / parsing of code (the hard part), only synthesis / generation of code (the easy part). All the information is available: just execute the code, reason about and introspect the (Python) objects!
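As a toy analogy of this synthesis-at-runtime idea (PyOP2 actually generates and compiles C; here I generate Python for brevity), one can build source text specialised to runtime parameters and compile it on the fly:

```python
# Toy analogy for runtime code synthesis (PyOP2 generates and JIT-compiles
# C; here we generate Python for simplicity). Because all problem
# parameters are known at runtime, the generated code can be specialised
# to them, e.g. by baking in a loop bound.

def generate_axpy(n):
    # specialise the vector length into the generated source
    src = "def axpy(a, x, y):\n"
    src += "    return [a * x[i] + y[i] for i in range({})]\n".format(n)
    namespace = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["axpy"]

axpy3 = generate_axpy(3)
print(axpy3(2.0, [1, 2, 3], [10, 10, 10]))  # [12.0, 14.0, 16.0]
```

The real toolchain does the same thing one level down: it writes C source, shells out to a compiler, and loads the shared object via ctypes.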
.pull-left[
- `Set`s – mesh entities and data DOFs
- `Map`s – define connectivity between entities in different `Set`s
- `Dat`s – defined on `Set`s (hold data, completely abstracted vector)
- `Global`s – not associated to a `Set` (reduction variables, parameters)
- `Const`s – global, read-only data ]
.pull-right[
- Executed in parallel on a `Set` through a parallel loop
- Read / write / increment data accessed via `Map`s
- `Sparsity` patterns defined by `Map`s
- `Mat` – matrix data on sparsities
- Kernels compute local matrix – PyOP2 handles global assembly ]
???
PyOP2 uses a number of simple primitives to describe unstructured meshes and data defined on them:
- Set: abstractly defines class of entities, only know how big it is
- Map: defines connectivity between elements of sets (mesh topology)
- conceptually a "data parallel pointer"
- lookup table: which entities of the target Set are associated with an entity of the source Set
- Dat: abstracted array, defined on a Set, contains actual values
- Data can live in CPU or GPU memory, partitioned for distributed parallel
- PyOP2 manages storage, layout, access, transfer, halo exchange
- "data parallel data"
- mostly opaque
- use like a vector: linear algebra operations
- Kernels / parallel loops:
- kernels: define operations/computations to be performed independently for every entity of a given Set.
- executed over iteration Set (or subset) in parallel (parallel loop)
- sequential semantics, local view of the data, defined by access descriptors
- indirect data access via Maps (one level of indirection): abstracted from kernel
- Linear algebra:
- Parallel loops can assemble matrices from local assembly kernel
- Matrix defined on sparsity, which are defined by pairs of Maps
- PyOP2 takes care of global assembly (reduction)
- take home
- PyOP2 objects are bare/simple objects with powerful semantics
- tools to express higher-level objects/constructs e.g. FEM
???
Now that we've introduced the concepts, how does it work?
PyOP2 architecture shown in this diagram:
- Provides unified API to the user (which may be another program, e.g. Firedrake)
- Declare data types, execute parallel loops ("smart" dynamic construct)
- PyOP2 runtime schedules computation efficiently (colouring avoids data races)
- kernels executed over the mesh in native code
- code generated and JIT compiled at runtime for different backends
- 4 backends, backend-specific JIT compilation tool chain
- CPU JIT: shell out to compiler to compile generated kernel + marshalling code
- use ctypes to load the compiled shared object
- 3rd party libraries (CPU: PETSc, GPU: Cusp) for sparse matrices, linear solvers
.left-column[
Kernels:
- "local view" of the data
- sequential semantics
Parallel loop:
- use access descriptors to generate marshalling code
- pass "right data" to kernel for each iteration set element ] .right-column[
void midpoint(double p[2], double *coords[2]) {
p[0] = (coords[0][0] + coords[1][0] + coords[2][0]) / 3.0;
p[1] = (coords[0][1] + coords[1][1] + coords[2][1]) / 3.0;
}
from pyop2 import op2
op2.init()
vertices = op2.Set(num_vertices)
cells = op2.Set(num_cells)
cell2vertex = op2.Map(cells, vertices, 3, [...])
coordinates = op2.Dat(vertices ** 2, [...], dtype=float)
midpoints = op2.Dat(cells ** 2, dtype=float)
midpoint = op2.Kernel(kernel_code, "midpoint")
*op2.par_loop(midpoint, cells,
* midpoints(op2.WRITE),
* coordinates(op2.READ, cell2vertex))
Kernels as abstract syntax tree (AST), C string or Python function (not currently compiled!) ]
???
Simple example programme: compute midpoint of all cells in mesh
- kernel: typically generated (by FFC in case of Firedrake or e.g. SymPy), specified as AST or C string ("back door")
- can perform backend specific optimisations on AST
- kernel: local view of the data, sequential semantics
- parallel loop arguments: kernel, iteration set + access descriptors
- access descriptors match kernel parameters, define local view for the kernel
// Kernel provided by the user
static inline void midpoint(double p[2], double *coords[2]) {
p[0] = (coords[0][0] + coords[1][0] + coords[2][0]) / 3.0;
p[1] = (coords[0][1] + coords[1][1] + coords[2][1]) / 3.0;
}
// Generated marshaling code executing the sequential loop
void wrap_midpoint(int start, int end,
double *arg0_0, double *arg1_0, int *arg1_0_map0_0) {
double *arg1_0_vec[3];
for ( int n = start; n < end; n++ ) {
int i = n;
arg1_0_vec[0] = arg1_0 + (arg1_0_map0_0[i * 3 + 0])* 2;
arg1_0_vec[1] = arg1_0 + (arg1_0_map0_0[i * 3 + 1])* 2;
arg1_0_vec[2] = arg1_0 + (arg1_0_map0_0[i * 3 + 2])* 2;
* midpoint(arg0_0 + i * 2, arg1_0_vec); // call user kernel (inline)
}
}
// Kernel provided by the user
static inline void midpoint(double p[2], double *coords[2]) {
p[0] = (coords[0][0] + coords[1][0] + coords[2][0]) / 3.0;
p[1] = (coords[0][1] + coords[1][1] + coords[2][1]) / 3.0;
}
// Generated marshaling code executing the parallel loop
void wrap_midpoint(int boffset, int nblocks,
int *blkmap, int *offset, int *nelems,
double *arg0_0, double *arg1_0, int *arg1_0_map0_0) {
#pragma omp parallel shared(boffset, nblocks, nelems, blkmap)
  {
int tid = omp_get_thread_num();
double *arg1_0_vec[3];
#pragma omp for schedule(static)
for ( int __b = boffset; __b < boffset + nblocks; __b++ ) {
int bid = blkmap[__b];
int nelem = nelems[bid];
int efirst = offset[bid];
for (int n = efirst; n < efirst+ nelem; n++ ) {
int i = n;
arg1_0_vec[0] = arg1_0 + (arg1_0_map0_0[i * 3 + 0])* 2;
arg1_0_vec[1] = arg1_0 + (arg1_0_map0_0[i * 3 + 1])* 2;
arg1_0_vec[2] = arg1_0 + (arg1_0_map0_0[i * 3 + 2])* 2;
* midpoint(arg0_0 + i * 2, arg1_0_vec); // call user kernel (inline)
}
}
}
}
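Stripped of blocking and OpenMP pragmas, the pattern both generated wrappers implement is a gather through the map followed by a kernel call on the local view. A pure-Python sketch of the same computation:

```python
# Pure-Python sketch of what the generated marshalling code does for the
# midpoint example: for each cell, gather vertex coordinates through the
# cell-to-vertex map, then hand the kernel a local view of the data.

def midpoint_kernel(p, coords):
    # same arithmetic as the C kernel: centroid of a triangle
    p[0] = (coords[0][0] + coords[1][0] + coords[2][0]) / 3.0
    p[1] = (coords[0][1] + coords[1][1] + coords[2][1]) / 3.0

def wrap_midpoint(midpoints, coordinates, cell2vertex):
    for cell, vertices in enumerate(cell2vertex):
        # indirect gather: local view of the three vertex coordinates
        local_coords = [coordinates[v] for v in vertices]
        midpoint_kernel(midpoints[cell], local_coords)

# one triangle with vertices (0,0), (1,0), (0,1)
coordinates = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
cell2vertex = [(0, 1, 2)]
midpoints = [[0.0, 0.0]]
wrap_midpoint(midpoints, coordinates, cell2vertex)
print(midpoints)  # both coordinates of the centroid are ~1/3
```

The kernel itself never sees the map: indirection is entirely the wrapper's job, which is what lets PyOP2 swap in different parallel execution strategies without touching kernels.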
class: center, middle
???
How can Firedrake make use of that for FE computations?
.left70[
(diagram: Firedrake's `Function` / `FunctionSpace` / `Mesh` built on PyOP2 data structures)
]
.right30[
- `Function`: field defined on a set of degrees of freedom (DoFs), data stored as PyOP2 `Dat`
- `FunctionSpace`: characterized by a family and degree of FE basis functions; defines DoFs for function and relationship to mesh entities
- `Mesh`: defines abstract topology by sets of entities and maps between them (PyOP2 data structures) ]
???
Firedrake uses the same concepts of Functions defined on FunctionSpaces defined on a Mesh as DOLFIN; however, all of these are implemented in terms of the PyOP2 data structures I showed earlier:
- Function's data stored as PyOP2 Dat (Firedrake does not directly touch field data!)
- Dat defined on the DOFs DataSet of the FunctionSpace (defined on the nodes Set)
- nodes Set related to Sets of cells, exterior/interior facets of the Mesh via Maps
Solving the Helmholtz equation in Python using Firedrake:
$$\int_\Omega \nabla v \cdot \nabla u - \lambda v u ~dV = \int_\Omega v f ~dV$$
from firedrake import *
# Read a mesh and define a function space
mesh = Mesh('filename')
V = FunctionSpace(mesh, "Lagrange", 1)
# Define forcing function for right-hand side
f = Expression("- (lmbda + 2*(n**2)*pi**2) * sin(X[0]*pi*n) * sin(X[1]*pi*n)",
lmbda=1, n=8)
# Set up the Finite-element weak forms
u = TrialFunction(V)
v = TestFunction(V)
lmbda = 1
a = (dot(grad(v), grad(u)) - lmbda * v * u) * dx
L = v * f * dx
# Solve the resulting finite-element equation
p = Function(V)
*solve(a == L, p)
???
Now that we've discussed the concepts, let's look at how to drive FE computations in Firedrake
This slide shows the code for solving the Helmholtz equation and should be familiar to any DOLFIN user modulo the import from Firedrake.
- Firedrake always solves nonlinear problems in residual form `F(u; v) = 0`
- Transform linear problem into residual form: `J = a`, `F = ufl.action(J, u) - L`
  - Jacobian known to be `a`
  - Always solved in a single Newton (nonlinear) iteration
- Use Newton-like methods from PETSc SNES (optimised C library)
- PETSc SNES requires two callbacks to evaluate residual and Jacobian:
  - implemented as Python functions (supported by petsc4py)
  - evaluate residual by assembling residual form: `assemble(F, tensor=F_tensor)`
  - evaluate Jacobian by assembling Jacobian form: `assemble(J, tensor=J_tensor, bcs=bcs)`
  - `assemble` invokes PyOP2 with kernels generated from `F` and `J`
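The single-Newton-iteration claim can be checked with a toy sketch (plain Python standing in for PETSc SNES): for a linear residual F(u) = Au - b the Jacobian is the constant matrix A, so one Newton update from any initial guess reaches the solution.

```python
# Sketch (plain Python, standing in for PETSc SNES): for the linear
# residual F(u) = A*u - b the Jacobian is the constant matrix A, so a
# single Newton step u1 = u0 - J^{-1} F(u0) yields the exact solution.

def solve2x2(M, rhs):
    # direct solve of a 2x2 system via Cramer's rule
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(rhs[0] * M[1][1] - rhs[1] * M[0][1]) / det,
            (M[0][0] * rhs[1] - M[1][0] * rhs[0]) / det]

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]

def residual(u):
    return [A[0][0] * u[0] + A[0][1] * u[1] - b[0],
            A[1][0] * u[0] + A[1][1] * u[1] - b[1]]

u0 = [0.0, 0.0]                      # arbitrary initial guess
du = solve2x2(A, residual(u0))       # solve J * du = F(u0), with J = A
u1 = [u0[0] - du[0], u0[1] - du[1]]  # one Newton update
print(residual(u1))  # ~[0.0, 0.0]: converged in one iteration
```

This is why solving the linear problem in residual form costs nothing extra: the Newton loop terminates after its first iteration.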
???
- What happens when the final solve is called on the previous slide?
- Unified solving interface even behind the scenes
- If Jacobian not provided by the user, Firedrake uses automatic differentiation: `J = ufl.derivative(F, u)`
- Kernels generated by FFC and executed as PyOP2 parallel loops
  - Firedrake builds PyOP2 parallel loop call, using FFC-generated kernel
  - iterate over cells (for cell integrals) or facets (interior/exterior facet integrals)
  - output tensor depends on rank of the form (PyOP2 `Mat`, `Dat` or `Global`)
  - input arguments: coordinate field and any coefficients present in the form
- No computationally expensive computations (inner loops) in pure Python
- Call optimised libraries where possible (PETSc)
- Expensive library code implemented in Cython (sparsity builder)
- Kernel application over the mesh in natively generated code
- Python is not just glue!
- Firedrake
  - Assembled operators
  - Function spaces cached on meshes
  - FFC-generated kernel code
- PyOP2
  - `Map`s cached on `Set`s
  - Sparsity patterns
  - JIT-compiled code
- Only data isn't cached (`Function` / `Dat`)
class: center, middle
.left70[
from firedrake import *
mesh = Mesh("wave_tank.msh")
V = FunctionSpace(mesh, 'Lagrange', 1)
p = Function(V, name="p")
phi = Function(V, name="phi")
u = TrialFunction(V)
v = TestFunction(V)
p_in = Constant(0.0)
# Boundary condition for y=0
bc = DirichletBC(V, p_in, 1)
T = 10.
dt = 0.001
t = 0
while t <= T:
    p_in.assign(sin(2*pi*5*t))
    phi -= dt / 2 * p
    p += (assemble(dt * inner(grad(v), grad(phi))*dx) /
          assemble(v*dx))
    bc.apply(p)
    phi -= dt / 2 * p
    t += dt
]
.right30[
2nd order PDE:
$$\frac{\partial^2\phi}{\partial t^2} - \nabla^2 \phi = 0$$
Formulation with 1st order time derivatives:
$$\frac{\partial\phi}{\partial t} = - p$$
$$\frac{\partial p}{\partial t} + \nabla^2 \phi = 0$$
]
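A toy check of this scheme (my own sketch, not part of the talk's code): for a single Fourier mode, the Laplacian term reduces to k²φ and the half-step updates in the listing become a Störmer-Verlet integrator for a harmonic oscillator, whose energy stays bounded.

```python
# Toy check (plain Python) of the half-step update pattern from the wave
# example: for a single Fourier mode, the assembled Laplacian term reduces
# to k^2 * phi, so the scheme is a Stormer-Verlet integrator for the
# oscillator phi'' = -k^2 * phi (with phi' = -p). Energy stays bounded.

k = 2.0
dt, T = 0.001, 1.0
phi, p = 1.0, 0.0

def energy(phi, p):
    return 0.5 * p * p + 0.5 * (k * phi) ** 2

e0 = energy(phi, p)
t = 0.0
while t <= T:
    phi -= dt / 2 * p        # half-step phi
    p += dt * k * k * phi    # full step p (the assembled Laplacian term)
    phi -= dt / 2 * p        # half-step phi
    t += dt

print(abs(energy(phi, p) - e0) < 1e-3 * e0)  # True: energy nearly conserved
```

The near-conservation of energy over many steps is the reason this symplectic splitting is a good choice for the wave equation.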
- PyOP2: https://github.com/OP2/PyOP2
- PyOP2: A High-Level Framework for Performance-Portable Simulations on Unstructured Meshes. Florian Rathgeber, Graham R. Markall, Lawrence Mitchell, Nicolas Loriant, David A. Ham, Carlo Bertolli, Paul H. J. Kelly. WOLFHPC 2012
- Performance-Portable Finite Element Assembly Using PyOP2 and FEniCS. Graham R. Markall, Florian Rathgeber, Lawrence Mitchell, Nicolas Loriant, Carlo Bertolli, David A. Ham, Paul H. J. Kelly. ISC 2013
- Firedrake: https://github.com/firedrakeproject/firedrake
- Cross-Loop Optimization of Arithmetic Intensity for Finite Element Local Assembly. Fabio Luporini, Ana Lucia Varbanescu, Florian Rathgeber, Gheorghe-Teodor Bercea, J. Ramanujam, David A. Ham, Paul H. J. Kelly. ACM Transactions on Architecture and Code Optimization, January 2015
- UFL: https://github.com/firedrakeproject/ufl
- FFC: https://github.com/firedrakeproject/ffc
This talk is available at https://kynan.github.io/EuroSciPy2014 (source)
Contact: Florian Rathgeber, @frathgeber
Slides created with remark