diff --git a/doc/docs/Build_From_Source.md b/doc/docs/Build_From_Source.md
index 2b679fc2a..714a1ab4a 100644
--- a/doc/docs/Build_From_Source.md
+++ b/doc/docs/Build_From_Source.md
@@ -244,6 +244,25 @@ By default, Meep's configure script tries to guess the gcc `-march` flag for the
—
This flag enables some experimental support for [OpenMP](https://en.wikipedia.org/wiki/OpenMP) multithreading parallelism on multi-core machines (*instead* of MPI, or in addition to MPI if you have multiple processor cores per MPI process). Currently, only multi-frequency [`near2far`](Python_User_Interface.md#near-to-far-field-spectra) calculations are sped up this way, but in the future this [may be expanded](https://github.com/NanoComp/meep/issues/228) with additional OpenMP parallelism. When you run Meep, you can first set the `OMP_NUM_THREADS` environment variable to the number of threads you want OpenMP to use.
+### Floating-Point Precision of the Fields and Materials Arrays
+
+By default, the C/C++ arrays used in Meep to store the time-domain fields ($\mathbf{E}$, $\mathbf{D}$, $\mathbf{H}$) and materials ($\varepsilon$) are defined using [double-precision floating point](https://en.wikipedia.org/wiki/Double-precision_floating-point_format). Updating the fields arrays generally dominates the computational cost of the simulation because it occurs at every voxel in the cell and at every timestep. Because [discretization errors](https://en.wikipedia.org/wiki/Discretization_error) which include the discontinuous material interfaces (as described in [Subpixel Smoothing](Subpixel_Smoothing.md)) as well as the [numerical dispersion](https://en.wikipedia.org/wiki/Numerical_dispersion) of the Yee grid typically dominates the [floating-point roundoff error](https://en.wikipedia.org/wiki/Round-off_error), the fields/materials arrays can be defined using [single-precision floating point](https://en.wikipedia.org/wiki/Single-precision_floating-point_format) to provide a significant speedup (by reducing the [memory bandwidth](https://en.wikipedia.org/wiki/Memory_bandwidth)) often without *any loss* in simulation accuracy.
+
+This feature requires manually setting the [macro `MEEP_SINGLE` in the source file `meep.hpp`](https://github.com/NanoComp/meep/blob/master/src/meep.hpp#L37) to `1` before compiling:
+
+```cpp
+#define MEEP_SINGLE 1 // 1 for single precision, 0 for double
+#if MEEP_SINGLE
+typedef float realnum;
+#else
+typedef double realnum;
+#endif
+```
+
+As a demonstration of the potential improvement in runtime performance, for an experiment involving [computing the light-extraction efficiency of an OLED](http://www.simpetus.com/projects.html#meep_oled) which includes PMLs, DFT flux monitors, and Lorentzian susceptibilities, the timestepping rate (s/step) for the single-precision case using 20 MPI processes was ~50% that of double precision.
+
+The DFT fields, however, are always defined using double-precision floating point. This is intended to mitigate the accumulation of round-off error for simulations with a large number of timesteps.
+
### Separating Build and Source Paths
Meep supports ["VPATH" builds](https://www.gnu.org/software/automake/manual/html_node/VPATH-Builds.html), where you compile in a separate directory from the source directory. This is helpful if you want to keep the source directory in a pristine state, or if you want to build multiple binaries simultaneously from the same source tree. Just create a build directory and execute the `configure` script by supplying its path, for example
diff --git a/doc/docs/Parallel_Meep.md b/doc/docs/Parallel_Meep.md
index c70db0ac1..2c84fce84 100644
--- a/doc/docs/Parallel_Meep.md
+++ b/doc/docs/Parallel_Meep.md
@@ -65,7 +65,7 @@ Note: for optimization studies involving *random* initial conditions, the seed o
### User-Specified Cell Partition
-An alternative to having Meep automatically partition the cell at runtime into chunks based on the number of MPI processes is to manually specify the cell partition via the `chunk_layout` parameter of the `Simulation` constructor as a [`BinaryPartition`](Python_User_Interface.md#binarypartition) class object. This is based on representing an arbitrary cell partition as a [binary tree](https://en.wikipedia.org/wiki/Binary_tree) for which the nodes define "cuts" at a given point (e.g., -4.5, 6.3) along a given cell direction and the leaves define an integer-valued process ID (equivalent to the rank of the MPI process for that chunk). Note also that the same process ID can be assigned to as many chunks as you want, which just means that one process timesteps multiple chunks. If you use fewer MPI processes, then the process ID is taken modulo the number of processes.
+An alternative to having Meep automatically partition the cell at runtime into chunks based on the number of MPI processes is to manually specify the cell partition via the `chunk_layout` parameter of the `Simulation` constructor as a [`BinaryPartition`](Python_User_Interface.md#binarypartition) class object. This is based on representing an arbitrary cell partition as a [binary tree](https://en.wikipedia.org/wiki/Binary_tree) for which the nodes define "cuts" at a given point (e.g., -4.5, 6.3) along a given cell direction and the leaves define an integer-valued process ID (equivalent to the rank of the MPI process for that chunk). Note also that the same process ID can be assigned to as many chunks as you want, which just means that one process timesteps multiple chunks. If you use fewer MPI processes, then the process ID is taken modulo the number of MPI processes. If you use more MPI processes than there are chunks, then those MPI processes that are not assigned to chunks will just remain idle.
As a demonstration, an example of a 2d cell partition along with its binary-tree representation is shown below. The 10×5 cell in $xy$ coordinates with origin at the cell center is partitioned into five chunks numbered one through five.
diff --git a/doc/docs/Subpixel_Smoothing.md b/doc/docs/Subpixel_Smoothing.md
index 248f41e26..86b5393c0 100644
--- a/doc/docs/Subpixel_Smoothing.md
+++ b/doc/docs/Subpixel_Smoothing.md
@@ -35,7 +35,7 @@ In [parallel simulations](Parallel_Meep.md), each [chunk](Chunks_and_Symmetry.md
Continuously Varying Shapes and Results
---------------------------------------
-A key feature of Meep's subpixel smoothing, particularly relevant for shape optimization (i.e., [Applied Physics Letters, Vol. 104, 091121, 2014](https://aip.scitation.org/doi/abs/10.1063/1.4867892) ([pdf](http://ab-initio.mit.edu/~oskooi/papers/Oskooi14_tandem.pdf))), is that continuously varying the `geometry` yields continuously varying results. This is demonstrated for a ring resonator: as the radius increases, the frequency of a resonant $H_z$-polarized mode decreases. Note: unlike the example in [Tutorial/Basics/Modes of a Ring Resonator](Python_Tutorials/Basics.md#modes-of-a-ring-resonator) involving $E_z$-polarized modes where the electric fields are always continuous (i.e., parallel to the interface), this example involves discontinuous fields. Also, the ring geometry contains no sharp corners/edges which tend to produce field singularities that degrade the error. The simulation script is shown below. The inner ring radius is varied from 1.8 to 2.0 μm in gradations of 0.005 μm. The ring width is constant (1 μm). The resolution is 10 pixels/μm. The gradations are therefore well below subpixel dimensions.
+A key feature of Meep's subpixel smoothing, particularly relevant for shape optimization (i.e., [Applied Physics Letters, Vol. 104, 091121, 2014](https://aip.scitation.org/doi/abs/10.1063/1.4867892) ([pdf](http://ab-initio.mit.edu/~oskooi/papers/Oskooi14_tandem.pdf))), is that continuously varying the `geometry` yields continuously varying results. This is demonstrated for a ring resonator: as the radius increases, the frequency of a resonant $H_z$-polarized mode decreases. Note: unlike the example in [Tutorial/Basics/Modes of a Ring Resonator](Python_Tutorials/Basics.md#modes-of-a-ring-resonator) involving $E_z$-polarized modes where the electric fields are always continuous (i.e., parallel to the interface), this example involves discontinuous fields. Also, the ring geometry contains no sharp corners/edges which tend to produce field singularities that degrade the error. The simulation script is shown below. The inner ring radius is varied from 1.8 to 2.0 μm in gradations of 0.005 μm. The ring width is constant (1 μm). The resolution is 10 pixels/μm. The gradations are therefore well below pixel dimensions.
```py
import meep as mp
@@ -90,7 +90,7 @@ for rad in np.arange(1.800,2.001,0.005):
sim.reset_meep()
```
-A plot of the resonant frequency versus the ring radius is shown below for subpixel smoothing (red) and no smoothing (blue). Included for reference is the "exact" (black) computed using *no smoothing* at a resolution of 60 pixels/μm. The no-smoothing result shows "staircasing" effects which are artifacts of the discretization. The subpixel-smoothing result varies continuously with the ring radius similar to the high-resolution result which is at a resolution six times larger. The inset shows the scalar $H_z$ field profile of the resonant mode for a structure with inner radius of 1.9 μm.
+A plot of the resonant frequency versus the ring radius is shown below for subpixel smoothing (red) and no smoothing (blue). Included for reference is the "exact" result (black) computed using *no smoothing* at a resolution of 60 pixels/μm. The no-smoothing result shows "staircasing" effects which are artifacts of the discretization. The subpixel-smoothing result varies continuously with the ring radius similar to the high-resolution result which is at a resolution six times larger. The inset shows the scalar $H_z$ field profile of the resonant mode for a structure with inner radius of 1.9 μm.
This particular resonant mode has a [quality (Q) factor](https://en.wikipedia.org/wiki/Q_factor) of ~107 at a frequency of 0.25 and radius of 2.0 μm. This means that roughly 4x107 optical periods are required to accurately resolve the field decay due to the Fourier uncertainty relation. Instead, [`Harminv`](Python_User_Interface.md#harminv) can resolve the $Q$ using just ~1000 periods. This is nearly a four orders of magnitude reduction in the run time.