clarify meaning of meep.am_master when used with meep.divide_parallel_proceses (NanoComp#1443)

* clarify meaning of meep.am_master when used with meep.divide_parallel_processes

* Update Parallel_Meep.md

Co-authored-by: Steven G. Johnson <stevenj@mit.edu>
oskooi and stevengj authored Dec 9, 2020
1 parent ae55723 commit 8505275
Showing 1 changed file, doc/docs/Parallel_Meep.md, with 4 additions and 2 deletions.
@@ -49,7 +49,9 @@ mpirun -np 4 python -m mpi4py foo.py

Parallel Meep works by taking your simulation and dividing the cell among the MPI processes. This is the only way of parallelizing a single simulation and enables simulating very large problems.

However, there is an alternative strategy for parallelization. If you have many smaller simulations that you want to run, say for many different values of some parameter, then you can simply run these as separate jobs. Such parallelization is known as [embarrassingly parallel](https://en.wikipedia.org/wiki/Embarrassingly_parallel) because no communication is required. Additionally, Meep provides explicit support for this mode of operation even when using a *single* MPI job via the `meep.divide_parallel_processes(N)` routine, which divides the MPI processes into `N` equal subgroups and returns the index `n` (`0`,...,`N-1`) of the current group, which can be used to decide which simulation to run. That is, you have one script, and the script creates only *one* `Simulation` object: depending on the value of `n` that it receives, it will create a different `Simulation` object (i.e., using different parameters). For each subgroup, Meep acts as though it were running a separate self-contained parallel simulation: the fields from a given subgroup communicate only with each other using MPI, and each subgroup has its own master process with rank 0 which can be checked using `meep.am_master()`. The overall master process of the entire run is the one for which `meep.am_really_master()` returns `True`; this is the only process that can generally perform I/O. There is an auxiliary routine `meep.merge_subgroup_data(data)` which takes a NumPy array `data` from every process (identical within each subgroup) and returns the concatenated `data` from all the subgroups. For an example, see [python/tests/divide_mpi_processes.py](https://github.com/NanoComp/meep/tree/master/python/tests/divide_mpi_processes.py) in the source repository. You can also communicate between subgroups in other ways, e.g. by using low-level `mpi4py` functions.

The `divide_parallel_processes` feature can be useful for large supercomputers, which typically restrict the total number of jobs that can be executed but not the size of each job, or for large-scale optimization in which many separate simulations are coupled by an optimization algorithm. Note that when using this feature via the [Python interface](Python_User_Interface.md), only the output of the subgroup containing the master process of the entire run is shown in the standard output. (In C++, the master process from *every* subgroup prints to standard output.)
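The subgroup bookkeeping described above can be sketched in pure Python. This is a hypothetical stand-alone mock, not Meep's actual implementation: `subgroup_index` and the local `merge_subgroup_data` merely mirror the behavior of `meep.divide_parallel_processes` and `meep.merge_subgroup_data` for illustration.

```python
import numpy as np

def subgroup_index(rank, total_ranks, num_groups):
    """Return the index n (0..num_groups-1) of the subgroup that `rank`
    belongs to, assuming total_ranks is split into num_groups equal chunks
    of consecutive ranks (a mock of meep.divide_parallel_processes)."""
    ranks_per_group = total_ranks // num_groups
    return rank // ranks_per_group

def merge_subgroup_data(per_group_results):
    """Concatenate one array per subgroup, mimicking how
    meep.merge_subgroup_data combines the (identical-within-subgroup)
    data contributed by each subgroup."""
    return np.concatenate(per_group_results)

# With 8 MPI ranks split into N=4 subgroups, ranks 0-1 form group 0,
# ranks 2-3 form group 1, and so on:
groups = [subgroup_index(r, 8, 4) for r in range(8)]
print(groups)  # [0, 0, 1, 1, 2, 2, 3, 3]

# Suppose each subgroup n computes a one-element result for its own
# parameter value; merging concatenates them in subgroup order:
results = [np.array([10.0 * n]) for n in range(4)]
print(merge_subgroup_data(results))  # [ 0. 10. 20. 30.]
```

In a real script, `n = meep.divide_parallel_processes(N)` plays the role of `subgroup_index` (each process learns its own `n`), and that `n` would then select, e.g., a different geometry or source frequency for the `Simulation` object built by that subgroup.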

Meep also supports [thread-level parallelism](https://en.wikipedia.org/wiki/Task_parallelism) (i.e., multi-threading) on a single, shared-memory, multi-core machine for multi-frequency [near-to-far field](Python_User_Interface.md#near-to-far-field-spectra) computations. Meep does not currently use thread-level parallelism for the time stepping although this feature may be added in the future (see [Issue \#228](https://github.com/NanoComp/meep/issues/228)).

@@ -135,4 +137,4 @@ See also [FAQ/Should I expect linear speedup from the parallel Meep](FAQ.md#shou

<center>
![](images/parallel_benchmark_DFT.png)
</center>
