diff --git a/doc/docs/Parallel_Meep.md b/doc/docs/Parallel_Meep.md
index 7ac5bc5fe..8ac9c5ec1 100644
--- a/doc/docs/Parallel_Meep.md
+++ b/doc/docs/Parallel_Meep.md
@@ -63,6 +63,37 @@ For comparison, consider the scenario where the optimization runs on just a sing
 Note: for optimization studies involving *random* initial conditions, the seed of the random number generator must be specified otherwise each process will have a different initial condition which will cause a crash. For example, if you are initializing the design variables with `numpy.random.rand`, then you should call `numpy.random.seed(...)` to set the same `numpy.random` seed on every process.
 
+### User-Specified Cell Partition
+
+An alternative to having Meep automatically partition the cell at runtime into chunks based on the number of MPI processes is to manually specify the cell partition via the `chunk_layout` parameter of the `Simulation` constructor as a [`BinaryPartition`](Python_User_Interface.md#binarypartition) class object. This is based on representing an arbitrary cell partition as a binary tree for which the nodes define "cuts" at a given point (e.g., -4.5, 6.3) along a given cell direction and the leaves define an integer-valued process ID (equivalent to the rank of the MPI process for that chunk). Note also that the same process ID can be assigned to as many chunks as you want, which just means that one process timesteps multiple chunks. If you use fewer MPI processes, then the process ID is taken modulo the number of processes.
+
+As a demonstration, an example of a 2d cell partition along with its binary-tree representation is shown below. The 10×5 cell in $xy$ coordinates (with origin at the cell center) is partitioned into five chunks numbered one through five.
+
+<center>
+![](images/chunk_division_binary_tree.png)
+</center>
+ +This binary tree can be described as a list of lists where each list entry is `[ (split_dir,split_pos), left, right ]` for which `split_dir` and `split_pos` define the splitting direction and position, and `left` and `right` are the left and right branches which can be either another list defining a new node or a process ID. Based on these specifications, the cell partition from above can be set up as follows: + +```py +import meep as mp +import matplotlib.pyplot as plt + +chunk_layout = mp.BinaryPartition(data=[ (mp.X,-2.0), 0, [ (mp.Y,1.5), + [ (mp.X,3.0), 1, [ (mp.Y,-0.5), 4, 3 ] ], 2 ] ]) + +cell_size = mp.Vector3(10.0,5.0,0) + +sim = mp.Simulation(cell_size=cell_size, + resolution=10, + chunk_layout=chunk_layout) + +sim.visualize_chunks() +plt.savefig('chunk_layout.png',dpi=150,bbox_inches='tight') +``` + +For improved performance, we recommend ordering the process IDs in [depth-first order](https://en.wikipedia.org/wiki/Depth-first_search) of the tree, which will tend to give spatially adjacent chunks nearby process IDs. This increases the chance that adjacent chunks are on the same MPI node, improving communication speeds. + Technical Details ----------------- diff --git a/doc/docs/Python_User_Interface.md b/doc/docs/Python_User_Interface.md index 0bc74903e..29b5e8a46 100644 --- a/doc/docs/Python_User_Interface.md +++ b/doc/docs/Python_User_Interface.md @@ -320,10 +320,12 @@ Python. `Vector3` is a `meep` class. `mp.dump_structure`. Defaults to an empty string. See [Load and Dump Structure](#load-and-dump-structure) for more information. -+ **`chunk_layout` [`string` or `Simulation` instance]** — This will cause the - `Simulation` to use the chunk layout described by either an h5 file (created by - `Simulation.dump_chunk_layout`) or another `Simulation`. See [Load and Dump - Structure](#load-and-dump-structure) for more information. ++ **`chunk_layout` [`string` or `Simulation` instance or `BinaryPartition` class]** — + This will cause the `Simulation` to use the chunk layout described by either + (1) an `.h5` file (created using `Simulation.dump_chunk_layout`), (2) another + `Simulation` instance, or (3) a [`BinaryPartition`](#binarypartition) class object. + For more information, see [Load and Dump Structure](#load-and-dump-structure) and + [Parallel Meep/User-Specified Cell Partition](Parallel_Meep.md#user-specified-cell-partition). The following require a bit more understanding of the inner workings of Meep to use. See also [SWIG Wrappers](#swig-wrappers). @@ -338,9 +340,9 @@ use. See also [SWIG Wrappers](#swig-wrappers). initialized by `init_sim()` which is called automatically by any of the [run functions](#run-functions). -+ **`num_chunks` [`integer`]** — Minimum number of "chunks" (subarrays) to divide - the structure/fields into (default 0). Actual number is determined by number of - processors, PML layers, etcetera. Mainly useful for debugging. ++ **`num_chunks` [`integer`]** — Minimum number of "chunks" (subregions) to divide + the structure/fields into. Overrides the default value determined by + the number of processors, PML layers, etcetera. Mainly useful for debugging. + **`split_chunks_evenly` [`boolean`]** — When `True` (the default), the work per [chunk](Chunks_and_Symmetry.md) is not taken into account when splitting chunks @@ -7288,6 +7290,53 @@ former value. +--- + + +### BinaryPartition + +```python +class BinaryPartition(object): +``` + +
+ +Binary tree class used for specifying a cell partition of arbitrary sized chunks for use as the +`chunk_layout` parameter of the `Simulation` class object. + +
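+
+As a usage sketch (the two-chunk partition and cell size below are purely illustrative), the object is
+passed directly as the `chunk_layout` argument of the `Simulation` constructor:
+
+```python
+import meep as mp
+
+# split the cell at x = 0: left half owned by process 0, right half by process 1
+part = mp.BinaryPartition(data=[ (mp.X,0.0), 0, 1 ])
+
+sim = mp.Simulation(cell_size=mp.Vector3(10,5,0),
+                    resolution=10,
+                    chunk_layout=part)
+```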
+ + + + + +
+ +```python +def __init__(self, + data=None, + split_dir=None, + split_pos=None, + left=None, + right=None, + proc_id=None): +``` + +
+ +The constructor accepts three separate groups of arguments: (1) `data`: a list of lists where each +list entry is either (a) a node defined as `[ (split_dir,split_pos), left, right ]` for which `split_dir` +and `split_pos` define the splitting direction (i.e., `mp.X`, `mp.Y`, `mp.Z`) and position (e.g., `3.5`, +`-4.2`, etc.) and `left` and `right` are the two branches (themselves `BinaryPartition` objects) +or (b) a leaf with integer value for the process ID `proc_id` in the range between 0 and number of processes +- 1 (inclusive), (2) a node defined using `split_dir`, `split_pos`, `left`, and `right`, or (3) a leaf with +`proc_id`. Note that the same process ID can be assigned to as many chunks as you want, which means that one +process timesteps multiple chunks. If you use fewer MPI processes, then the process ID is taken modulo the number +of processes. + +
+ +
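+
+For illustration, a minimal sketch of the three ways of invoking the constructor described above (the
+split position and process IDs here are arbitrary examples); the forms can be combined to build up
+larger trees:
+
+```python
+import meep as mp
+
+# (1) nested-list `data` format: a single split at y = 1.5 with two leaf chunks
+bp_data = mp.BinaryPartition(data=[ (mp.Y,1.5), 0, 1 ])
+
+# (2) node form: the same tree built from explicit arguments
+bp_node = mp.BinaryPartition(split_dir=mp.Y, split_pos=1.5,
+                             left=mp.BinaryPartition(proc_id=0),
+                             right=mp.BinaryPartition(proc_id=1))
+
+# (3) leaf form: a single chunk owned by process 0
+bp_leaf = mp.BinaryPartition(proc_id=0)
+```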
Miscellaneous Functions Reference --------------------------------- diff --git a/doc/docs/Python_User_Interface.md.in b/doc/docs/Python_User_Interface.md.in index f77f50ea9..199429b35 100644 --- a/doc/docs/Python_User_Interface.md.in +++ b/doc/docs/Python_User_Interface.md.in @@ -822,6 +822,7 @@ The following classes are available directly via the `meep` package. @@ Verbosity.get @@ @@ Verbosity.set @@ +@@ BinaryPartition[methods-with-docstrings] @@ Miscellaneous Functions Reference --------------------------------- diff --git a/doc/docs/images/chunk_division_binary_tree.png b/doc/docs/images/chunk_division_binary_tree.png new file mode 100644 index 000000000..567c10db0 Binary files /dev/null and b/doc/docs/images/chunk_division_binary_tree.png differ diff --git a/python/Makefile.am b/python/Makefile.am index 40972145d..8d1e485db 100644 --- a/python/Makefile.am +++ b/python/Makefile.am @@ -44,6 +44,7 @@ TESTS = \ $(BINARY_GRATING_TEST) \ $(TEST_DIR)/cavity_arrayslice.py \ $(TEST_DIR)/cavity_farfield.py \ + $(TEST_DIR)/chunk_layout.py \ $(TEST_DIR)/chunks.py \ $(TEST_DIR)/cyl_ellipsoid.py \ $(TEST_DIR)/dft_energy.py \ diff --git a/python/meep.i b/python/meep.i index 83abbea01..074df1c98 100644 --- a/python/meep.i +++ b/python/meep.i @@ -1439,6 +1439,27 @@ void _get_gradient(PyObject *grad, PyObject *fields_a, PyObject *fields_f, PyObj } } +// typemaps for binary_partition + +%typecheck (SWIG_TYPECHECK_POINTER) meep::binary_partition * { + $1 = PyObject_IsInstance($input, py_binary_partition_object()); +} + +%typemap(in) meep::binary_partition * { + $1 = py_bp_to_bp($input); + if(!$1) { + SWIG_fail; + } +} + +%typemap(arginit) meep::binary_partition * { + $1 = NULL; +} + +%typemap(freearg) meep::binary_partition * { + delete $1; +} + // Tells Python to take ownership of the h5file* this function returns so that // it gets garbage collected and the file gets closed. %newobject meep::fields::open_h5file; @@ -1659,6 +1680,7 @@ PyObject *_get_array_slice_dimensions(meep::fields *f, const meep::volume &where ) from .simulation import ( Absorber, + BinaryPartition, Ldos, EnergyRegion, FluxRegion, diff --git a/python/simulation.py b/python/simulation.py index 6d3bcb6be..e708ddd33 100644 --- a/python/simulation.py +++ b/python/simulation.py @@ -1162,10 +1162,12 @@ def __init__(self, `mp.dump_structure`. Defaults to an empty string. See [Load and Dump Structure](#load-and-dump-structure) for more information. - + **`chunk_layout` [`string` or `Simulation` instance]** — This will cause the - `Simulation` to use the chunk layout described by either an h5 file (created by - `Simulation.dump_chunk_layout`) or another `Simulation`. See [Load and Dump - Structure](#load-and-dump-structure) for more information. + + **`chunk_layout` [`string` or `Simulation` instance or `BinaryPartition` class]** — + This will cause the `Simulation` to use the chunk layout described by either + (1) an `.h5` file (created using `Simulation.dump_chunk_layout`), (2) another + `Simulation` instance, or (3) a [`BinaryPartition`](#binarypartition) class object. + For more information, see [Load and Dump Structure](#load-and-dump-structure) and + [Parallel Meep/User-Specified Cell Partition](Parallel_Meep.md#user-specified-cell-partition). The following require a bit more understanding of the inner workings of Meep to use. See also [SWIG Wrappers](#swig-wrappers). @@ -1180,9 +1182,9 @@ def __init__(self, initialized by `init_sim()` which is called automatically by any of the [run functions](#run-functions). 
- + **`num_chunks` [`integer`]** — Minimum number of "chunks" (subarrays) to divide - the structure/fields into (default 0). Actual number is determined by number of - processors, PML layers, etcetera. Mainly useful for debugging. + + **`num_chunks` [`integer`]** — Minimum number of "chunks" (subregions) to divide + the structure/fields into. Overrides the default value determined by + the number of processors, PML layers, etcetera. Mainly useful for debugging. + **`split_chunks_evenly` [`boolean`]** — When `True` (the default), the work per [chunk](Chunks_and_Symmetry.md) is not taken into account when splitting chunks @@ -1207,7 +1209,7 @@ def __init__(self, self.extra_materials = extra_materials self.default_material = default_material self.epsilon_input_file = epsilon_input_file - self.num_chunks = num_chunks + self.num_chunks = chunk_layout.numchunks() if isinstance(chunk_layout,mp.BinaryPartition) else num_chunks self.Courant = Courant self.global_d_conductivity = 0 self.global_b_conductivity = 0 @@ -1878,8 +1880,10 @@ def load_chunk_layout(self, br, source): if isinstance(source, Simulation): vols = source.structure.get_chunk_volumes() - self.structure.load_chunk_layout(vols, br) + ids = source.structure.get_chunk_owners() + self.structure.load_chunk_layout(vols, [int(f) for f in ids], br) else: + ## source is either filename (string) or BinaryPartition class object self.structure.load_chunk_layout(source, br) def init_sim(self): @@ -5171,3 +5175,54 @@ def merge_subgroup_data(data): comm.Alltoallv(smsg, rmsg) return output + +class BinaryPartition(object): + """ + Binary tree class used for specifying a cell partition of arbitrary sized chunks for use as the + `chunk_layout` parameter of the `Simulation` class object. + """ + def __init__(self, data=None, split_dir=None, split_pos=None, left=None, right=None, proc_id=None): + """ + The constructor accepts three separate groups of arguments: (1) `data`: a list of lists where each + list entry is either (a) a node defined as `[ (split_dir,split_pos), left, right ]` for which `split_dir` + and `split_pos` define the splitting direction (i.e., `mp.X`, `mp.Y`, `mp.Z`) and position (e.g., `3.5`, + `-4.2`, etc.) and `left` and `right` are the two branches (themselves `BinaryPartition` objects) + or (b) a leaf with integer value for the process ID `proc_id` in the range between 0 and number of processes + - 1 (inclusive), (2) a node defined using `split_dir`, `split_pos`, `left`, and `right`, or (3) a leaf with + `proc_id`. Note that the same process ID can be assigned to as many chunks as you want, which means that one + process timesteps multiple chunks. If you use fewer MPI processes, then the process ID is taken modulo the number + of processes. 
+ """ + self.split_dir = None + self.split_pos = None + self.proc_id = None + self.left = None + self.right = None + if data is not None: + if isinstance(data,list) and len(data) == 3: + if isinstance(data[0],tuple) and len(data[0]) == 2: + self.split_dir = data[0][0] + self.split_pos = data[0][1] + else: + raise ValueError("expecting 2-tuple (split_dir,split_pos) but got {}".format(data[0])) + self.left = BinaryPartition(data=data[1]) + self.right = BinaryPartition(data=data[2]) + elif isinstance(data,int): + self.proc_id = data + else: + raise ValueError("expecting list [(split_dir,split_pos), left, right] or int (proc_id) but got {}".format(data)) + elif split_dir is not None: + self.split_dir = split_dir + self.split_pos = split_pos + self.left = left + self.right = right + else: + self.proc_id = proc_id + + def _numchunks(self,bp): + if bp is None: + return 0 + return max(self._numchunks(bp.left)+self._numchunks(bp.right), 1) + + def numchunks(self): + return self._numchunks(self) diff --git a/python/tests/chunk_layout.py b/python/tests/chunk_layout.py new file mode 100644 index 000000000..b88951839 --- /dev/null +++ b/python/tests/chunk_layout.py @@ -0,0 +1,59 @@ +import meep as mp +import copy +import unittest + +process_ids = [] +chunk_areas = [] + +def traverse_tree(bp=None,min_corner=None,max_corner=None): + if ((min_corner.x > max_corner.x) or (min_corner.y > max_corner.y)): + raise RuntimeError("min_corner/max_corner have been incorrectly defined.") + + ## reached a leaf + if (bp.left is None and bp.right is None): + process_ids.append(bp.proc_id) + chunk_area = (max_corner.x-min_corner.x)*(max_corner.y-min_corner.y) + chunk_areas.append(chunk_area) + + ## traverse the left branch + if (bp.left is not None): + new_max_corner = copy.deepcopy(max_corner) + if bp.split_dir == mp.X: + new_max_corner.x = bp.split_pos + else: + new_max_corner.y = bp.split_pos + traverse_tree(bp.left,min_corner,new_max_corner) + + ## traverse the right branch + if (bp.right is not None): + new_min_corner = copy.deepcopy(min_corner) + if bp.split_dir == mp.X: + new_min_corner.x = bp.split_pos + else: + new_min_corner.y = bp.split_pos + traverse_tree(bp.right,new_min_corner,max_corner) + + +class TestChunkLayoutBinaryPartition(unittest.TestCase): + + def test_chunk_layout_binary_partition(self): + chunk_layout = mp.BinaryPartition(data=[ (mp.X,-2.0), 0, [ (mp.Y,1.5), [ (mp.X,3.0), 1, [ (mp.Y,-0.5), 4, 3 ] ], 2 ] ]) + + cell_size = mp.Vector3(10.0,5.0,0) + + sim = mp.Simulation(cell_size=cell_size, + resolution=10, + num_chunks=5, + chunk_layout=chunk_layout) + + sim.init_sim() + owners = sim.structure.get_chunk_owners() + areas = [ v.surroundings().full_volume() for v in sim.structure.get_chunk_volumes() ] + + traverse_tree(chunk_layout,-0.5*cell_size,0.5*cell_size) + + self.assertListEqual([int(f) for f in owners],[f % mp.count_processors() for f in process_ids]) + self.assertListEqual(areas,chunk_areas) + +if __name__ == '__main__': + unittest.main() diff --git a/python/typemap_utils.cpp b/python/typemap_utils.cpp index f0606d2d8..77cdc9631 100644 --- a/python/typemap_utils.cpp +++ b/python/typemap_utils.cpp @@ -1029,3 +1029,49 @@ static PyObject *gobj_list_to_py_list(geometric_object_list *objs) { return py_res; } + +static meep::binary_partition *py_bp_to_bp(PyObject *pybp) { + meep::binary_partition *bp = NULL; + if (pybp == Py_None) return bp; + + PyObject *id = PyObject_GetAttrString(pybp, "proc_id"); + PyObject *split_dir = PyObject_GetAttrString(pybp, "split_dir"); + PyObject *split_pos = 
PyObject_GetAttrString(pybp, "split_pos");
+  PyObject *left = PyObject_GetAttrString(pybp, "left");
+  PyObject *right = PyObject_GetAttrString(pybp, "right");
+
+  if (!id || !split_dir || !split_pos || !left || !right) {
+    meep::abort("BinaryPartition class object is incorrectly defined.");
+  }
+
+  if (PyLong_Check(id)) {
+    bp = new meep::binary_partition(PyLong_AsLong(id));
+  } else {
+    bp = new meep::binary_partition(direction(PyLong_AsLong(split_dir)), PyFloat_AsDouble(split_pos));
+    bp->left = py_bp_to_bp(left);
+    bp->right = py_bp_to_bp(right);
+  }
+
+  Py_XDECREF(id);
+  Py_XDECREF(split_dir);
+  Py_XDECREF(split_pos);
+  Py_XDECREF(left);
+  Py_XDECREF(right);
+  return bp;
+}
+
+static PyObject *get_meep_mod() {
+  // Return value: Borrowed reference
+  static PyObject *meep_mod = NULL;
+  if (meep_mod == NULL) { meep_mod = PyImport_ImportModule("meep"); }
+  return meep_mod;
+}
+
+static PyObject *py_binary_partition_object() {
+  // Return value: Borrowed reference
+  static PyObject *bp_type = NULL;
+  if (bp_type == NULL) {
+    bp_type = PyObject_GetAttrString(get_meep_mod(), "BinaryPartition");
+  }
+  return bp_type;
+}
diff --git a/src/meep.hpp b/src/meep.hpp
index 9848c1387..079405496 100644
--- a/src/meep.hpp
+++ b/src/meep.hpp
@@ -707,6 +707,8 @@ boundary_region pml(double thickness, direction d, double Rasymptotic = 1e-15,
 boundary_region pml(double thickness, double Rasymptotic = 1e-15, double mean_stretch = 1.0);
 #define no_pml() boundary_region()
 
+class binary_partition;
+
 class structure {
  public:
   structure_chunk **chunks;
@@ -773,7 +775,10 @@ class structure {
   void dump_chunk_layout(const char *filename);
   void load(const char *filename);
   void load_chunk_layout(const char *filename, boundary_region &br);
-  void load_chunk_layout(const std::vector<grid_volume> &gvs, boundary_region &br);
+  void load_chunk_layout(const binary_partition *bp, boundary_region &br);
+  void load_chunk_layout(const std::vector<grid_volume> &gvs,
+                         const std::vector<int> &ids,
+                         boundary_region &br);
 
   // monitor.cpp
   std::complex<double> get_chi1inv(component, direction, const ivec &origloc, double frequency = 0,
@@ -2150,6 +2155,17 @@ vec get_k(void *vedata);
 realnum linear_interpolate(realnum rx, realnum ry, realnum rz, realnum *data, int nx, int ny, int nz, int stride);
 
+// binary tree class for importing layout of chunk partition
+class binary_partition {
+public:
+  binary_partition(int _id);
+  binary_partition(direction _split_dir, double _split_pos);
+  direction split_dir;
+  double split_pos;
+  int proc_id;
+  binary_partition *left, *right;
+};
+
 } /* namespace meep */
 
 #endif /* MEEP_H */
diff --git a/src/structure_dump.cpp b/src/structure_dump.cpp
index a616ae8db..40d959545 100644
--- a/src/structure_dump.cpp
+++ b/src/structure_dump.cpp
@@ -414,6 +414,55 @@ void structure::set_chiP_from_file(h5file *file, const char *dataset, field_type
   }
 }
 
+binary_partition::binary_partition(int _proc_id) {
+  proc_id = _proc_id;
+  split_dir = NO_DIRECTION;
+  split_pos = 0.0;
+  left = NULL;
+  right = NULL;
+}
+
+binary_partition::binary_partition(direction _split_dir, double _split_pos) {
+  split_dir = _split_dir;
+  split_pos = _split_pos;
+  proc_id = -1;
+  left = NULL;
+  right = NULL;
+}
+
+static void split_by_binarytree(grid_volume gvol,
+                                std::vector<grid_volume> &result_gvs,
+                                std::vector<int> &result_ids,
+                                const binary_partition *bp) {
+  // reached a leaf
+  if ((bp->left == NULL) && (bp->right == NULL)) {
+    result_gvs.push_back(gvol);
+    result_ids.push_back(bp->proc_id);
+    return;
+  }
+
+  int split_point = (size_t)((bp->split_pos - 
gvol.surroundings().in_direction_min(bp->split_dir)) / + gvol.surroundings().in_direction(bp->split_dir) * + gvol.num_direction(bp->split_dir) + 0.5); + // traverse left branch + if (bp->left != NULL) { + grid_volume left_gvol = gvol.split_at_fraction(false, split_point, bp->split_dir); + split_by_binarytree(left_gvol, result_gvs, result_ids, bp->left); + } + // traverse right branch + if (bp->right != NULL) { + grid_volume right_gvol = gvol.split_at_fraction(true, split_point, bp->split_dir); + split_by_binarytree(right_gvol, result_gvs, result_ids, bp->right); + } +} + +void structure::load_chunk_layout(const binary_partition *bp, boundary_region &br) { + std::vector gvs; + std::vector ids; + split_by_binarytree(gv, gvs, ids, bp); + load_chunk_layout(gvs, ids, br); +} + void structure::load_chunk_layout(const char *filename, boundary_region &br) { // Load chunk grid_volumes from a file h5file file(filename, h5file::READONLY, true); @@ -446,6 +495,7 @@ void structure::load_chunk_layout(const char *filename, boundary_region &br) { broadcast(0, nums, sz); std::vector gvs; + std::vector ids; // Populate a vector with the new grid_volumes for (int i = 0; i < num_chunks; ++i) { int idx = i * 3; @@ -457,20 +507,23 @@ void structure::load_chunk_layout(const char *filename, boundary_region &br) { } new_gv.set_origin(new_origin); gvs.push_back(new_gv); + ids.push_back(i * count_processors() / num_chunks); } - load_chunk_layout(gvs, br); + load_chunk_layout(gvs, ids, br); delete[] origins; delete[] nums; } -void structure::load_chunk_layout(const std::vector &gvs, boundary_region &br) { +void structure::load_chunk_layout(const std::vector &gvs, + const std::vector &ids, + boundary_region &br) { + if (gvs.size() != num_chunks) abort("load_chunk_layout: wrong number of chunks."); // Recreate the chunks with the new grid_volumes for (int i = 0; i < num_chunks; ++i) { if (chunks[i]->refcount-- <= 1) delete chunks[i]; - int proc = i * count_processors() / num_chunks; - chunks[i] = new structure_chunk(gvs[i], v, Courant, proc); + chunks[i] = new structure_chunk(gvs[i], v, Courant, ids[i] % count_processors()); br.apply(this, chunks[i]); } check_chunks();