Skip to content

Commit

Permalink
loop tiling for step_update_edhb (NanoComp#1733)
Browse files Browse the repository at this point in the history
* loop tiling for STEP_UPDATE_EDHB

* reorganization to prevent copying the PML auxiliary fields for every tile

* move loop over tiles outside of loop over components and skip tiling if material has diagonal s->chi1inv

* add two separate subdomains for step_db and update_eh and

* always tile step_db but only tile update_eh for anisotropic materials

* fixes and documentation
  • Loading branch information
oskooi authored and mawc2019 committed Nov 3, 2021
1 parent 4ff73cd commit bbedb30
Show file tree
Hide file tree
Showing 6 changed files with 116 additions and 76 deletions.
14 changes: 8 additions & 6 deletions doc/docs/Python_User_Interface.md
Original file line number Diff line number Diff line change
Expand Up @@ -107,7 +107,8 @@ def __init__(self,
progress_interval=4,
subpixel_tol=0.0001,
subpixel_maxeval=100000,
loop_tile_base=0,
loop_tile_base_db=0,
loop_tile_base_eh=0,
ensure_periodicity=True,
num_chunks=0,
Courant=0.5,
Expand Down Expand Up @@ -286,11 +287,12 @@ Python. `Vector3` is a `meep` class.
the minimum refractive index (usually 1), and in practice $S$ should be slightly
smaller.

+ **`loop_tile_base` [`number`]** — To improve the memory locality of the step-curl
field updates, Meep has an experimental feature to "tile" the loop over the Yee grid
voxels. The splitting of this loop into tiles or subdomains involves a recursive-bisection
method in which the base case for the number of voxels is specified using this parameter.
The default value is 0 or no tiling; a typical nonzero value to try would be `10000`.
+ **`loop_tile_base_db`, `loop_tile_base_eh` [`number`]** — To improve the [memory locality](https://en.wikipedia.org/wiki/Locality_of_reference)
of the field updates, Meep has an experimental feature to "tile" the loops over the Yee grid
voxels. The splitting of the update loops for step-curl and update-eh into tiles or subdomains
involves a recursive-bisection method in which the base case for the number of voxels is
specified using these two parameters, respectively. The default value is 0 or no tiling;
a typical nonzero value to try would be 10000.

+ **`output_volume` [`Volume` class ]** — Specifies the default region of space
that is output by the HDF5 output functions (below); see also the `Volume` class
Expand Down
19 changes: 11 additions & 8 deletions python/simulation.py
Original file line number Diff line number Diff line change
Expand Up @@ -953,7 +953,8 @@ def __init__(self,
progress_interval=4,
subpixel_tol=1e-4,
subpixel_maxeval=100000,
loop_tile_base=0,
loop_tile_base_db=0,
loop_tile_base_eh=0,
ensure_periodicity=True,
num_chunks=0,
Courant=0.5,
Expand Down Expand Up @@ -1129,11 +1130,12 @@ def __init__(self,
the minimum refractive index (usually 1), and in practice $S$ should be slightly
smaller.
+ **`loop_tile_base` [`number`]** — To improve the memory locality of the step-curl
field updates, Meep has an experimental feature to "tile" the loop over the Yee grid
voxels. The splitting of this loop into tiles or subdomains involves a recursive-bisection
method in which the base case for the number of voxels is specified using this parameter.
The default value is 0 or no tiling; a typical nonzero value to try would be 10000.
+ **`loop_tile_base_db`, `loop_tile_base_eh` [`number`]** — To improve the [memory locality](https://en.wikipedia.org/wiki/Locality_of_reference)
of the field updates, Meep has an experimental feature to "tile" the loops over the Yee grid
voxels. The splitting of the update loops for step-curl and update-eh into tiles or subdomains
involves a recursive-bisection method in which the base case for the number of voxels is
specified using these two parameters, respectively. The default value is 0 or no tiling;
a typical nonzero value to try would be 10000.
+ **`output_volume` [`Volume` class ]** — Specifies the default region of space
that is output by the HDF5 output functions (below); see also the `Volume` class
Expand Down Expand Up @@ -1206,7 +1208,8 @@ def __init__(self,
self.eps_averaging = eps_averaging
self.subpixel_tol = subpixel_tol
self.subpixel_maxeval = subpixel_maxeval
self.loop_tile_base = loop_tile_base
self.loop_tile_base_db = loop_tile_base_db
self.loop_tile_base_eh = loop_tile_base_eh
self.ensure_periodicity = ensure_periodicity
self.extra_materials = extra_materials
self.default_material = default_material
Expand Down Expand Up @@ -1963,7 +1966,7 @@ def init_sim(self):
self.m if self.is_cylindrical else 0,
self.k_point.z if self.special_kz and self.k_point else 0,
not self.accurate_fields_near_cylorigin,
self.loop_tile_base
self.loop_tile_base_db, self.loop_tile_base_eh
)

if self.force_all_components and self.dimensions != 1:
Expand Down
26 changes: 15 additions & 11 deletions src/fields.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -30,8 +30,9 @@ using namespace std;
namespace meep {

fields::fields(structure *s, double m, double beta, bool zero_fields_near_cylorigin,
int loop_tile_base)
int loop_tile_base_db, int loop_tile_base_eh)
: S(s->S), gv(s->gv), user_volume(s->user_volume), v(s->v), m(m), beta(beta),
loop_tile_base_db(loop_tile_base_db), loop_tile_base_eh(loop_tile_base_eh),
working_on(&times_spent) {
shared_chunks = s->shared_chunks;
components_allocated = false;
Expand Down Expand Up @@ -59,7 +60,7 @@ fields::fields(structure *s, double m, double beta, bool zero_fields_near_cylori
chunks = new fields_chunk_ptr[num_chunks];
for (int i = 0; i < num_chunks; i++)
chunks[i] = new fields_chunk(s->chunks[i], outdir, m, beta,
zero_fields_near_cylorigin, i, loop_tile_base);
zero_fields_near_cylorigin, i, loop_tile_base_db);
FOR_FIELD_TYPES(ft) {
typedef realnum *realnum_ptr;
comm_blocks[ft] = new realnum_ptr[num_chunks * num_chunks];
Expand Down Expand Up @@ -198,8 +199,8 @@ fields_chunk::~fields_chunk() {
if (new_s && new_s->refcount-- <= 1) delete new_s; // delete if not shared
}

static void split_into_tiles(grid_volume gvol, std::vector<grid_volume> *result,
const size_t loop_tile_base) {
void split_into_tiles(grid_volume gvol, std::vector<grid_volume> *result,
const size_t loop_tile_base) {
if (gvol.nowned_min() < loop_tile_base) {
result->push_back(gvol);
return;
Expand All @@ -217,7 +218,7 @@ static void split_into_tiles(grid_volume gvol, std::vector<grid_volume> *result,

// First check that the tile volumes gvs do not intersect and that they add
// up to the chunk's total grid_volume gv
static void check_tiles(grid_volume gv, const std::vector<grid_volume> &gvs) {
void check_tiles(grid_volume gv, const std::vector<grid_volume> &gvs) {
grid_volume vol_intersection;
for (size_t i = 0; i < gvs.size(); i++)
for (size_t j = i + 1; j < gvs.size(); j++)
Expand All @@ -231,7 +232,7 @@ static void check_tiles(grid_volume gv, const std::vector<grid_volume> &gvs) {
}

fields_chunk::fields_chunk(structure_chunk *the_s, const char *od, double m, double beta,
bool zero_fields_near_cylorigin, int chunkidx, int loop_tile_base)
bool zero_fields_near_cylorigin, int chunkidx, int loop_tile_base_db)
: gv(the_s->gv), v(the_s->v), m(m), zero_fields_near_cylorigin(zero_fields_near_cylorigin),
beta(beta) {
s = the_s;
Expand All @@ -244,11 +245,11 @@ fields_chunk::fields_chunk(structure_chunk *the_s, const char *od, double m, dou
Courant = s->Courant;
dt = s->dt;
dft_chunks = NULL;
if (loop_tile_base > 0) {
split_into_tiles(gv, &gvs, loop_tile_base);
check_tiles(gv, gvs);
if (loop_tile_base_db > 0) {
split_into_tiles(gv, &gvs_tiled, loop_tile_base_db);
check_tiles(gv, gvs_tiled);
} else {
gvs.push_back(gv);
gvs_tiled.push_back(gv);
}
FOR_FIELD_TYPES(ft) {
polarization_state *cur = NULL;
Expand Down Expand Up @@ -305,7 +306,10 @@ fields_chunk::fields_chunk(const fields_chunk &thef, int chunkidx) : gv(thef.gv)
Courant = thef.Courant;
dt = thef.dt;
dft_chunks = NULL;
gvs = thef.gvs;
gvs_tiled = thef.gvs_tiled;
FOR_FIELD_TYPES(ft) {
gvs_eh[ft] = thef.gvs_eh[ft];
}
FOR_FIELD_TYPES(ft) {
polarization_state *cur = NULL;
for (polarization_state *ocur = thef.pol[ft]; ocur; ocur = ocur->next) {
Expand Down
12 changes: 9 additions & 3 deletions src/meep.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -1467,7 +1467,7 @@ class fields_chunk {

double a, Courant, dt; // resolution a, Courant number, and timestep dt=Courant/a
grid_volume gv;
std::vector<grid_volume> gvs; // subdomains for cache-tiled execution of step_curl
std::vector<grid_volume> gvs_tiled, gvs_eh[NUM_FIELD_TYPES]; // subdomains for tiled execution
volume v;
double m; // angular dependence in cyl. coords
bool zero_fields_near_cylorigin; // fields=0 m pixels near r=0 for stability
Expand All @@ -1480,7 +1480,7 @@ class fields_chunk {
int chunk_idx;

fields_chunk(structure_chunk *, const char *outdir, double m, double beta,
bool zero_fields_near_cylorigin, int chunkidx, int loop_tile_base);
bool zero_fields_near_cylorigin, int chunkidx, int loop_tile_base_db);

fields_chunk(const fields_chunk &, int chunkidx);
~fields_chunk();
Expand Down Expand Up @@ -1711,10 +1711,11 @@ class fields {
boundary_condition boundaries[2][5];
char *outdir;
bool components_allocated;
size_t loop_tile_base_db, loop_tile_base_eh;

// fields.cpp methods:
fields(structure *, double m = 0, double beta = 0, bool zero_fields_near_cylorigin = true,
int loop_tile_base = 0);
int loop_tile_base_db = 0, int loop_tile_base_eh = 0);
fields(const fields &);
~fields();
bool equal_layout(const fields &f) const;
Expand Down Expand Up @@ -2382,6 +2383,11 @@ void set_zero_subnormals(bool iszero);
// initialize various properties of the simulation
void setup();

void split_into_tiles(grid_volume gvol, std::vector<grid_volume> *result,
const size_t loop_tile_base);

void check_tiles(grid_volume gv, const std::vector<grid_volume> &gvs);

} /* namespace meep */

#endif /* MEEP_H */
10 changes: 5 additions & 5 deletions src/step_db.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -31,20 +31,20 @@ using namespace std;
namespace meep {

void fields::step_db(field_type ft) {
if (ft != B_stuff && ft != D_stuff) meep::abort("step_db only works with B/D");

for (int i = 0; i < num_chunks; i++)
if (chunks[i]->is_mine())
if (chunks[i]->step_db(ft)) {
chunk_connections_valid = false;
assert(changed_materials);
chunk_connections_valid = false;
assert(changed_materials);
}
}

bool fields_chunk::step_db(field_type ft) {
bool allocated_u = false;

if (ft != B_stuff && ft != D_stuff) meep::abort("bug - step_db should only be called for B or D");

for (const auto& sub_gv : gvs) {
for (const auto& sub_gv : gvs_tiled) {
DOCMP FOR_FT_COMPONENTS(ft, cc) {
if (f[cc][cmp]) {
const component c_p = plus_component[cc], c_m = minus_component[cc];
Expand Down
111 changes: 68 additions & 43 deletions src/update_eh.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,29 @@ namespace meep {

void fields::update_eh(field_type ft, bool skip_w_components) {
if (ft != E_stuff && ft != H_stuff) meep::abort("update_eh only works with E/H");

// split the chunks' volume into subdomains for tiled execution of update_eh loop
for (int i = 0; i < num_chunks; i++)
if (chunks[i]->is_mine() && changed_materials) {
bool is_aniso = false;
FOR_FT_COMPONENTS(ft, cc) {
const direction d_c = component_direction(cc);
const direction d_1 = cycle_direction(chunks[i]->gv.dim, d_c, 1);
const direction d_2 = cycle_direction(chunks[i]->gv.dim, d_c, 2);
if (chunks[i]->s->chi1inv[cc][d_1] && chunks[i]->s->chi1inv[cc][d_2]) {
is_aniso = true;
break;
}
}
if (!chunks[i]->gvs_eh[ft].empty()) chunks[i]->gvs_eh[ft].clear();
if (loop_tile_base_eh > 0 && is_aniso) {
split_into_tiles(chunks[i]->gv, &chunks[i]->gvs_eh[ft], loop_tile_base_eh);
check_tiles(chunks[i]->gv, chunks[i]->gvs_eh[ft]);
} else {
chunks[i]->gvs_eh[ft].push_back(chunks[i]->gv);
}
}

for (int i = 0; i < num_chunks; i++)
if (chunks[i]->is_mine())
if (chunks[i]->update_eh(ft, skip_w_components)) {
Expand Down Expand Up @@ -123,53 +146,55 @@ bool fields_chunk::update_eh(field_type ft, bool skip_w_components) {
dmp[dc][cmp] = f_minus_p[dc][cmp] ? f_minus_p[dc][cmp] : f[dc][cmp];
}

DOCMP FOR_FT_COMPONENTS(ft, ec) {
if (f[ec][cmp]) {
if (type(ec) != ft) meep::abort("bug in FOR_FT_COMPONENTS");
component dc = field_type_component(ft2, ec);
const direction d_ec = component_direction(ec);
const ptrdiff_t s_ec = gv.stride(d_ec) * (ft == H_stuff ? -1 : +1);
const direction d_1 = cycle_direction(gv.dim, d_ec, 1);
const component dc_1 = direction_component(dc, d_1);
const ptrdiff_t s_1 = gv.stride(d_1) * (ft == H_stuff ? -1 : +1);
const direction d_2 = cycle_direction(gv.dim, d_ec, 2);
const component dc_2 = direction_component(dc, d_2);
const ptrdiff_t s_2 = gv.stride(d_2) * (ft == H_stuff ? -1 : +1);

direction dsigw0 = d_ec;
direction dsigw = s->sigsize[dsigw0] > 1 ? dsigw0 : NO_DIRECTION;

// lazily allocate any E/H fields that are needed (H==B initially)
if (f[ec][cmp] == f[dc][cmp] &&
(s->chi1inv[ec][d_ec] || have_f_minus_p || dsigw != NO_DIRECTION)) {
f[ec][cmp] = new realnum[gv.ntot()];
memcpy(f[ec][cmp], f[dc][cmp], gv.ntot() * sizeof(realnum));
allocated_eh = true;
}
for (size_t i = 0; i < gvs_eh[ft].size(); ++i) {
DOCMP FOR_FT_COMPONENTS(ft, ec) {
if (f[ec][cmp]) {
if (type(ec) != ft) meep::abort("bug in FOR_FT_COMPONENTS");
component dc = field_type_component(ft2, ec);
const direction d_ec = component_direction(ec);
const ptrdiff_t s_ec = gv.stride(d_ec) * (ft == H_stuff ? -1 : +1);
const direction d_1 = cycle_direction(gv.dim, d_ec, 1);
const component dc_1 = direction_component(dc, d_1);
const ptrdiff_t s_1 = gv.stride(d_1) * (ft == H_stuff ? -1 : +1);
const direction d_2 = cycle_direction(gv.dim, d_ec, 2);
const component dc_2 = direction_component(dc, d_2);
const ptrdiff_t s_2 = gv.stride(d_2) * (ft == H_stuff ? -1 : +1);

direction dsigw0 = d_ec;
direction dsigw = s->sigsize[dsigw0] > 1 ? dsigw0 : NO_DIRECTION;

// lazily allocate any E/H fields that are needed (H==B initially)
if (i == 0 && f[ec][cmp] == f[dc][cmp] &&
(s->chi1inv[ec][d_ec] || have_f_minus_p || dsigw != NO_DIRECTION)) {
f[ec][cmp] = new realnum[gv.ntot()];
memcpy(f[ec][cmp], f[dc][cmp], gv.ntot() * sizeof(realnum));
allocated_eh = true;
}

// lazily allocate W auxiliary field
if (!f_w[ec][cmp] && dsigw != NO_DIRECTION) {
f_w[ec][cmp] = new realnum[gv.ntot()];
memcpy(f_w[ec][cmp], f[ec][cmp], gv.ntot() * sizeof(realnum));
if (needs_W_notowned(ec)) allocated_eh = true; // communication needed
}
// lazily allocate W auxiliary field
if (i == 0 && !f_w[ec][cmp] && dsigw != NO_DIRECTION) {
f_w[ec][cmp] = new realnum[gv.ntot()];
memcpy(f_w[ec][cmp], f[ec][cmp], gv.ntot() * sizeof(realnum));
if (needs_W_notowned(ec)) allocated_eh = true; // communication needed
}

// for solve_cw, when W exists we get W and E from special variables
if (f_w[ec][cmp] && skip_w_components) continue;
// for solve_cw, when W exists we get W and E from special variables
if (f_w[ec][cmp] && skip_w_components) continue;

// save W field from this timestep in f_w_prev if needed by pols
if (needs_W_prev(ec)) {
if (!f_w_prev[ec][cmp]) f_w_prev[ec][cmp] = new realnum[gv.ntot()];
memcpy(f_w_prev[ec][cmp], f_w[ec][cmp] ? f_w[ec][cmp] : f[ec][cmp],
sizeof(realnum) * gv.ntot());
}
// save W field from this timestep in f_w_prev if needed by pols
if (i == 0 && needs_W_prev(ec)) {
if (!f_w_prev[ec][cmp]) f_w_prev[ec][cmp] = new realnum[gv.ntot()];
memcpy(f_w_prev[ec][cmp], f_w[ec][cmp] ? f_w[ec][cmp] : f[ec][cmp],
sizeof(realnum) * gv.ntot());
}

if (f[ec][cmp] != f[dc][cmp])
STEP_UPDATE_EDHB(f[ec][cmp], ec, gv, gv.little_owned_corner(ec), gv.big_corner(),
dmp[dc][cmp], dmp[dc_1][cmp], dmp[dc_2][cmp],
s->chi1inv[ec][d_ec], dmp[dc_1][cmp] ? s->chi1inv[ec][d_1] : NULL,
dmp[dc_2][cmp] ? s->chi1inv[ec][d_2] : NULL, s_ec, s_1, s_2, s->chi2[ec],
s->chi3[ec], f_w[ec][cmp], dsigw, s->sig[dsigw], s->kap[dsigw]);
if (f[ec][cmp] != f[dc][cmp])
STEP_UPDATE_EDHB(f[ec][cmp], ec, gv, gvs_eh[ft][i].little_owned_corner(ec), gvs_eh[ft][i].big_corner(),
dmp[dc][cmp], dmp[dc_1][cmp], dmp[dc_2][cmp],
s->chi1inv[ec][d_ec], dmp[dc_1][cmp] ? s->chi1inv[ec][d_1] : NULL,
dmp[dc_2][cmp] ? s->chi1inv[ec][d_2] : NULL, s_ec, s_1, s_2, s->chi2[ec],
s->chi3[ec], f_w[ec][cmp], dsigw, s->sig[dsigw], s->kap[dsigw]);
}
}
}

Expand Down

0 comments on commit bbedb30

Please sign in to comment.