Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

loop tiling for step_update_edhb #1733

Merged
merged 6 commits into from
Sep 16, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 8 additions & 6 deletions doc/docs/Python_User_Interface.md
Original file line number Diff line number Diff line change
Expand Up @@ -107,7 +107,8 @@ def __init__(self,
progress_interval=4,
subpixel_tol=0.0001,
subpixel_maxeval=100000,
loop_tile_base=0,
loop_tile_base_db=0,
loop_tile_base_eh=0,
ensure_periodicity=True,
num_chunks=0,
Courant=0.5,
Expand Down Expand Up @@ -286,11 +287,12 @@ Python. `Vector3` is a `meep` class.
the minimum refractive index (usually 1), and in practice $S$ should be slightly
smaller.

+ **`loop_tile_base` [`number`]** — To improve the memory locality of the step-curl
field updates, Meep has an experimental feature to "tile" the loop over the Yee grid
voxels. The splitting of this loop into tiles or subdomains involves a recursive-bisection
method in which the base case for the number of voxels is specified using this parameter.
The default value is 0 or no tiling; a typical nonzero value to try would be `10000`.
+ **`loop_tile_base_db`, `loop_tile_base_eh` [`number`]** — To improve the [memory locality](https://en.wikipedia.org/wiki/Locality_of_reference)
of the field updates, Meep has an experimental feature to "tile" the loops over the Yee grid
voxels. The splitting of the update loops for step-curl and update-eh into tiles or subdomains
involves a recursive-bisection method in which the base case for the number of voxels is
specified using these two parameters, respectively. The default value is 0 or no tiling;
a typical nonzero value to try would be 10000.

+ **`output_volume` [`Volume` class ]** — Specifies the default region of space
that is output by the HDF5 output functions (below); see also the `Volume` class
Expand Down
19 changes: 11 additions & 8 deletions python/simulation.py
Original file line number Diff line number Diff line change
Expand Up @@ -953,7 +953,8 @@ def __init__(self,
progress_interval=4,
subpixel_tol=1e-4,
subpixel_maxeval=100000,
loop_tile_base=0,
loop_tile_base_db=0,
loop_tile_base_eh=0,
ensure_periodicity=True,
num_chunks=0,
Courant=0.5,
Expand Down Expand Up @@ -1129,11 +1130,12 @@ def __init__(self,
the minimum refractive index (usually 1), and in practice $S$ should be slightly
smaller.

+ **`loop_tile_base` [`number`]** — To improve the memory locality of the step-curl
field updates, Meep has an experimental feature to "tile" the loop over the Yee grid
voxels. The splitting of this loop into tiles or subdomains involves a recursive-bisection
method in which the base case for the number of voxels is specified using this parameter.
The default value is 0 or no tiling; a typical nonzero value to try would be 10000.
+ **`loop_tile_base_db`, `loop_tile_base_eh` [`number`]** — To improve the [memory locality](https://en.wikipedia.org/wiki/Locality_of_reference)
of the field updates, Meep has an experimental feature to "tile" the loops over the Yee grid
voxels. The splitting of the update loops for step-curl and update-eh into tiles or subdomains
involves a recursive-bisection method in which the base case for the number of voxels is
specified using these two parameters, respectively. The default value is 0 or no tiling;
a typical nonzero value to try would be 10000.

+ **`output_volume` [`Volume` class ]** — Specifies the default region of space
that is output by the HDF5 output functions (below); see also the `Volume` class
Expand Down Expand Up @@ -1206,7 +1208,8 @@ def __init__(self,
self.eps_averaging = eps_averaging
self.subpixel_tol = subpixel_tol
self.subpixel_maxeval = subpixel_maxeval
self.loop_tile_base = loop_tile_base
self.loop_tile_base_db = loop_tile_base_db
self.loop_tile_base_eh = loop_tile_base_eh
self.ensure_periodicity = ensure_periodicity
self.extra_materials = extra_materials
self.default_material = default_material
Expand Down Expand Up @@ -1963,7 +1966,7 @@ def init_sim(self):
self.m if self.is_cylindrical else 0,
self.k_point.z if self.special_kz and self.k_point else 0,
not self.accurate_fields_near_cylorigin,
self.loop_tile_base
self.loop_tile_base_db, self.loop_tile_base_eh
)

if self.force_all_components and self.dimensions != 1:
Expand Down
26 changes: 15 additions & 11 deletions src/fields.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -30,8 +30,9 @@ using namespace std;
namespace meep {

fields::fields(structure *s, double m, double beta, bool zero_fields_near_cylorigin,
int loop_tile_base)
int loop_tile_base_db, int loop_tile_base_eh)
: S(s->S), gv(s->gv), user_volume(s->user_volume), v(s->v), m(m), beta(beta),
loop_tile_base_db(loop_tile_base_db), loop_tile_base_eh(loop_tile_base_eh),
working_on(&times_spent) {
shared_chunks = s->shared_chunks;
components_allocated = false;
Expand Down Expand Up @@ -59,7 +60,7 @@ fields::fields(structure *s, double m, double beta, bool zero_fields_near_cylori
chunks = new fields_chunk_ptr[num_chunks];
for (int i = 0; i < num_chunks; i++)
chunks[i] = new fields_chunk(s->chunks[i], outdir, m, beta,
zero_fields_near_cylorigin, i, loop_tile_base);
zero_fields_near_cylorigin, i, loop_tile_base_db);
FOR_FIELD_TYPES(ft) {
typedef realnum *realnum_ptr;
comm_blocks[ft] = new realnum_ptr[num_chunks * num_chunks];
Expand Down Expand Up @@ -198,8 +199,8 @@ fields_chunk::~fields_chunk() {
if (new_s && new_s->refcount-- <= 1) delete new_s; // delete if not shared
}

static void split_into_tiles(grid_volume gvol, std::vector<grid_volume> *result,
const size_t loop_tile_base) {
void split_into_tiles(grid_volume gvol, std::vector<grid_volume> *result,
const size_t loop_tile_base) {
if (gvol.nowned_min() < loop_tile_base) {
result->push_back(gvol);
return;
Expand All @@ -217,7 +218,7 @@ static void split_into_tiles(grid_volume gvol, std::vector<grid_volume> *result,

// First check that the tile volumes gvs do not intersect and that they add
// up to the chunk's total grid_volume gv
static void check_tiles(grid_volume gv, const std::vector<grid_volume> &gvs) {
void check_tiles(grid_volume gv, const std::vector<grid_volume> &gvs) {
grid_volume vol_intersection;
for (size_t i = 0; i < gvs.size(); i++)
for (size_t j = i + 1; j < gvs.size(); j++)
Expand All @@ -231,7 +232,7 @@ static void check_tiles(grid_volume gv, const std::vector<grid_volume> &gvs) {
}

fields_chunk::fields_chunk(structure_chunk *the_s, const char *od, double m, double beta,
bool zero_fields_near_cylorigin, int chunkidx, int loop_tile_base)
bool zero_fields_near_cylorigin, int chunkidx, int loop_tile_base_db)
: gv(the_s->gv), v(the_s->v), m(m), zero_fields_near_cylorigin(zero_fields_near_cylorigin),
beta(beta) {
s = the_s;
Expand All @@ -244,11 +245,11 @@ fields_chunk::fields_chunk(structure_chunk *the_s, const char *od, double m, dou
Courant = s->Courant;
dt = s->dt;
dft_chunks = NULL;
if (loop_tile_base > 0) {
split_into_tiles(gv, &gvs, loop_tile_base);
check_tiles(gv, gvs);
if (loop_tile_base_db > 0) {
split_into_tiles(gv, &gvs_tiled, loop_tile_base_db);
check_tiles(gv, gvs_tiled);
} else {
gvs.push_back(gv);
gvs_tiled.push_back(gv);
}
FOR_FIELD_TYPES(ft) {
polarization_state *cur = NULL;
Expand Down Expand Up @@ -305,7 +306,10 @@ fields_chunk::fields_chunk(const fields_chunk &thef, int chunkidx) : gv(thef.gv)
Courant = thef.Courant;
dt = thef.dt;
dft_chunks = NULL;
gvs = thef.gvs;
gvs_tiled = thef.gvs_tiled;
FOR_FIELD_TYPES(ft) {
gvs_eh[ft] = thef.gvs_eh[ft];
}
FOR_FIELD_TYPES(ft) {
polarization_state *cur = NULL;
for (polarization_state *ocur = thef.pol[ft]; ocur; ocur = ocur->next) {
Expand Down
12 changes: 9 additions & 3 deletions src/meep.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -1467,7 +1467,7 @@ class fields_chunk {

double a, Courant, dt; // resolution a, Courant number, and timestep dt=Courant/a
grid_volume gv;
std::vector<grid_volume> gvs; // subdomains for cache-tiled execution of step_curl
std::vector<grid_volume> gvs_tiled, gvs_eh[NUM_FIELD_TYPES]; // subdomains for tiled execution
volume v;
double m; // angular dependence in cyl. coords
bool zero_fields_near_cylorigin; // fields=0 m pixels near r=0 for stability
Expand All @@ -1480,7 +1480,7 @@ class fields_chunk {
int chunk_idx;

fields_chunk(structure_chunk *, const char *outdir, double m, double beta,
bool zero_fields_near_cylorigin, int chunkidx, int loop_tile_base);
bool zero_fields_near_cylorigin, int chunkidx, int loop_tile_base_db);

fields_chunk(const fields_chunk &, int chunkidx);
~fields_chunk();
Expand Down Expand Up @@ -1711,10 +1711,11 @@ class fields {
boundary_condition boundaries[2][5];
char *outdir;
bool components_allocated;
size_t loop_tile_base_db, loop_tile_base_eh;

// fields.cpp methods:
fields(structure *, double m = 0, double beta = 0, bool zero_fields_near_cylorigin = true,
int loop_tile_base = 0);
int loop_tile_base_db = 0, int loop_tile_base_eh = 0);
fields(const fields &);
~fields();
bool equal_layout(const fields &f) const;
Expand Down Expand Up @@ -2382,6 +2383,11 @@ void set_zero_subnormals(bool iszero);
// initialize various properties of the simulation
void setup();

void split_into_tiles(grid_volume gvol, std::vector<grid_volume> *result,
const size_t loop_tile_base);

void check_tiles(grid_volume gv, const std::vector<grid_volume> &gvs);

} /* namespace meep */

#endif /* MEEP_H */
10 changes: 5 additions & 5 deletions src/step_db.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -31,20 +31,20 @@ using namespace std;
namespace meep {

void fields::step_db(field_type ft) {
if (ft != B_stuff && ft != D_stuff) meep::abort("step_db only works with B/D");

for (int i = 0; i < num_chunks; i++)
if (chunks[i]->is_mine())
if (chunks[i]->step_db(ft)) {
chunk_connections_valid = false;
assert(changed_materials);
chunk_connections_valid = false;
assert(changed_materials);
}
}

bool fields_chunk::step_db(field_type ft) {
bool allocated_u = false;

if (ft != B_stuff && ft != D_stuff) meep::abort("bug - step_db should only be called for B or D");

for (const auto& sub_gv : gvs) {
for (const auto& sub_gv : gvs_tiled) {
DOCMP FOR_FT_COMPONENTS(ft, cc) {
if (f[cc][cmp]) {
const component c_p = plus_component[cc], c_m = minus_component[cc];
Expand Down
111 changes: 68 additions & 43 deletions src/update_eh.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,29 @@ namespace meep {

void fields::update_eh(field_type ft, bool skip_w_components) {
if (ft != E_stuff && ft != H_stuff) meep::abort("update_eh only works with E/H");

// split the chunks' volume into subdomains for tiled execution of update_eh loop
for (int i = 0; i < num_chunks; i++)
if (chunks[i]->is_mine() && changed_materials) {
bool is_aniso = false;
FOR_FT_COMPONENTS(ft, cc) {
const direction d_c = component_direction(cc);
const direction d_1 = cycle_direction(chunks[i]->gv.dim, d_c, 1);
const direction d_2 = cycle_direction(chunks[i]->gv.dim, d_c, 2);
if (chunks[i]->s->chi1inv[cc][d_1] && chunks[i]->s->chi1inv[cc][d_2]) {
is_aniso = true;
break;
}
}
if (!chunks[i]->gvs_eh[ft].empty()) chunks[i]->gvs_eh[ft].clear();
if (loop_tile_base_eh > 0 && is_aniso) {
split_into_tiles(chunks[i]->gv, &chunks[i]->gvs_eh[ft], loop_tile_base_eh);
check_tiles(chunks[i]->gv, chunks[i]->gvs_eh[ft]);
} else {
chunks[i]->gvs_eh[ft].push_back(chunks[i]->gv);
}
}

for (int i = 0; i < num_chunks; i++)
if (chunks[i]->is_mine())
if (chunks[i]->update_eh(ft, skip_w_components)) {
Expand Down Expand Up @@ -123,53 +146,55 @@ bool fields_chunk::update_eh(field_type ft, bool skip_w_components) {
dmp[dc][cmp] = f_minus_p[dc][cmp] ? f_minus_p[dc][cmp] : f[dc][cmp];
}

DOCMP FOR_FT_COMPONENTS(ft, ec) {
if (f[ec][cmp]) {
if (type(ec) != ft) meep::abort("bug in FOR_FT_COMPONENTS");
component dc = field_type_component(ft2, ec);
const direction d_ec = component_direction(ec);
const ptrdiff_t s_ec = gv.stride(d_ec) * (ft == H_stuff ? -1 : +1);
const direction d_1 = cycle_direction(gv.dim, d_ec, 1);
const component dc_1 = direction_component(dc, d_1);
const ptrdiff_t s_1 = gv.stride(d_1) * (ft == H_stuff ? -1 : +1);
const direction d_2 = cycle_direction(gv.dim, d_ec, 2);
const component dc_2 = direction_component(dc, d_2);
const ptrdiff_t s_2 = gv.stride(d_2) * (ft == H_stuff ? -1 : +1);

direction dsigw0 = d_ec;
direction dsigw = s->sigsize[dsigw0] > 1 ? dsigw0 : NO_DIRECTION;

// lazily allocate any E/H fields that are needed (H==B initially)
if (f[ec][cmp] == f[dc][cmp] &&
(s->chi1inv[ec][d_ec] || have_f_minus_p || dsigw != NO_DIRECTION)) {
f[ec][cmp] = new realnum[gv.ntot()];
memcpy(f[ec][cmp], f[dc][cmp], gv.ntot() * sizeof(realnum));
allocated_eh = true;
}
for (size_t i = 0; i < gvs_eh[ft].size(); ++i) {
DOCMP FOR_FT_COMPONENTS(ft, ec) {
if (f[ec][cmp]) {
if (type(ec) != ft) meep::abort("bug in FOR_FT_COMPONENTS");
component dc = field_type_component(ft2, ec);
const direction d_ec = component_direction(ec);
const ptrdiff_t s_ec = gv.stride(d_ec) * (ft == H_stuff ? -1 : +1);
const direction d_1 = cycle_direction(gv.dim, d_ec, 1);
const component dc_1 = direction_component(dc, d_1);
const ptrdiff_t s_1 = gv.stride(d_1) * (ft == H_stuff ? -1 : +1);
const direction d_2 = cycle_direction(gv.dim, d_ec, 2);
const component dc_2 = direction_component(dc, d_2);
const ptrdiff_t s_2 = gv.stride(d_2) * (ft == H_stuff ? -1 : +1);

direction dsigw0 = d_ec;
direction dsigw = s->sigsize[dsigw0] > 1 ? dsigw0 : NO_DIRECTION;

// lazily allocate any E/H fields that are needed (H==B initially)
if (i == 0 && f[ec][cmp] == f[dc][cmp] &&
(s->chi1inv[ec][d_ec] || have_f_minus_p || dsigw != NO_DIRECTION)) {
f[ec][cmp] = new realnum[gv.ntot()];
memcpy(f[ec][cmp], f[dc][cmp], gv.ntot() * sizeof(realnum));
allocated_eh = true;
}

// lazily allocate W auxiliary field
if (!f_w[ec][cmp] && dsigw != NO_DIRECTION) {
f_w[ec][cmp] = new realnum[gv.ntot()];
memcpy(f_w[ec][cmp], f[ec][cmp], gv.ntot() * sizeof(realnum));
if (needs_W_notowned(ec)) allocated_eh = true; // communication needed
}
// lazily allocate W auxiliary field
if (i == 0 && !f_w[ec][cmp] && dsigw != NO_DIRECTION) {
f_w[ec][cmp] = new realnum[gv.ntot()];
memcpy(f_w[ec][cmp], f[ec][cmp], gv.ntot() * sizeof(realnum));
if (needs_W_notowned(ec)) allocated_eh = true; // communication needed
}

// for solve_cw, when W exists we get W and E from special variables
if (f_w[ec][cmp] && skip_w_components) continue;
// for solve_cw, when W exists we get W and E from special variables
if (f_w[ec][cmp] && skip_w_components) continue;

// save W field from this timestep in f_w_prev if needed by pols
if (needs_W_prev(ec)) {
if (!f_w_prev[ec][cmp]) f_w_prev[ec][cmp] = new realnum[gv.ntot()];
memcpy(f_w_prev[ec][cmp], f_w[ec][cmp] ? f_w[ec][cmp] : f[ec][cmp],
sizeof(realnum) * gv.ntot());
}
// save W field from this timestep in f_w_prev if needed by pols
if (i == 0 && needs_W_prev(ec)) {
if (!f_w_prev[ec][cmp]) f_w_prev[ec][cmp] = new realnum[gv.ntot()];
memcpy(f_w_prev[ec][cmp], f_w[ec][cmp] ? f_w[ec][cmp] : f[ec][cmp],
sizeof(realnum) * gv.ntot());
}
oskooi marked this conversation as resolved.
Show resolved Hide resolved

if (f[ec][cmp] != f[dc][cmp])
STEP_UPDATE_EDHB(f[ec][cmp], ec, gv, gv.little_owned_corner(ec), gv.big_corner(),
dmp[dc][cmp], dmp[dc_1][cmp], dmp[dc_2][cmp],
s->chi1inv[ec][d_ec], dmp[dc_1][cmp] ? s->chi1inv[ec][d_1] : NULL,
dmp[dc_2][cmp] ? s->chi1inv[ec][d_2] : NULL, s_ec, s_1, s_2, s->chi2[ec],
s->chi3[ec], f_w[ec][cmp], dsigw, s->sig[dsigw], s->kap[dsigw]);
if (f[ec][cmp] != f[dc][cmp])
STEP_UPDATE_EDHB(f[ec][cmp], ec, gv, gvs_eh[ft][i].little_owned_corner(ec), gvs_eh[ft][i].big_corner(),
dmp[dc][cmp], dmp[dc_1][cmp], dmp[dc_2][cmp],
s->chi1inv[ec][d_ec], dmp[dc_1][cmp] ? s->chi1inv[ec][d_1] : NULL,
dmp[dc_2][cmp] ? s->chi1inv[ec][d_2] : NULL, s_ec, s_1, s_2, s->chi2[ec],
s->chi3[ec], f_w[ec][cmp], dsigw, s->sig[dsigw], s->kap[dsigw]);
}
}
}

Expand Down