Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Speed up choose_chunkdivision with geom_boxt_tree #1030

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

WIP: Speed up choose_chunkdivision with geom_boxt_tree #1030

wants to merge 5 commits into from

Conversation

ChristopherHogan
Copy link
Contributor

No description provided.

src/meepgeom.cpp Outdated
if (overlap > 0.0) {
for (int j = 0; j < geom.num_items; ++j) {
if (t->objects[i].o == geom.items + j) {
overlaps[j] += overlap;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this loop becomes a limiting factor, we could do the lookup in another way, e.g. a hash table or an array of indices sorted by pointer address.

src/meepgeom.cpp Outdated

for (int i = 0; i < geom.num_items; ++i) {
geometric_object *go = &geom.items[i];
double overlap = overlaps[i];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to check if this roughly matches the overlaps computed for each object with the old code.

int temp_periodicity = ensure_periodicity;
dimensions = 3;
ensure_periodicity = 0;
geom_tree = create_geom_box_tree0(*geom_, cell_box);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a simpler alternative to creating this tree, you could also just cache geom_get_bounding_box for each object, and check whether the bounding box overlaps before calling box_overlap_with_object. This only helps the constant factor, though, unlike a tree which in principle can give better scaling.

geom_box_intersection(&intersection, &t->b, &box);

for (int i = 0; i < t->nobjects; ++i) {
if (geom_boxes_intersect(&intersection, &t->objects[i].box)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One potential optimization: if this intersection is the whole of t->objects[i].box, i.e. the object lies completely within the intersection, then the overlap is just the volume of the object. We don't currently have an API to get the object volume, however.

@stevengj
Copy link
Collaborator

stevengj commented Oct 2, 2019

(I suspect that the base case in libctl/utils/geom.c:divide_geom_box_tree should be larger leaves for this application.)

@ChristopherHogan
Copy link
Contributor Author

Depends on NanoComp/libctl#43

@oskooi
Copy link
Collaborator

oskooi commented Oct 9, 2019

The following test can be used for benchmarking. This PR does not seem to be providing much of a performance improvement (and in fact does slightly worse) compared to master: 193.894 s vs. 206.908 s.

master

Using MPI version 3.0, 18 processes
-----------
Initializing structure...
Splitting into 18 chunks by cost
time for choose_chunkdivision = 193.894 s
Working in 3D dimensions.
Computational cell is 7.4 x 7.4 x 3.90909 with resolution 55
     block, center = (0,0,0.55)
          size (1e+20,1e+20,2.8)
          axes (1,0,0), (0,1,0), (0,0,1)
          dielectric constant epsilon diagonal = (1,1,1)
     block, center = (0,0,-1.1)
          size (1e+20,1e+20,0.5)
          axes (1,0,0), (0,1,0), (0,0,1)
          dielectric constant epsilon diagonal = (1,1,1)
     block, center = (0,0,-1.6)
          size (1e+20,1e+20,0.5)
          axes (1,0,0), (0,1,0), (0,0,1)
          dielectric constant epsilon diagonal = (3.0625,3.0625,3.0625)
     block, center = (0,0,-1.95)
          size (1e+20,1e+20,0.2)
          axes (1,0,0), (0,1,0), (0,0,1)
          dielectric constant epsilon diagonal = (1,1,1)
     sphere, center = (-3.39414,-3.39414,-1.85)
          radius 0.1
          dielectric constant epsilon diagonal = (1,1,1)
     sphere, center = (-3.4137,-3.1137,-1.85)
          radius 0.1
          dielectric constant epsilon diagonal = (1,1,1)
     sphere, center = (-3.39905,-2.79905,-1.85)
          radius 0.1
          dielectric constant epsilon diagonal = (1,1,1)
     sphere, center = (-3.42459,-2.52459,-1.85)
          radius 0.1
          dielectric constant epsilon diagonal = (1,1,1)
     sphere, center = (-3.41616,-2.21616,-1.85)
          radius 0.1
          dielectric constant epsilon diagonal = (1,1,1)
     sphere, center = (-3.44683,-1.94683,-1.85)
          radius 0.1
          dielectric constant epsilon diagonal = (1,1,1)
     ...(+ 570 objects not shown)...
time for set_epsilon = 1.71491 s
time for set_conductivity = 0.232645 s
time for set_conductivity = 0.232838 s
time for set_conductivity = 0.236223 s
time for set_conductivity = 0.236205 s
time for set_conductivity = 0.236093 s
time for set_conductivity = 0.235809 s
lorentzian susceptibility: frequency=2.80116, gamma=2.72777
lorentzian susceptibility: frequency=1.45825, gamma=1.08966
lorentzian susceptibility: frequency=1.24532, gamma=0.251645
lorentzian susceptibility: frequency=0.130662, gamma=0.268583
drude susceptibility: frequency=1e-10, gamma=0.0379081
lorentzian susceptibility: frequency=5.48457, gamma=0.513775
lorentzian susceptibility: frequency=0.101049, gamma=0
lorentzian susceptibility: frequency=8.60279, gamma=0
lorentzian susceptibility: frequency=14.619, gamma=0
-----------
estimated costs per process: 5003.61, 5514.81, 5181.63, 5372.34, 5117.11, 5611.91, 5246.98, 5355.01, 5166.24, 5456.63, 5285.86, 5401.04, 5056.53, 5551.81, 5182.95, 5455.5, 5170.85, 5650.81
estimated cost mean = 5321.2, stddev = 193.249

Elapsed run time = 216.5703 s

this PR

Using MPI version 3.0, 18 processes
-----------
Initializing structure...
Splitting into 18 chunks by cost
time for choose_chunkdivision = 206.908 s
Working in 3D dimensions.
Computational cell is 7.4 x 7.4 x 3.90909 with resolution 55
     block, center = (0,0,0.55)
          size (1e+20,1e+20,2.8)
          axes (1,0,0), (0,1,0), (0,0,1)
          dielectric constant epsilon diagonal = (1,1,1)
     block, center = (0,0,-1.1)
          size (1e+20,1e+20,0.5)
          axes (1,0,0), (0,1,0), (0,0,1)
          dielectric constant epsilon diagonal = (1,1,1)
     block, center = (0,0,-1.6)
          size (1e+20,1e+20,0.5)
          axes (1,0,0), (0,1,0), (0,0,1)
          dielectric constant epsilon diagonal = (3.0625,3.0625,3.0625)
     block, center = (0,0,-1.95)
          size (1e+20,1e+20,0.2)
          axes (1,0,0), (0,1,0), (0,0,1)
          dielectric constant epsilon diagonal = (1,1,1)
     sphere, center = (-3.39994,-3.39994,-1.85)
          radius 0.1
          dielectric constant epsilon diagonal = (1,1,1)
     sphere, center = (-3.40759,-3.10759,-1.85)
          radius 0.1
          dielectric constant epsilon diagonal = (1,1,1)
     sphere, center = (-3.44913,-2.84913,-1.85)
          radius 0.1
          dielectric constant epsilon diagonal = (1,1,1)
     sphere, center = (-3.42989,-2.52989,-1.85)
          radius 0.1
          dielectric constant epsilon diagonal = (1,1,1)
     sphere, center = (-3.40125,-2.20125,-1.85)
          radius 0.1
          dielectric constant epsilon diagonal = (1,1,1)
     sphere, center = (-3.40707,-1.90707,-1.85)
          radius 0.1
          dielectric constant epsilon diagonal = (1,1,1)
     ...(+ 570 objects not shown)...
time for set_epsilon = 2.42739 s
time for set_conductivity = 0.316124 s
time for set_conductivity = 0.315186 s
time for set_conductivity = 0.318315 s
time for set_conductivity = 0.318088 s
time for set_conductivity = 0.318354 s
time for set_conductivity = 0.318157 s
lorentzian susceptibility: frequency=2.80116, gamma=2.72777
lorentzian susceptibility: frequency=1.45825, gamma=1.08966
lorentzian susceptibility: frequency=1.24532, gamma=0.251645
lorentzian susceptibility: frequency=0.130662, gamma=0.268583
drude susceptibility: frequency=1e-10, gamma=0.0379081
lorentzian susceptibility: frequency=5.48457, gamma=0.513775
lorentzian susceptibility: frequency=0.101049, gamma=0
lorentzian susceptibility: frequency=8.60279, gamma=0
lorentzian susceptibility: frequency=14.619, gamma=0
-----------
estimated costs per process: 58253, 70735.6, 37089.1, 42303.1, 37755, 43749.6, 35717, 42687, 22575.7, 23095.7, 25225.6, 24252.5, 34733.5, 39773.8, 23768.8, 21599.5, 20178.6, 20517.4
estimated cost mean = 34667.2, stddev = 13895.2

Elapsed run time = 230.0236 s
from time import time
import meep as mp
from meep.materials import Al, ITO, fused_quartz
import math
import numpy as np
from random import random

lambda_min = 0.4                           # minimum source wavelength                                                                                                                            
lambda_max = 0.8                           # maximum source wavelength                                                                                                                            
fmin = 1/lambda_max                        # minimum source frequency                                                                                                                             
fmax = 1/lambda_min                        # maximum source frequency                                                                                                                             
fcen = 0.5*(fmin+fmax)                     # source frequency center                                                                                                                              
df = fmax-fmin                             # source frequency width                                                                                                                               

resolution = 55
nfreq = 800
nperiods = 10 # 24                                                                                                                                                                                
tABS = 2.2
tPML = 2.1
tGLS = 0.6
tITO = 0.5
tORG = 0.4
tAl = 0.2
ps = 0.3
rs = 0.1
dfrac = 0.2

L = nperiods*ps        # length of OLED                                                                                                                                                           

# length of computational cell along Z                                                                                                                                                            
sz = tPML+tGLS+tITO+tORG+rs+tAl
# length of non-absorbing region of computational cell in X and Y                                                                                                                                 
sxy = L+2*tABS
cell_size = mp.Vector3(sxy,sxy,sz)

boundary_layers = [mp.Absorber(tABS,direction=mp.X),
                   mp.Absorber(tABS,direction=mp.Y),
                   mp.PML(tPML,direction=mp.Z,side=mp.High)]

ORG = mp.Medium(index=1.75)

geometry = [mp.Block(material=fused_quartz,
                     size=mp.Vector3(mp.inf,mp.inf,tABS+tGLS),
                     center=mp.Vector3(z=0.5*sz-0.5*(tABS+tGLS))),
            mp.Block(material=ITO,
                     size=mp.Vector3(mp.inf,mp.inf,tITO),
                     center=mp.Vector3(z=0.5*sz-tABS-tGLS-0.5*tITO)),
            mp.Block(material=ORG,
                     size=mp.Vector3(mp.inf,mp.inf,tORG+rs),
                     center=mp.Vector3(z=0.5*sz-tABS-tGLS-tITO-0.5*(tORG+rs))),
            mp.Block(material=Al,
                     size=mp.Vector3(mp.inf,mp.inf,tAl),
                     center=mp.Vector3(z=0.5*sz-tABS-tGLS-tITO-tORG-rs-0.5*tAl))]

si = -0.5*sxy+0.5*ps+0.5*(sxy-ps*math.floor(sxy/ps))
di = ps
ni = math.floor(sxy/ps)

def sph(cx,cy):
    if mp.am_master():
        dpos = dfrac*ps*random()
    else:
        dpos = None
    dpos = mp.comm.bcast(dpos, root=0)
    return mp.Sphere(material=Al,
                     radius=rs,
                     center=mp.Vector3(cx+dpos,cy+dpos,0.5*sz-tABS-tGLS-tITO-tORG-rs))

for cx in np.arange(si,si+ni*di,di):
    for cy in np.arange(si,si+ni*di,di):
        geometry.append(sph(cx,cy))

num_src = 20                 # number of point sources                                                                                                                                            
sources = [];
for n in range(num_src):
    sources.append(mp.Source(mp.GaussianSource(fcen, fwidth=df),
                             component=mp.Ez,
                             center=mp.Vector3(z=0.5*sz-tABS-tGLS-tITO-0.4*tORG-0.2*tORG*n/num_src)))

sim = mp.Simulation(resolution=resolution,
                    cell_size=cell_size,
                    boundary_layers=boundary_layers,
                    geometry=geometry,
                    sources=sources,
                    eps_averaging=False,
                    split_chunks_evenly=False)

# surround source with a six-sided box of flux planes                                                                                                                                             
srcbox_width = 0.05
srcbox_top = sim.add_flux(fcen, df, nfreq, mp.FluxRegion(center=mp.Vector3(0,0,0.5*sz-tPML-tGLS), size=mp.Vector3(srcbox_width,srcbox_width,0), direction=mp.Z, weight=+1))
srcbox_bot = sim.add_flux(fcen, df, nfreq, mp.FluxRegion(center=mp.Vector3(0,0,0.5*sz-tPML-tGLS-tITO-0.8*tORG), size=mp.Vector3(srcbox_width,srcbox_width,0), direction=mp.Z, weight=-1))
srcbox_xp = sim.add_flux(fcen, df, nfreq, mp.FluxRegion(center=mp.Vector3(0.5*srcbox_width,0,0.5*sz-tPML-tGLS-0.5*(tITO+0.8*tORG)), size=mp.Vector3(0,srcbox_width,tITO+0.8*tORG), direction=mp.X\
, weight=+1))
srcbox_xm = sim.add_flux(fcen, df, nfreq, mp.FluxRegion(center=mp.Vector3(-0.5*srcbox_width,0,0.5*sz-tPML-tGLS-0.5*(tITO+0.8*tORG)), size=mp.Vector3(0,srcbox_width,tITO+0.8*tORG), direction=mp.\
X, weight=-1))
srcbox_yp = sim.add_flux(fcen, df, nfreq, mp.FluxRegion(center=mp.Vector3(0,0.5*srcbox_width,0.5*sz-tPML-tGLS-0.5*(tITO+0.8*tORG)), size=mp.Vector3(srcbox_width,0,tITO+0.8*tORG), direction=mp.Y\
, weight=+1))
srcbox_ym = sim.add_flux(fcen, df, nfreq, mp.FluxRegion(center=mp.Vector3(0,-0.5*srcbox_width,0.5*sz-tPML-tGLS-0.5*(tITO+0.8*tORG)), size=mp.Vector3(srcbox_width,0,tITO+0.8*tORG), direction=mp.\
Y, weight=-1))

# padding for flux box to fully capture waveguide mode                                                                                                                                            
fluxbox_dpad = 0.05

glass_flux = sim.add_flux(fcen, df, nfreq, mp.FluxRegion(center=mp.Vector3(0,0,0.5*sz-tPML-(tGLS-fluxbox_dpad)), size = mp.Vector3(L,L,0), direction=mp.Z, weight=+1))
wvgbox_xp = sim.add_flux(fcen, df, nfreq, mp.FluxRegion(size=mp.Vector3(0,L,fluxbox_dpad+tITO+tORG+fluxbox_dpad), direction=mp.X, center=mp.Vector3(0.5*L,0,0.5*sz-tPML-tGLS-0.5*(tITO+tORG)), we\
ight=+1))
wvgbox_xm = sim.add_flux(fcen, df, nfreq, mp.FluxRegion(size=mp.Vector3(0,L,fluxbox_dpad+tITO+tORG+fluxbox_dpad), direction=mp.X, center=mp.Vector3(-0.5*L,0,0.5*sz-tPML-tGLS-0.5*(tITO+tORG)), w\
eight=-1))
wvgbox_yp = sim.add_flux(fcen, df, nfreq, mp.FluxRegion(size=mp.Vector3(L,0,fluxbox_dpad+tITO+tORG+fluxbox_dpad), direction=mp.Y, center=mp.Vector3(0,0.5*L,0.5*sz-tPML-tGLS-0.5*(tITO+tORG)), we\
ight=+1))
wvgbox_ym = sim.add_flux(fcen, df, nfreq, mp.FluxRegion(size=mp.Vector3(L,0,fluxbox_dpad+tITO+tORG+fluxbox_dpad), direction=mp.Y, center=mp.Vector3(0,-0.5*L,0.5*sz-tPML-tGLS-0.5*(tITO+tORG)), w\
eight=-1))

mp.verbosity(2)
sim.init_sim()

@oskooi
Copy link
Collaborator

oskooi commented Oct 9, 2019

The latest commit provides just a slight improvement but the time for choose_chunkdivision is still comparable to that of master:

Splitting into 18 chunks by cost
time for choose_chunkdivision = 195.99 s

Also, the latest commit seems to be ignoring the verbose parameter and printing to standard output all 500+ geometric objects.

@ChristopherHogan
Copy link
Contributor Author

This is ready, although, as @oskooi says, it doesn't seem to make things any faster. refl_angular.py will fail in this PR because of NanoComp/libctl#43.

@oskooi
Copy link
Collaborator

oskooi commented Oct 23, 2019

Unrelated to this PR, @stevengj's object_vol branch of NanoComp/libctl provides a speedup of more than 6X compared to master for the test shown above:

Using MPI version 3.0, 18 processes
-----------
Initializing structure...
Splitting into 18 chunks by cost
time for choose_chunkdivision = 31.2376 s
Working in 3D dimensions.
Computational cell is 7.4 x 7.4 x 3.90909 with resolution 55
     block, center = (0,0,0.55)
          size (1e+20,1e+20,2.8)
          axes (1,0,0), (0,1,0), (0,0,1)
          dielectric constant epsilon diagonal = (1,1,1)
     block, center = (0,0,-1.1)
          size (1e+20,1e+20,0.5)
          axes (1,0,0), (0,1,0), (0,0,1)
          dielectric constant epsilon diagonal = (1,1,1)
     block, center = (0,0,-1.6)
          size (1e+20,1e+20,0.5)
          axes (1,0,0), (0,1,0), (0,0,1)
          dielectric constant epsilon diagonal = (3.0625,3.0625,3.0625)
     block, center = (0,0,-1.95)
          size (1e+20,1e+20,0.2)
          axes (1,0,0), (0,1,0), (0,0,1)
          dielectric constant epsilon diagonal = (1,1,1)
     sphere, center = (-3.44194,-3.44194,-1.85)
          radius 0.1
          dielectric constant epsilon diagonal = (1,1,1)
     sphere, center = (-3.40394,-3.10394,-1.85)
          radius 0.1
          dielectric constant epsilon diagonal = (1,1,1)
     sphere, center = (-3.40442,-2.80442,-1.85)
          radius 0.1
          dielectric constant epsilon diagonal = (1,1,1)
     sphere, center = (-3.41241,-2.51241,-1.85)
          radius 0.1
          dielectric constant epsilon diagonal = (1,1,1)
     sphere, center = (-3.42602,-2.22602,-1.85)
          radius 0.1
          dielectric constant epsilon diagonal = (1,1,1)
     sphere, center = (-3.41229,-1.91229,-1.85)
          radius 0.1
          dielectric constant epsilon diagonal = (1,1,1)
     ...(+ 570 objects not shown)...
time for set_epsilon = 3.52622 s
time for set_conductivity = 0.453251 s
time for set_conductivity = 0.452212 s
time for set_conductivity = 0.457252 s
time for set_conductivity = 0.45628 s
time for set_conductivity = 0.459845 s
time for set_conductivity = 0.457122 s
lorentzian susceptibility: frequency=2.80116, gamma=2.72777
lorentzian susceptibility: frequency=1.45825, gamma=1.08966
lorentzian susceptibility: frequency=1.24532, gamma=0.251645
lorentzian susceptibility: frequency=0.130662, gamma=0.268583
drude susceptibility: frequency=1e-10, gamma=0.0379081
lorentzian susceptibility: frequency=5.48457, gamma=0.513775
lorentzian susceptibility: frequency=0.101049, gamma=0
lorentzian susceptibility: frequency=8.60279, gamma=0
lorentzian susceptibility: frequency=14.619, gamma=0
-----------
estimated costs per process: 4769.7, 4780.76, 6030.21, 6150.92, 6026.44, 6101.62, 5544.01, 5788.56, 4770.31, 4788.68, 5408.71, 5504.66, 5538.32, 5781.88, 4724.99, 4829.21, 5416.83, 5467.45
estimated cost mean = 5412.4, stddev = 516.089

Elapsed run time = 62.3301 s

The time for choose_chunkdivision is 31.2376 s compared to 193.894 s for master — a speedup of 6.2X. Note that the estimated cost mean and stddev are similar: 5412.4 ± 516.089 vs. 5321.2 ± 193.249.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants