cache epsilon computations for MPB to improve MPI scaling #1257

Merged · 7 commits merged into master on Jun 26, 2020

Conversation

@stevengj (Collaborator) commented Jun 19, 2020

Work towards #1255.

(Still needs debugging, @oskooi.)

@oskooi (Collaborator) commented Jun 23, 2020

There are currently two failing tests on Travis (special_kz.py and oblique_source.py) showing the same error message:

CHECK failure on line 104 of maxwell_eps.c: singular 3x3 matrix
CHECK failure on line 104 of maxwell_eps.c: singular 3x3 matrix

Increasing the resolution slightly (special_kz.py:eigsrc_kz from 30 to 40; oblique_source.py from 50 to 60) produces a different error which reveals that the process is aborting due to heap corruption:

Using MPI version 3.1, 2 processes
complex
-----------
Initializing structure...
Halving computational cell along direction y
Splitting into 2 chunks evenly
time for choose_chunkdivision = 0.00223154 s
Working in 2D dimensions.
Computational cell is 14 x 14 x 0 with resolution 40
     block, center = (0,0,0)
          size (1e+20,1,1e+20)
          axes (1,0,0), (0,1,0), (0,0,1)
          dielectric constant epsilon diagonal = (12,12,12)
time for set_epsilon = 0.156297 s
-----------
Meep: using complex fields.
corrupted size vs. prev_size
*** Process received signal ***
Signal: Aborted (6)
Signal code:  (-6)
[ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7fa2726f7890]
[ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7fa272332e97]
[ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7fa272334801]
[ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x89897)[0x7fa27237d897]
[ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x9090a)[0x7fa27238490a]
[ 5] /lib/x86_64-linux-gnu/libc.so.6(+0x95acf)[0x7fa272389acf]
[ 6] /lib/x86_64-linux-gnu/libc.so.6(realloc+0x36b)[0x7fa27238cf9b]
[ 7] /home/oskooi/install/meep6/src/.libs/libmeep.so.19(+0xca628)[0x7fa27090d628]
[ 8] /usr/local/lib/libmpb.so.1(set_maxwell_dielectric+0x984)[0x7fa270419254]
[ 9] /home/oskooi/install/meep6/src/.libs/libmeep.so.19(_ZN4meep6fields13get_eigenmodeEdNS_9directionENS_6volumeES2_iRKNS_3vecEbiddPdPPv+0x1ec5)[0x7fa27090fdc5]
[10] /home/oskooi/install/meep6/src/.libs/libmeep.so.19(_ZN4meep6fields20add_eigenmode_sourceENS_9componentERKNS_8src_timeENS_9directionERKNS_6volumeES8_iRKNS_3vecEbiddSt7complexIdEPFSD_SB_E+0x10d)[0x7fa270910aed]
[11] /home/oskooi/install/meep6/python/meep/_meep.so(+0xecb0e)[0x7fa270c7fb0e]

Running python/tests/oblique_source.py under gdb and printing a backtrace shows:

Using MPI version 3.1, 1 processes
-----------
Initializing structure...
time for choose_chunkdivision = 0.00139393 s
Working in 2D dimensions.
Computational cell is 10 x 10 x 0 with resolution 60
     block, center = (0,0,0)
          size (1e+20,1,1e+20)
          axes (1,0,0), (0,1,0), (0,0,1)
          dielectric constant epsilon diagonal = (2.25,2.25,2.25)
time for set_epsilon = 0.922465 s
-----------
corrupted size vs. prev_size

Thread 1 "python3.5" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff7805801 in __GI_abort () at abort.c:79
#2  0x00007ffff784e897 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff797bb9a "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007ffff785590a in malloc_printerr (str=str@entry=0x7ffff7979c9d "corrupted size vs. prev_size") at malloc.c:5350
#4  0x00007ffff785aacf in _int_realloc (av=av@entry=0x7ffff7bb0c40 <main_arena>, oldp=oldp@entry=0x156f960, oldsize=oldsize@entry=6160, nb=nb@entry=12304) at malloc.c:4564
#5  0x00007ffff785df9b in __GI___libc_realloc (oldmem=0x156f970, bytes=12288) at malloc.c:3230
#6  0x00007ffff5e4112f in meep::meep_mpb_eps (eps=0x7fffffffa460, eps_inv=0x7fffffffa4b0, r=0x7fffffffa540, eps_data_=0x7fffffffae00) at mpb.cpp:71
#7  0x00007ffff593c254 in set_maxwell_dielectric (md=0x1539620, mesh_size=<optimized out>, R=0x7fffffffaf70, G=<optimized out>, 
    epsilon=0x7ffff5e40ed0 <meep::meep_mpb_eps(symmetric_matrix*, symmetric_matrix*, mpb_real const*, void*)>, mepsilon=0x0, epsilon_data=0x7fffffffae00) at maxwell_eps.c:498
#8  0x00007ffff5e437c2 in meep::fields::get_eigenmode (this=0x1529150, frequency=1, d=meep::NO_DIRECTION, where=..., eig_vol=..., band_num=1, _kpoint=..., match_frequency=true, parity=2, resolution=120, 
    eigensolver_tol=9.9999999999999998e-13, kdom=0x0, user_mdata=0x0) at mpb.cpp:413
#9  0x00007ffff5e4575c in meep::fields::add_eigenmode_source (this=0x1529150, c0=meep::Dielectric, src=..., d=meep::NO_DIRECTION, where=..., eig_vol=..., band_num=1, kpoint=..., match_frequency=true, 
    parity=2, resolution=0, eigensolver_tol=9.9999999999999998e-13, amp=..., A=0x0) at mpb.cpp:743
#10 0x00007ffff627a180 in _wrap_fields_add_eigenmode_source__SWIG_1 (args=0x7fffc638d2a8) at meep-python.cxx:86166
#11 0x00007ffff627a6d4 in _wrap_fields_add_eigenmode_source (self=0x7ffff664a778, args=0x7fffc638d2a8) at meep-python.cxx:86252

The problem seems to be the realloc statement within the function meep_mpb_eps at src/mpb.cpp:71:

https://github.com/NanoComp/meep/pull/1257/files#diff-e4ab557c3ba9e1876312d1a70976c3c3R71
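For reference, the classic failure mode behind a glibc "corrupted size vs. prev_size" abort is a realloc whose byte count omits a sizeof factor, so subsequent writes overrun the block; the "add missing sizeof" commit in this PR is consistent with that pattern. A minimal sketch with hypothetical names (not meep's actual code):

#include <cstdlib>

// Hypothetical cache of 6-component symmetric matrices, grown on demand.
struct sym_matrix { double m00, m01, m02, m11, m12, m22; };

static sym_matrix *cache = NULL;
static size_t capacity = 0;

void ensure_capacity(size_t needed) {
  if (needed > capacity) {
    capacity = 2 * needed;
    // BUG: passes a count of *elements* where realloc expects *bytes*,
    // so writes to cache[i] overrun the block and corrupt the heap:
    //   cache = (sym_matrix *) realloc(cache, capacity);
    // FIX: include the element size in the byte count.
    cache = (sym_matrix *) realloc(cache, capacity * sizeof(sym_matrix));
  }
}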

@oskooi (Collaborator) commented Jun 24, 2020

After applying the bug fix to src/mpb.cpp:71 described above and verifying that all tests in the make check suite pass, a benchmarking test for this PR involving a large 3d simulation with a 2d source plane (ridge waveguide cross section) reveals practically no speedup relative to master. The test is performed on a single machine (i.e., shared memory, no network interconnect) with 14 MPI processes/chunks.

The test for master involves timing the call to set_maxwell_dielectric in src/mpb.cpp:385 via the wall_time() function:

set_maxwell_dielectric(mdata, mesh_size, R, G, meep_mpb_eps, NULL, &eps_data);
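(A minimal sketch of the timing wrapper, using meep's wall_time() and master_printf() helpers; the exact placement inside get_eigenmode is paraphrased:)

double t0 = wall_time();
set_maxwell_dielectric(mdata, mesh_size, R, G, meep_mpb_eps, NULL, &eps_data);
master_printf("set_maxwell_dielectric:, %g s\n", wall_time() - t0);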

master

set_maxwell_dielectric:, 1067.82 s

The test for this PR involves timing each of the two calls to set_maxwell_dielectric as well as the sum_to_all call in between them.

this PR

set_maxwell_dielectric1:, 0.661123 s
sum_to_all:, 1067.43 s
set_maxwell_dielectric2:, 0.000506878 s

These results demonstrate that the sum_to_all call in this PR is taking just as long as the single call to set_maxwell_dielectric in master (even though set_maxwell_dielectric has been sped up considerably).

@stevengj (Collaborator, Author) commented:

You might try just timing an all_wait(); call right before the sum_to_all(), to see if the wait time is just due to one process taking a long time to reach that point.

@oskooi (Collaborator) commented Jun 24, 2020

Putting an all_wait(); right before sum_to_all() and timing each function call separately (with the output displayed using master_printf) reveals that it is the all_wait() that is taking up most of the time:

set_maxwell_dielectric1:, 0.664365 s
all_wait:, 1059.81 s
sum_to_all:, 0.004601 s
set_maxwell_dielectric2:, 0.000252962 s
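For reference, a sketch of the instrumentation producing this output (the sum_to_all arguments and the second set_maxwell_dielectric call are placeholders, since the PR's exact code is not reproduced here):

double t0 = wall_time();
set_maxwell_dielectric(mdata, mesh_size, R, G, meep_mpb_eps, NULL, &eps_data);
master_printf("set_maxwell_dielectric1:, %g s\n", wall_time() - t0);

t0 = wall_time();
all_wait();  // barrier: measures how long the slowest process takes to arrive here
master_printf("all_wait:, %g s\n", wall_time() - t0);

t0 = wall_time();
sum_to_all(local_eps, global_eps, n);  // placeholder arguments
master_printf("sum_to_all:, %g s\n", wall_time() - t0);

t0 = wall_time();
set_maxwell_dielectric(/* ...second pass over the summed cache, args elided... */);
master_printf("set_maxwell_dielectric2:, %g s\n", wall_time() - t0);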

To investigate whether one or more of the chunks is causing the delay, each of the 14 chunks (ranks 0-13) outputs its wall time for all_wait() separately via printf:

all_wait:, 0 (rank), 1060.05 s
all_wait:, 1 (rank), 564.322 s
all_wait:, 2 (rank), 533.729 s
all_wait:, 3 (rank), 495.724 s
all_wait:, 4 (rank), 942.212 s
all_wait:, 5 (rank), 29.6588 s
all_wait:, 6 (rank), 735.957 s
all_wait:, 7 (rank), 552.688 s
all_wait:, 8 (rank), 493.958 s
all_wait:, 9 (rank), 1026.89 s
all_wait:, 10 (rank), 0.000249147 s
all_wait:, 11 (rank), 710.75 s
all_wait:, 12 (rank), 589.994 s
all_wait:, 13 (rank), 953.669 s

These results indicate that while one chunk (rank 0) shows the longest delay, several other chunks (ranks 4, 9, 13) have comparable times.
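(The per-rank probe is a small variation on the snippet above: plain printf instead of master_printf so that every process reports, with meep's my_rank() identifying the chunk:)

double t0 = wall_time();
all_wait();
printf("all_wait:, %d (rank), %g s\n", my_rank(), wall_time() - t0);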

@stevengj (Collaborator, Author) commented:

Try putting an all_wait() at the beginning of get_eigenmode as well, to check whether the synchronization delay originates in this function or somewhere else.
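(A sketch of the suggested probe; the placement at the top of fields::get_eigenmode in src/mpb.cpp is hypothetical:)

double t0 = wall_time();
all_wait();  // if this barrier absorbs the ~1000 s, the imbalance arises before get_eigenmode
printf("get_eigenmode entry wait:, %d (rank), %g s\n", my_rank(), wall_time() - t0);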

Meanwhile, I'm going to merge this anyway, since it should scale better to do things this way, and tests pass.

stevengj merged commit fb58a86 into master on Jun 26, 2020
bencbartlett pushed a commit to bencbartlett/meep referencing this pull request on Sep 9, 2021, with the following commit messages:

* cache epsilon computations for MPB to improve MPI scaling

* whoops

* tweak

* fixes

* add missing sizeof

* tell MPB not to do its own subpixel averaging

* assert.h