Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sources seem to change before and after optimization #2379

Closed
Arjun-Khurana opened this issue Jan 21, 2023 · 18 comments · Fixed by #2394
Closed

Sources seem to change before and after optimization #2379

Arjun-Khurana opened this issue Jan 21, 2023 · 18 comments · Fixed by #2394

Comments

@Arjun-Khurana
Copy link
Contributor

Arjun-Khurana commented Jan 21, 2023

I am attempting to optimize a vertical grating coupler using a Source object defined as:

sources = [
    mp.Source(
        src,
        component=mp.Ez,
        center = src_center,
        size = src_size,
        amp_func = my_func
    )
]

where src is a standard GaussianSource and my_func returns an amplitude corresponding to a linearly polarized fiber mode profile. I use this source to inject field into a design region and optimize the field in an output waveguide using an EigenmodeCoefficient.

When I run my optimization, the first iteration (forward run, adjoint run, and gradient calculation) is fine, but when the second forward run starts, I get a segmentation fault. Looking more closely at the adjoint solver, I printed the type of the source objects before and after the optimization problem is called:

if mp.am_really_master(): print(f'Sources (before opt): {[type(src) for src in opt.sim.sources]}')
f0, dJ_du = opt(vb) # returns dJ_du [design_region x design variables x frequency]
if mp.am_really_master(): print(f'Sources (after opt): {[type(src) for src in opt.sim.sources]}')

which results in:

Sources (before opt): [<class 'meep.source.Source'>]
Sources (after opt): [<class 'meep.source.EigenModeSource'>]

on the first iteration, and on the second iteration:

Sources (before opt): [<class 'meep.source.EigenModeSource'>]
<<SEGFAULT>>

The seg fault then occurs on the opt call in the next iteration. It seems like the OptimizationProblem.adjoint_run() method does not reset the sources back to the forward_sources list after finishing: this is left to the prepare_forward_run() method on the next iteration, but it seems like when prepare_forward_run() tries to use change_sources() to change the source list from the EigenModeSource back to my Source, it fails.

Is this a bug or am I doing something incorrectly? Seems related to #2309, so I tried including force_all_components=True in the Simulation object and made sure my eig_parity=mp.EVEN_Z + mp.ODD_Z, but no luck.

@smartalecH
Copy link
Collaborator

@Arjun-Khurana I'll address the segfault later. But first, be careful about how you are defining the spatial profile of your source.

It's tempting to try to excite a source with a particular mode profile by setting its spatial distribution to the expected field profile (as you are doing). But that won't work. You see, the source objects are currents ($J$) not fields ($E$, $H$). So you won't actually launch your mode this way.

The proper way to do this is using the principle of equivalence, where you convert your expected field distributions into equivalent source currents.

See a discussion here: #2218
And some code here: #2333
And a reference here: https://arxiv.org/abs/1301.5366

@smartalecH
Copy link
Collaborator

So the segfault could indeed be a bug, but we need a MWE to better track it down.

My hunch is that the custom source object has data (maybe on different procs) that is lost after the first forward run. So when you try to reset the sources, you end up pointing to some garbage. The source itself is probably fine, but some of the underlying data structures (like your amp_array) may be gone. You can try peeking into those source parameters manually and see which one triggers a segfault (or maybe the entire source object really does disappear)? Again a MWE that fits on a single proc will really help debug here.

Ever since we revamped how we setup each run to save (significantly) on memory (#1855), we also introduced a lot of bugs that require a lot of careful thinking. This is the downside to parallel programming -- you really have to be aware of where the data resides (whereas before, all the data was everywhere).

@smartalecH
Copy link
Collaborator

(As a simple check, ditch the amp_array and see if it still segfaults)

@Arjun-Khurana
Copy link
Contributor Author

Here's a minimum reproducible example (adapted from the waveguide bend optimization adjoint tutorial), where I've replaced the EigenmodeSource with a Gaussian profile.

Where is the amp_array? It doesn't seem to be a member of the Source class.

import meep as mp
import meep.adjoint as mpa
import numpy as np
from autograd import numpy as npa
import nlopt
from matplotlib import pyplot as plt

mp.verbosity(0)
Si = mp.Medium(index=3.4)
SiO2 = mp.Medium(index=1.44)

resolution = 20
d3 = False
Sx = 6
Sy = 5
Sz = 3
si_thick = 1
cell_size = mp.Vector3(Sx, Sy, Sz if d3 else 0)

pml_layers = [mp.PML(1.0)]

fcen = 1 / 1.55
width = 0.2
fwidth = width * fcen
source_center = [-1.5, 0, 0]
source_size = mp.Vector3(0, 2, 1 if d3 else 0)
kpoint = mp.Vector3(1, 0, 0)
src = mp.GaussianSource(frequency=fcen, fwidth=fwidth)

# source = [
#     mp.EigenModeSource(
#         src,
#         eig_band=1,
#         direction=mp.NO_DIRECTION,
#         eig_kpoint=kpoint,
#         size=source_size,
#         center=source_center,
#     )
# ]

def amplitude(xyz: mp.Vector3()):
    return np.exp(-15 * xyz.y**2)

source = [
    mp.Source(
        src,
        component=mp.Ez,
        center = source_center,
        size = source_size,
        amp_func=amplitude
    )
]

design_region_resolution = 10
Nx = design_region_resolution
Ny = design_region_resolution

design_variables = mp.MaterialGrid(mp.Vector3(Nx, Ny), SiO2, Si, grid_type="U_MEAN")
design_region = mpa.DesignRegion(
    design_variables, volume=mp.Volume(center=mp.Vector3(), size=mp.Vector3(1, 1, si_thick if d3 else 0))
)


geometry = [
    mp.Block(
        center=mp.Vector3(x=-Sx / 4), material=Si, size=mp.Vector3(Sx / 2, 0.5, si_thick if d3 else 0)
    ),  # horizontal waveguide
    mp.Block(
        center=mp.Vector3(y=Sy / 4), material=Si, size=mp.Vector3(0.5, Sy / 2, si_thick if d3 else 0)
    ),  # vertical waveguide
    mp.Block(
        center=design_region.center, size=design_region.size, material=design_variables
    ),  # design region
    # mp.Block(center=design_region.center, size=design_region.size, material=design_variables,
    #         e1=mp.Vector3(x=-1).rotate(mp.Vector3(z=1), np.pi/2), e2=mp.Vector3(y=1).rotate(mp.Vector3(z=1), np.pi/2))
    #
    # The commented lines above impose symmetry by overlapping design region with the same design variable. However,
    # currently there is an issue of doing that; We give an alternative approach to impose symmetry in later tutorials.
    # See https://github.com/NanoComp/meep/issues/1984 and https://github.com/NanoComp/meep/issues/2093
]

sim = mp.Simulation(
    cell_size=cell_size,
    boundary_layers=pml_layers,
    geometry=geometry,
    sources=source,
    eps_averaging=False,
    resolution=resolution,
    force_all_components=True
)

TE0 = mpa.EigenmodeCoefficient(
    sim, mp.Volume(center=mp.Vector3(-1, 0, 0), size=mp.Vector3(y=2, z=2 if d3 else 0)), mode=1
)

TE_top = mpa.EigenmodeCoefficient(
    sim, mp.Volume(center=mp.Vector3(0, 1, 0), size=mp.Vector3(x=2, y=0, z = 2 if d3 else 0)), mode=1
)

ob_list = [TE0, TE_top]

def J(source, top):
    return npa.abs(top / source) ** 2


opt = mpa.OptimizationProblem(
    simulation=sim,
    objective_functions=J,
    objective_arguments=ob_list,
    design_regions=[design_region],
    fcen=fcen,
    df=0,
    nf=1,
)

x0 = 0.5 * np.ones((Nx * Ny,))
opt.update_design([x0])

# opt.plot2D(True)
# plt.show()

evaluation_history = []
sensitivity = [0]

def f(x, grad):
    if mp.am_really_master(): print(f'{[type(src) for src in opt.sim.sources]}')
    f0, dJ_du = opt([x])
    if mp.am_really_master(): print(f'{[type(src) for src in opt.sim.sources]}')
    f0 = f0[0]  # f0 is an array of length 1
    if grad.size > 0:
        grad[:] = np.squeeze(dJ_du)
    evaluation_history.append(np.real(f0))
    sensitivity[0] = dJ_du
    return np.real(f0)

algorithm = nlopt.LD_MMA
n = Nx * Ny
maxeval = 10

evaluation_history = []
solver = nlopt.opt(algorithm, n)
solver.set_lower_bounds(0)
solver.set_upper_bounds(1)
solver.set_max_objective(f)
solver.set_maxeval(maxeval)
x = solver.optimize(x0)

plt.figure()
plt.plot(np.array(evaluation_history) * 100, "o-")
plt.grid(True)
plt.xlabel("Iteration")
plt.ylabel("Transmission (%)")
plt.show()

@smartalecH
Copy link
Collaborator

Where is the amp_array?

It should be a member of Source. But I really meant amp_func, which is what you are using. Try removing that in your simulation first.

i.e. continue to simplify the source object until the segfault goes away.

@Arjun-Khurana
Copy link
Contributor Author

Removing the amp_func fixes the segfault.

@smartalecH smartalecH added the bug label Jan 21, 2023
@smartalecH
Copy link
Collaborator

Hmm okay next step is to figure out why.

@Arjun-Khurana
Copy link
Contributor Author

Arjun-Khurana commented Jan 21, 2023

I've made a bit of progress towards that. During the adjoint run, Simulation.change_sources() is invoked to convert the eigenmode monitors to adjoint sources.

def change_sources(self, new_sources):
        """
        Change the list of sources in `Simulation.sources` to `new_sources`, and changes
        the sources used for the current simulation. `new_sources` must be a list of
        `Source` objects.
        """
        self.sources = new_sources
        print(f'changing sources, fields are: {self.fields}')
        if self.fields:
            self.fields.remove_sources()
            self.add_sources()
def add_source(self, src):
        print(f'adding {type(src)} source')
        if self.fields is None:
            self.init_sim()

...

        if isinstance(src, EigenModeSource):
            # code to construct eigenmode source 

At this point, the Simulation.fields object is not None, and so the function Simulation.add_source() properly constructs the EigenModeSource objects.

Output from adjoint run instantiating EigenModeSources:

Starting adjoint run...
from meep (adj): [<class 'meep.source.Source'>]
changing sources, fields are: <meep.fields; proxy of <Swig Object of type 'meep::fields *' at 0x1760e7f30> >
adding <class 'meep.source.EigenModeSource'> source
adding <class 'meep.source.EigenModeSource'> source
from meep (adj): [<class 'meep.source.EigenModeSource'>, <class 'meep.source.EigenModeSource'>]

The else clause in add_source() seems like it should handle adding the custom Source object:

else:
        add_vol_src_args = [src.component, src.src.swigobj, where]
        add_vol_src = functools.partial(
            self.fields.add_volume_source, *add_vol_src_args
        )
        
        if src.amp_func_file:
            fname_dset = src.amp_func_file.rsplit(":", 1)
            if len(fname_dset) != 2:
                err_msg = (
                    "Expected a string of the form 'h5filename:dataset'. Got '{}'"
                )
                raise ValueError(err_msg.format(src.amp_func_file))
        
            fname, dset = fname_dset
            if not fname.endswith(".h5"):
                fname += ".h5"
        
            add_vol_src(fname, dset, src.amplitude * 1.0)
        elif src.amp_func:
            print('REACHED ADD AMP FUNC SRC')
            add_vol_src(src.amp_func, src.amplitude * 1.0)
        elif src.amp_data is not None:
            add_vol_src(src.amp_data, src.amplitude * 1.0)
        else:
            add_vol_src(src.amplitude * 1.0)

However, then on the next forward run, when we invoke change_sources() again, the fields are None, so we never enter the add_source method().

Starting forward run...
from meep (fwd): [<class 'meep.source.EigenModeSource'>, <class 'meep.source.EigenModeSource'>]
changing sources, fields are: None
from meep (fwd): [<class 'meep.source.Source'>]
[Arjuns-MacBook-Pro-2:79479] *** Process received signal ***
[Arjuns-MacBook-Pro-2:79479] Signal: Segmentation fault: 11 (11)

Basically I think if we figure out why the fields are None after the adjoint run, we can fix the issue either by making them not None, or if they have to be, by fixing the logic of change_sources() to properly handle this case.

@smartalecH
Copy link
Collaborator

However, then on the next forward run, when we invoke change_sources() again, the fields are None, so we never enter the add_source method().

Yes the fields should be None, as we are clearing the simulation for a new geometry to be loaded.

The add_source logic recently changed, and this is probably why we are seeing this behavior.

We just need to change it (again) to properly take into account everything.

@hammy4815
Copy link
Contributor

hammy4815 commented Jan 22, 2023

Hi @Arjun-Khurana ,

A couple of things. First, the code you are seeing for add_source() has been reformulated recently as found in this commit. But this does not fix the bug you've stumbled upon, so I would just ignore this for now.

Second, the issue appears to be related to the amp_func property of your Source object. It appears that at some point in the optimization process, the amp_func object is lost in memory. Later, when this function is attempted to be accessed, it is replaced by another function probably sitting on the stack, stop_when_dft_decayed, which is utilized during the adjoint run.

I've been able to narrow where the 'memory deletion' process of amp_func occurs. By printing self.forward_sources[0].amp_func at different locations in the code, it either prints the amplitude function you defined and passed in, or it will create a seg fault if I attempt to print after the deletion takes place. I've narrowed it down to here:

def adjoint_run(self):
# set up adjoint sources and monitors
self.prepare_adjoint_run()

...

# Update the sources
self.sim.change_sources(self.adjoint_sources[ar])

After calling change_sources as shown above, the amp_func property appears to be deleted as I described. Here is that function:

meep/python/simulation.py

Lines 4260 to 4269 in f448ebd

def change_sources(self, new_sources):
"""
Change the list of sources in `Simulation.sources` to `new_sources`, and changes
the sources used for the current simulation. `new_sources` must be a list of
`Source` objects.
"""
self.sources = new_sources
if self.fields:
self.fields.remove_sources()
self.add_sources()

If I comment out line 4266, self.sources = new_sources, then the seg fault goes away. However, this means that the sources are never changing during the optimization process. Commenting out line 4268 does not fix the seg fault, neither does pulling 4269 out of the if statement. It's not clear from this point how the amp_func is becoming corrupted, but the issue does seem to occur somewhere in this function.

If we can figure out where from here properties such as amp_func can be accessed and possibly deleted, then I think we've found the bug.

@Arjun-Khurana
Copy link
Contributor Author

the code you are seeing for add_source() has been reformulated recently as found in #2333.

I realized that I forgot to pull master shortly after posting the issue but you're right that it's still replicable in the latest version.

Moreover, I agree with you that the issue lies in the change_sources() method. It seems to me that when sim.fields.remove_sources() is called, a call is made to:

def remove_sources(self):
    return _meep.fields_remove_sources(self)

which is a member of the fields SWIG wrapper for the cpp object, with the following definition:

meep/src/fields.cpp

Lines 575 to 584 in f448ebd

void fields_chunk::remove_sources() {
FOR_FIELD_TYPES(ft) { sources[ft].clear(); }
}
void fields::remove_sources() {
delete sources;
sources = NULL;
for (int i = 0; i < num_chunks; i++)
chunks[i]->remove_sources();
}

I'm guessing this is where the memory deallocation is happening, though I'm not an expert in SWIG and so I'm not exactly sure how this is working.

@hammy4815
Copy link
Contributor

It seems to me that when sim.fields.remove_sources() is called...

You could be right that during remove_sources the broken memory deallocation occurs. However, let me reference change_sources again:

meep/python/simulation.py

Lines 4260 to 4269 in f448ebd

def change_sources(self, new_sources):
"""
Change the list of sources in `Simulation.sources` to `new_sources`, and changes
the sources used for the current simulation. `new_sources` must be a list of
`Source` objects.
"""
self.sources = new_sources
if self.fields:
self.fields.remove_sources()
self.add_sources()

If I comment out line 4268, self.fields.remove_sources(), I still create a seg fault by trying to access the amp_data property after the call to change_sources. Whereas, when I instead comment out line 4269, self.add_sources(), I am actually able to recover the amplitude function you passed into amp_func even after the call to change_sources.

This seems to imply the actual deallocation for some reason is occurring in add_sources. But I suppose it's still possible that remove_sources makes the call to deallocate, but it somehow doesn't actually occur until add_sources.

@Arjun-Khurana
Copy link
Contributor Author

I believe I've found the issue. In the definition of EigenModeSource, the add_source() method is overloaded, and contains the following logic:

meep/python/source.py

Lines 671 to 674 in f448ebd

if isinstance(self.eig_band, mp.DiffractedPlanewave):
add_eig_src(self.amp_func, diffractedplanewave)
else:
add_eig_src(self.amp_func)

When we reach this logic in the MWE, the eig_band property is an integer, so we enter the else clause, and this is where the overwriting of the original Source.amp_func is taking place. When I replace this logic with:

if isinstance(self.eig_band, mp.DiffractedPlanewave):
    add_eig_src(self.amp_func, diffractedplanewave)
else:
    add_eig_src() 

the segfault goes away and the optimization proceeds as normal. The thing is, self.amp_func is never set in the __init__ or the add_source() for an EigenModeSource (though I suppose it could be set in the super().__init__), so I'm not sure why it was included in the original logic, or if it should be included in if clause for a DiffractedPlanewave.

@smartalecH
Copy link
Collaborator

@Arjun-Khurana can you submit a quick PR so we can better see the diff?

@Arjun-Khurana
Copy link
Contributor Author

@Arjun-Khurana can you submit a quick PR so we can better see the diff?

I don't have permission to publish a branch, should I fork and publish a branch there?

@stevengj
Copy link
Collaborator

Just to be absolutely sure we aren't accidentally decref'ing the Python function object twice in SWIG, I think we should add py_amp_func = NULL; after this line:

Py_XDECREF(py_amp_func);

and py_callback = NULL; after this line:

Py_XDECREF(py_callback);

@hammy4815
Copy link
Contributor

hammy4815 commented Jan 27, 2023

Adding in the NULL assignments unfortunately doesn't fix the issue, however it's probably still good to have them in there as you mentioned.

@hammy4815
Copy link
Contributor

It turns out we missed a call to Py_XDECREF:

meep/python/meep.i

Lines 1051 to 1053 in b86b476

%typemap(freearg) std::complex<double> (*)(const meep::vec &) {
Py_XDECREF(py_amp_func);
}

Adding py_amp_func = NULL; appears to fix the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants