
Resolve m-number adjoint bugs introduced by 1855 #2194

Merged
merged 5 commits into from
Sep 22, 2022

Conversation

smartalecH
Collaborator

Closes #2193

#1855 changed the way the adjoint simulation was assembled. Before that PR, an entirely new simulation was initialized for both the forward and adjoint runs. After the PR, a new simulation is initialized only for the forward run. The adjoint run simply updates all relevant quantities.

Unfortunately, not all quantities (e.g. m number) are properly updated for the adjoint simulation. This PR attempts to fix that (along with a few other cleanup steps in the recombination step).

(cc @mochen4).
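The change in control flow described above can be sketched in toy Python (hypothetical names only; this is not the Meep API):

```python
class ToySim:
    """Toy stand-in for a simulation object (not the actual Meep API)."""

    def __init__(self, m):
        self.m = m


def run_forward_and_adjoint(forward_m, adjoint_m):
    # Post-#1855 flow: one simulation is built for the forward run and
    # then reused for the adjoint run. Every m-dependent quantity must be
    # updated in place; forgetting one (as happened with the m number)
    # is exactly the class of bug this PR fixes.
    sim = ToySim(m=forward_m)   # forward run initializes the simulation
    sim.m = adjoint_m           # adjoint run reuses and updates it
    return sim


sim = run_forward_and_adjoint(forward_m=1, adjoint_m=-1)
```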

@smartalecH smartalecH marked this pull request as draft August 12, 2022 14:59
python/simulation.py (outdated review thread; resolved)
@mochen4
Collaborator

mochen4 commented Aug 12, 2022

Also remember to change the scale to p.r()

cyl_scale = (gv.dim == meep::Dcyl) ? 2 * p.r() : 1;

as in
cyl_scale = (gv.dim == meep::Dcyl) ? eps1.r() : 1;

@smartalecH
Collaborator Author

smartalecH commented Aug 12, 2022

Also remember to change the scale to p.r()

Hmm I'm not sure we need that. My guess is that 2 got placed somewhere else in the earlier version of the code.... we've got 2's everywhere and it's hard to keep track sometimes...

@smartalecH smartalecH marked this pull request as ready for review August 12, 2022 18:50
@smartalecH smartalecH changed the title Resolve some adjoint bugs introduced by 1855 Resolve m-number adjoint bugs introduced by 1855 Aug 12, 2022
@smartalecH
Collaborator Author

@mochen4 at one point you mentioned the current test_adjoint_cylindrical.py doesn't really test the cylindrical features as rigorously as it should (which is why #1855 passed). Do we want to update that test in this PR?

@codecov-commenter

codecov-commenter commented Aug 12, 2022


Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.21%. Comparing base (2aa9164) to head (ece6329).
Report is 210 commits behind head on master.


Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2194      +/-   ##
==========================================
+ Coverage   73.09%   73.21%   +0.12%     
==========================================
  Files          17       17              
  Lines        4906     4914       +8     
==========================================
+ Hits         3586     3598      +12     
+ Misses       1320     1316       -4     
Files with missing lines Coverage Δ
python/adjoint/optimization_problem.py 57.06% <100.00%> (+0.22%) ⬆️
python/simulation.py 76.85% <ø> (+<0.01%) ⬆️

... and 4 files with indirect coverage changes

if needs_complex_fields and self.fields.is_real:
    self.fields = None
    self._is_initialized = False
    self.init_sim()
Collaborator

It would be better to just add a boolean argument to fields::use_real_fields so that we can switch back to real fields without reinitializing. And then this could be called by fields::change_m_number for m ≠ 0

m = new_m;
for (int i = 0; i < num_chunks; i++) {
  chunks[i]->change_m_number(new_m);
}
Collaborator

At the very least, this needs to check that new_m is consistent with whether we have real fields and abort if not. Better yet, have it call if (new_m != 0) use_real_fields(false); as noted above.
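The suggested consistency check can be sketched in plain Python (hypothetical names, standing in for the C++ fields::change_m_number):

```python
def change_m_number(fields, new_m):
    # Complex fields are required whenever m != 0 in cylindrical
    # coordinates; with real fields this request must be rejected
    # (or, per the suggestion above, the fields could be switched
    # back to complex without a full reinitialization).
    if new_m != 0 and fields["is_real"]:
        raise RuntimeError("m != 0 requires complex fields")
    fields["m"] = new_m
    return fields
```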

Collaborator Author

What do you envision use_real_fields(false) doing in the case where the current simulation does have real fields (e.g., suppose originally m=0 but we are changing it to m=1)? Wouldn't we need to reinitialize everything?

Or are you suggesting we refactor the field initialization into use_real_fields(), such that

  • use_real_fields(true) deletes the extra array (the current behavior of use_real_fields())
  • use_real_fields(false) reallocates the fields if needed, and does nothing if the simulation state is already consistent

As a first pass, it might be easiest to leave use_real_fields as is and simply ensure there's a proper check in place for change_m_number() (like you suggest).

Collaborator Author

My latest commit adds a check to make sure that the fields are consistent. If the user requests complex fields when the current setup is using real fields, I just abort.

This isn't a problem for the adjoint code: we have a similar check in the Python routine, but rather than aborting, we simply reinitialize.
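The reinitialize-instead-of-abort behavior on the Python side can be sketched as follows (toy class with hypothetical attribute names, modeled on the snippet quoted earlier in this thread):

```python
class ToySimulation:
    """Toy model of the Python-side check (not the actual Meep API)."""

    def __init__(self):
        self.fields_are_real = True
        self.init_count = 0

    def init_sim(self, needs_complex_fields):
        self.fields_are_real = not needs_complex_fields
        self.init_count += 1

    def ensure_fields(self, needs_complex_fields):
        # Rather than aborting (as the C++ check does), tear down and
        # reinitialize when complex fields are needed but absent.
        if needs_complex_fields and self.fields_are_real:
            self.init_sim(needs_complex_fields)


sim = ToySimulation()
sim.ensure_fields(needs_complex_fields=True)   # triggers one reinit
sim.ensure_fields(needs_complex_fields=True)   # already consistent; no-op
```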

@@ -4279,6 +4279,23 @@ def change_k_point(self, k):
py_v3_to_vec(self.dimensions, self.k_point, self.is_cylindrical)
)

def change_m_number(self, m):
Collaborator

Maybe just call it change_m?

@mochen4
Collaborator

mochen4 commented Aug 15, 2022

@mochen4 at one point you mentioned the current test_adjoint_cylindrical.py doesn't really test the cylindrical features as rigorously as it should (which is why #1855 passed). Do we want to update that test in this PR?

We can simply pick a different coordinate point here

far_x = [mp.Vector3(5, 0, 20)]

I think it was just a coincidence that #1855 passed the test at that specific point with that specific fd_gradient step size.

@oskooi
Collaborator

oskooi commented Aug 15, 2022

Note that the unit test in test_adjoint_cyl.py only tests m = 0:

dimensions = mp.CYLINDRICAL
m = 0

Once #2182 is resolved, we will also want to add a separate test for m = 1.

@mochen4
Collaborator

mochen4 commented Aug 21, 2022

I ran more tests, and the gradients still seem to be incorrect when do_averaging=True.

@smartalecH
Collaborator Author

@mochen4 can you elaborate?

@mochen4
Collaborator

mochen4 commented Aug 21, 2022

@mochen4 can you elaborate?

Running the test code #2193 (comment) with m=-1 and do_averaging=False:

Directional derivative -- adjoint solver: [-1.64982559e-05], FD: -1.6525787369554845e-05 || rel_error = [0.00166597] || abs_error = [2.75314362e-08]

But with do_averaging=True:

Directional derivative -- adjoint solver: [-2.12439405e-05], FD: 1.9153108225256066e-06 || rel_error = [12.09164125] || abs_error = [2.31592513e-05]

I tried different random seeds, and I also tried comparing the gradients at individual points, and concluded that the issue was with do_averaging. Similarly for m=0 using mp.Ez source and npa.abs(ff[0,0,2])**2 objective.
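For reference, the reported errors are consistent with the usual definitions (a sketch, with the values copied from the do_averaging=False log above):

```python
import numpy as np

adjoint = np.array([-1.64982559e-05])   # adjoint-solver directional derivative
fd = -1.6525787369554845e-05            # finite-difference value

abs_error = np.abs(adjoint - fd)        # ~2.753e-08
rel_error = abs_error / np.abs(fd)      # ~1.666e-03, matching the log
```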

Moreover, I also found that, at m=0, the gradient values for force_complex_fields=False didn't agree with the values when force_complex_fields=True. This may be a separate issue, and I will run more tests to determine which PR the issue comes from.

@mochen4
Collaborator

mochen4 commented Sep 8, 2022

Moreover, I also found that, at m=0, the gradient values for force_complex_fields=False didn't agree with the values when force_complex_fields=True. This may be a separate issue, and I will run more tests to determine which PR the issue comes from.

Actually, this mismatch between force_complex_fields=True and force_complex_fields=False is expected and correct. I tested this initially because that was another possible origin for the missing factor of 2, which I am still slightly concerned about.

On the other hand, running #2193 (comment) with m=-1, I can confirm that this PR fixes the bug from #1855. The gradients are now the same as they were before #1855, and the fd_gradient agrees with the adjoint gradient when do_averaging=False.

However, when do_averaging=True, the fd_gradient disagrees with the adjoint gradient even before #1855; I am still locating the issue. Moreover, the gradients differed a lot between do_averaging=True and do_averaging=False, especially the fd_gradient.

@smartalecH
Collaborator Author

Moreover, the gradients differed a lot between do_averaging=True and do_averaging=False, especially the fd_gradient.

Yes the averaging is a major issue right now. Mostly because even the FD gradients are somewhat buggy (somehow...) and that's what we are using as our ground truth...

@mochen4
Collaborator

mochen4 commented Sep 15, 2022

Surprisingly, the issue seems to be related to the structure at x = 0.5*np.ones(Nx*Ny). For a random structure, the gradients agree for both do_averaging=True and do_averaging=False.

On the other hand, the gradients between do_averaging=True and do_averaging=False seem to be off by a factor of -2.

This is observed for both this PR and the initial PR #1780.

@smartalecH
Collaborator Author

Surprisingly, the issue seems to be related to the structure at x = 0.5*np.ones(Nx*Ny)

Like we discussed, this has to do with the numerically unstable way we currently compute the spatial gradient (similar to the discussion here).

For a random structure, the gradients agree for both do_averaging=True and do_averaging=False.

Great!

On the other hand, the gradients between do_averaging=True and do_averaging=False seem to be off by a factor of -2.

Not great...

This is observed for both this PR and the initial PR #1780.

So this is not a result of #1855, right (which is what originally caused the bug the current PR is trying to fix)?

@smartalecH
Collaborator Author

I updated the test_adjoint_cyl.py test to include other m numbers (including non-integer values) and other far-field points.

I noticed that the tolerance is a bit high (10%), and it shows in the tests: there is still some sizable error (5-8%) slipping through. This is most likely due to the averaging (which is on by default).

@mochen4
Collaborator

mochen4 commented Sep 22, 2022

So this is not a result of #1855, right (which is what originally caused the bug the current PR is trying to fix)?

Correct, I believe this PR does fix the issue from #1855. On the other hand, the mismatch between do_averaging=True and do_averaging=False dates way back, even before #1780. I also need to make a correction: the gradients do not differ by a factor of -2; there is actually no apparent correlation.

The mismatch only occurs for nonzero m values in cylindrical coordinates (with a random structure); 2D Cartesian and m=0 work fine. The issue is probably related to the subpixel averaging of the MaterialGrid: the simulation results are simply very different.

@stevengj stevengj merged commit e4563e2 into NanoComp:master Sep 22, 2022
@mochen4
Collaborator

mochen4 commented Sep 29, 2022

As discussed, random structures may not be suitable for testing subpixel-averaging gradients. So I used a smooth structure with design weights $\rho(x,y) = \frac{xy}{N_x N_y}$ and compared the gradients, which look plausible:

With do_averaging=False: [figure: gradient plot]

With do_averaging=True: [figure: gradient plot]
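The smooth test structure can be generated in a couple of lines (a sketch; Nx and Ny are hypothetical design-grid dimensions, and the weights are flattened like the x = 0.5*np.ones(Nx*Ny) vector used earlier in this thread):

```python
import numpy as np

Nx, Ny = 10, 10  # hypothetical design-grid dimensions
ix, iy = np.meshgrid(np.arange(Nx), np.arange(Ny), indexing="ij")
# smooth design weights rho(x, y) = x*y / (Nx*Ny), flattened to a 1D vector
rho = (ix * iy / (Nx * Ny)).flatten()
```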

Successfully merging this pull request may close these issues.

Incorrect adjoint fields in cylindrical coordinates