More efficient PML BoxArray #2631

Merged: 32 commits merged into ECP-WarpX:development from pml_grids on Jan 11, 2022

Conversation

@WeiqunZhang (Member)

If the union of the grids is a single rectangular domain, we can simplify
the process and generate a more efficient PML BoxArray.

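To illustrate the idea, here is a minimal hypothetical sketch (not the actual WarpX implementation; the function name MakeSimplePMLBoxArray, the single ncell argument, and the uniform growth on all sides are simplifying assumptions): when the grids tile a single box, the PML region can be obtained directly as the grown domain minus the domain itself, which yields only a handful of boxes.

#include <AMReX_BLassert.H>
#include <AMReX_Box.H>
#include <AMReX_BoxArray.H>
#include <AMReX_BoxList.H>

// Hypothetical sketch: build the PML BoxArray as "grown domain minus domain"
// when the union of the grids is a single rectangular box.
amrex::BoxArray MakeSimplePMLBoxArray (const amrex::BoxArray& grid_ba, int ncell)
{
    const amrex::Box domain0 = grid_ba.minimalBox();

    // The simplification only applies when the grids cover domain0 completely.
    AMREX_ALWAYS_ASSERT(grid_ba.contains(domain0));

    amrex::Box outer = domain0;
    outer.grow(ncell); // extend by ncell PML cells on every side

    // PML region = outer box minus the physical domain.
    const amrex::BoxList bl = amrex::boxDiff(outer, domain0);
    return amrex::BoxArray(bl);
}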
@WeiqunZhang changed the title from "[WIP] More efficient PML BoxArray" to "More efficient PML BoxArray" on Dec 7, 2021
@atmyers (Member) commented Dec 8, 2021

I used the simplified MakeBoxArray function to test #2640 - it seems to work.

@EZoni (Member) left a comment

Thank you for this PR, Weiqun! I have tried it locally on one of the test cases that we ran recently and I no longer see the small (10x10 in that case) PML box in the middle of the PML grid. So it seems to me that this works! I just left a couple of questions and comments. In particular, I think it would be great if we could add a few inline comments to the code to make it a little easier to read for everyone.

Comment on lines +495 to +501
for (int idim = 0; idim < AMREX_SPACEDIM; ++idim) {
if (do_pml_Lo[idim]){
domain0.growLo(idim, -ncell);
}
if (do_pml_Hi[idim]){
domain0.growHi(idim, -ncell);
}
@EZoni (Member)

One question to make sure I understand what is being done here (it wasn't added in this PR, but still I'd like to ask). My understanding is that this is the case where we want the PML to overlap with the last ncell of the regular domain. Shouldn't we then add ncell (i.e. grow by +ncell, instead of -ncell) to the low index and subtract ncell from the high index (i.e. grow by -ncell, as already done here)? If ncell = 10, the low index would have to be 0 instead of -10 (so -10 + (+ncell) = -10 + (+10) = 0), such that the PML overlaps with the cells 0:9 of the regular domain rather than spanning the cells -10:-1 outside of the domain. While when I read domain0.growLo(idim, -ncell) it seems like we are lowering the low index even more. I guess I'm missing either the starting point or what growLo does precisely. Could you clarify this for me? Thanks!

@WeiqunZhang (Member Author)

growLo means growing in the direction that points to the lo (i.e., -inf) direction.
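For readers unfamiliar with the AMReX Box API, here is a small standalone illustration of that convention (not code from the PR; the box bounds and direction are made up):

#include <AMReX_Box.H>
#include <AMReX_IntVect.H>

// growLo(dir, n) moves the small end of the box by -n and growHi(dir, n) moves the
// big end by +n, so passing a negative amount shrinks the box from that side.
amrex::Box b(amrex::IntVect(0), amrex::IntVect(63)); // cells 0..63 in each direction
int ncell = 10;
b.growLo(0, -ncell); // low end in x: 0 -> 10 (the low index is raised by ncell)
b.growHi(0, -ncell); // high end in x: 63 -> 53 (the high index is lowered by ncell)

So domain0.growLo(idim, -ncell) raises the low index of domain0 by ncell rather than lowering it.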

Source/BoundaryConditions/PML.cpp (outdated review thread, resolved)
Source/BoundaryConditions/PML.cpp (outdated review thread, resolved)
Co-authored-by: Edoardo Zoni <59625522+EZoni@users.noreply.github.com>
@EZoni (Member) commented Dec 8, 2021

@WeiqunZhang Thank you for adding some comments, they definitely help!

@EZoni (Member) commented Dec 8, 2021

I think the CI test Python_wrappers crashes (this is probably expected, due to the different grid decomposition in the PML). However, everything looks green here, even though that test needs a benchmark reset (and possibly also a reset of some of the analysis benchmark values set directly in the PICMI script). I don't know why all checks seem green here. Any idea, @ax3l or @WeiqunZhang?

@WeiqunZhang (Member Author)

How do we reset the benchmark? I cannot find where the benchmark is stored.

@EZoni (Member) commented Dec 9, 2021

@WeiqunZhang (Member Author)

Oh, thanks! I need to merge development into this branch to get the new files.

@EZoni (Member) commented Dec 9, 2021

Thanks @WeiqunZhang. I don't know what's happening with that test, but now I see this in the CI log:

working on test: Python_wrappers
   re-making clean...
   building...
   found pre-built executable for this test
   copying files to run directory...
   path to input file: Examples/Tests/PythonWrappers/PICMI_inputs_2d.py
   running the test...
   mpiexec -n 2 python PICMI_inputs_2d.py
   WARNING: unable to open the job_info file
   doing the analysis...
   WARNING: analysis failed...
./analysis_default_regression.py  Python_wrappers_plt00100
ERROR: Benchmark and plotfile checksum have different value for key [lev=0,Bx]
Benchmark: [lev=0,Bx] 1.364593614316485e+00
Plotfile : [lev=0,Bx] nan
Absolute error: nan
Relative error: nan
ERROR: Benchmark and plotfile checksum have different value for key [lev=0,By]
Benchmark: [lev=0,By] 2.064447838259352e+00
Plotfile : [lev=0,By] nan
Absolute error: nan
Relative error: nan
ERROR: Benchmark and plotfile checksum have different value for key [lev=0,Bz]
Benchmark: [lev=0,Bz] 1.364570701326938e+00
Plotfile : [lev=0,Bz] nan
Absolute error: nan
Relative error: nan
ERROR: Benchmark and plotfile checksum have different value for key [lev=0,Ex]
Benchmark: [lev=0,Ex] 4.250673561431686e+08
Plotfile : [lev=0,Ex] nan
Absolute error: nan
Relative error: nan
ERROR: Benchmark and plotfile checksum have different value for key [lev=0,Ey]
Benchmark: [lev=0,Ey] 1.604520783637900e+08
Plotfile : [lev=0,Ey] nan
Absolute error: nan
Relative error: nan
ERROR: Benchmark and plotfile checksum have different value for key [lev=0,Ez]
Benchmark: [lev=0,Ez] 3.222261178063704e+08
Plotfile : [lev=0,Ez] nan
Absolute error: nan
Relative error: nan

   execution time: 11.9s
   creating problem test report ...
   Python_wrappers CRASHED (backtraces produced)

I will try to run it locally and see if it crashes there too.

In the meantime, @ax3l do you have a guess on why CI is showing up green even though the benchmark regression analysis is failing for this Python test? We should definitely understand why this doesn't show up as red and try to fix it, otherwise it could lead to false positives.

@WeiqunZhang (Member Author)

@EZoni Thank you for looking into this. What's strange is that right after the development branch was merged into this branch there were no NaNs, but NaNs appeared after the commit whose only change was the json file.

@EZoni changed the title from "More efficient PML BoxArray" to "[WIP] More efficient PML BoxArray" on Dec 9, 2021
@EZoni (Member) commented Dec 9, 2021

@WeiqunZhang I ran the test Python_wrappers locally and it crashes because of the benchmarks, but without showing NaNs. As we were discussing offline, this means that the benchmarks still need to be updated, but the NaN issue is probably independent of this PR.

Here's the result I got locally:

working on test: Python_wrappers
   re-making clean...
   building...
   make -j8 AMREX_HOME=/tmp/ci-lL5muNNI7b/amrex/  DEBUG=FALSE USE_ACC=FALSE USE_MPI=TRUE USE_OMP=TRUE DIM=2 USE_PSATD=TRUE USE_PYTHON_MAIN=TRUE PYINSTALLOPTIONS="--user --prefix="   COMP=g++ TEST=TRUE USE_ASSERTION=TRUE WarpxBinDir= 
   copying files to run directory...
   path to input file: Examples/Tests/PythonWrappers/PICMI_inputs_2d.py
   running the test...
   mpiexec -n 2 python PICMI_inputs_2d.py
   WARNING: unable to open the job_info file
   doing the analysis...
   WARNING: analysis failed...
./analysis_default_regression.py  Python_wrappers_plt00100
ERROR: Benchmark and plotfile checksum have different value for key [lev=0,Bx]
Benchmark: [lev=0,Bx] 1.364593614316485e+00
Plotfile : [lev=0,Bx] 1.364603047676700e+00
Absolute error: 9.43e-06
Relative error: 6.91e-06
ERROR: Benchmark and plotfile checksum have different value for key [lev=0,By]
Benchmark: [lev=0,By] 2.064447838259352e+00
Plotfile : [lev=0,By] 2.064549954898750e+00
Absolute error: 1.02e-04
Relative error: 4.95e-05
ERROR: Benchmark and plotfile checksum have different value for key [lev=0,Bz]
Benchmark: [lev=0,Bz] 1.364570701326938e+00
Plotfile : [lev=0,Bz] 1.364326972121176e+00
Absolute error: 2.44e-04
Relative error: 1.79e-04
ERROR: Benchmark and plotfile checksum have different value for key [lev=0,Ex]
Benchmark: [lev=0,Ex] 4.250673561431686e+08
Plotfile : [lev=0,Ex] 4.250329139265512e+08
Absolute error: 3.44e+04
Relative error: 8.10e-05
ERROR: Benchmark and plotfile checksum have different value for key [lev=0,Ey]
Benchmark: [lev=0,Ey] 1.604520783637900e+08
Plotfile : [lev=0,Ey] 1.605229569294634e+08
Absolute error: 7.09e+04
Relative error: 4.42e-04
ERROR: Benchmark and plotfile checksum have different value for key [lev=0,Ez]
Benchmark: [lev=0,Ez] 3.222261178063704e+08
Plotfile : [lev=0,Ez] 3.222316477244196e+08
Absolute error: 5.53e+03
Relative error: 1.72e-05

   execution time: 12.1s
   creating problem test report ...
   Python_wrappers CRASHED (backtraces produced)

@ax3l (Member) commented Dec 10, 2021

@ax3l (Member) commented Dec 11, 2021

restarting CI

@ax3l closed this on Dec 11, 2021
@EZoni (Member) commented Dec 22, 2021

@WeiqunZhang Hi, would it be possible to rebase on development and fix the conflict in PML.cpp before I start having a look at some of the CI tests that fail? Thank you!

WeiqunZhang and others added 14 commits December 21, 2021 16:55
- maximum relative error: 2.50e-06
- new implementation: 10 PML grids
- old implementation: 24 PML grids
- maximum relative error: 2.73e-04
- new implementation: (18,8,8) PML grids
- old implementation: (48,18,18) PML grids
- maximum relative error: 6.44e-05
- new implementation: (2,6,6) PML grids
- old implementation: (2,12,12) PML grids
- maximum relative error: 6.84e-04
- new implementation: (10,6,6) PML grids
- old implementation: (24,12,12) PML grids
- maximum relative error: 2.55e-04
- new implementation: (18,8,8) PML grids
- old implementation: (48,18,18) PML grids
- maximum relative error: 7.43e-04
- new implementation: (10,6,6) PML grids
- old implementation: (24,12,12) PML grids
- maximum relative error: 2.41e-05
- new implementation: (6,6,6) PML grids
- old implementation: (12,12,12) PML grids
- maximum relative error: 1.32e-01 (B numerical artifact)
- new implementation: (0,20,20) PML grids
- old implementation: (0,52,40) PML grids
- maximum relative error: 1.05e-01 (B numerical artifact)
- new implementation: (0,20,20) PML grids
- old implementation: (0,52,40) PML grids
- maximum relative error: 2.73e-04
- new implementation: (18,8,8) PML grids
- old implementation: (48,18,18) PML grids
- maximum relative error: 1.07e-08
- new implementation: 8 PML grids
- old implementation: 16 PML grids
- maximum relative error: 4.91e-03
- new implementation: 24 PML grids
- old implementation: 98 PML grids
@EZoni (Member) commented Jan 7, 2022

@WeiqunZhang I have reset the benchmarks of most CI tests that were failing. Each benchmark reset has a separate commit, and each commit message states the maximum relative error observed on the checksum benchmarks as well as the number of PML grids used with the new implementation compared to the old one. The difference in the number of PML grids is usually quite large: we are now using far fewer PML grids than before. I believe this could explain the relative errors that we observe on the checksum benchmarks (for example, with PSATD I tend to think that some of the previous setups were borderline cases where some PML grids were extremely small, possibly too small compared to the number of guard cells used by the spectral solver). What do you think?

Note that I have not yet reset two benchmarks:

  • Langmuir_multi_2d_MR_anisotropic: you mentioned on Slack that this failure is expected (some ASSERT gets triggered), but I do not know how we want to fix it;
  • PEC_field_mr: the number of PML grids does not differ between the old and new implementation, but the grids themselves do differ. I'm not sure I understand why; do you? Below is a printout of the BoxArray returned by the relevant PML function.

PEC_field_mr, PML grids with the old implementation:

ba = (BoxArray maxbox(0)
       m_ref->m_hash_sig(0)
       )

ba = (BoxArray maxbox(2)
       m_ref->m_hash_sig(0)
       ((0,0,38) (31,31,47) (0,0,0)) ((0,0,208) (31,31,217) (0,0,0)) )

ba = (BoxArray maxbox(2)
       m_ref->m_hash_sig(0)
       ((0,0,14) (15,15,23) (0,0,0)) ((0,0,104) (15,15,113) (0,0,0)) )

PEC_field_mr, PML grids with the new implementation:

ba = (BoxArray maxbox(0)
       m_ref->m_hash_sig(0)
       )

ba = (BoxArray maxbox(2)
       m_ref->m_hash_sig(0)
       ((0,0,38) (31,31,47) (0,0,0)) ((0,0,208) (31,31,217) (0,0,0)) )

ba = (BoxArray maxbox(2)
       m_ref->m_hash_sig(0)
       ((0,0,19) (15,15,23) (0,0,0)) ((0,0,104) (15,15,108) (0,0,0)) )

@WeiqunZhang (Member Author)

Re: PEC_field_mr, it's the ncell argument to MakeBoxArray. In the development branch, it's ncell/ref_ratio for the PML inside the domain and ncell otherwise, but it's always ncell/ref_ratio in this PR. I will fix it.

Re: Langmuir_multi_2d_MR_anisotropic, we can remove the assertion for now and fix it in a follow-up by replacing int with IntVect for ncell.
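As a rough sketch of that follow-up idea (hypothetical code, assuming a 3D build; the names and values are made up for illustration): an IntVect thickness divides element-wise by an anisotropic refinement ratio, instead of requiring a single scalar ncell.

#include <AMReX_IntVect.H>

// With an IntVect thickness, an anisotropic refinement ratio simply yields a
// different number of coarse PML cells per direction.
amrex::IntVect ref_ratio(2, 4, 4); // assumed anisotropic refinement ratio
amrex::IntVect ncell_fine(10);     // 10 PML cells per direction on the fine patch
amrex::IntVect ncell_coarse = ncell_fine;
ncell_coarse /= ref_ratio;         // per-direction PML thickness on the coarse patch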

@WeiqunZhang (Member Author)

@EZoni I removed the assertion so that the test with the anisotropic refinement ratio can run without an assertion failure. We also discussed in the meeting that we will use ncell/ref_ratio on the coarse PML patch. Could you help me reset the benchmarks of these two cases? I will do a quick follow-up PR to fix the anisotropic refinement ratio issue after this is merged.

@EZoni (Member) commented Jan 11, 2022

@WeiqunZhang Thank you for the update! Sounds good, I will reset the remaining benchmarks.

- maximum relative error: 1.07e-01 (B numerical artifact)
- new implementation: (0,16,16) PML grids
- old implementation: (0,40,34) PML grids
- maximum relative error: 3.98e-02
- new implementation: (0,2,2) PML grids
- old implementation: (0,2,2) PML grids
  (different number of ghost cells on coarse PML patch)
@WeiqunZhang changed the title from "[WIP] More efficient PML BoxArray" to "More efficient PML BoxArray" on Jan 11, 2022
@WeiqunZhang merged commit daa5154 into ECP-WarpX:development on Jan 11, 2022
@WeiqunZhang deleted the pml_grids branch on January 11, 2022 at 20:54