slow initialization of prism objects with large number of vertices #1271

Open · oskooi opened this issue Jun 30, 2020 · 2 comments

oskooi (Collaborator) commented Jun 30, 2020

For a 3d test case involving multiple Prism objects containing a total of 500+ vertices with subpixel smoothing and equal-chunk splitting, the meep_geom::set_materials_from_geometry call in python/meep.i dominates the wall-clock time (> 99%) for init_sim():

meep/python/meep.i, lines 1646 to 1648 (fb58a86):

meep_geom::set_materials_from_geometry(s, gobj_list, center, use_anisotropic_averaging, tol,
                                       maxeval, _ensure_periodicity, _default_material,
                                       alist, extra_materials);

(Originally, in #1255, we had thought that the performance bottleneck was meep_mpb_eps in src/mpb.cpp, but that turned out not to be the case, as shown in #1257:comment.)

To demonstrate that set_materials_from_geometry is the bottleneck, we run a parallel simulation with 15 processes/chunks, time all_wait() calls immediately before and after the call to set_materials_from_geometry, and have each process print how long it spent in this function.

    double start_time = wall_time();
    all_wait();
    master_printf("all_wait before set_materials_from_geometry: %g s\n",wall_time()-start_time);
    start_time = wall_time();
    if (set_materials) {
      meep_geom::set_materials_from_geometry(s, gobj_list, center, use_anisotropic_averaging, tol,
                                             maxeval, _ensure_periodicity, _default_material,
                                             alist, extra_materials);
    }
    printf("set_materials_from_geometry:, %d, %g s\n",my_rank(),wall_time()-start_time);
    start_time = wall_time();
    all_wait();
    master_printf("all_wait after set_materials_from_geometry: %g s\n",wall_time()-start_time);

The output shows a large variation in the runtimes among the 15 processes, which is expected since only certain chunks intersect the Prism objects. The fastest (rank 0) and slowest (rank 11) processes have runtimes that differ by a factor of ~355 (0.641839 s vs. 228.226 s).

all_wait before set_materials_from_geometry: 6.98566e-05 s
set_materials_from_geometry:, 0, 0.641839 s
set_materials_from_geometry:, 14, 5.03076 s
set_materials_from_geometry:, 7, 8.4935 s
set_materials_from_geometry:, 9, 8.72409 s
set_materials_from_geometry:, 4, 23.9741 s
set_materials_from_geometry:, 12, 26.0029 s
set_materials_from_geometry:, 6, 51.8688 s
set_materials_from_geometry:, 3, 84.4188 s
set_materials_from_geometry:, 1, 84.9311 s
set_materials_from_geometry:, 10, 92.5879 s
set_materials_from_geometry:, 2, 96.871 s
set_materials_from_geometry:, 13, 102.212 s
set_materials_from_geometry:, 5, 192.209 s
set_materials_from_geometry:, 8, 200.6 s
set_materials_from_geometry:, 11, 228.226 s
all_wait after set_materials_from_geometry: 227.584 s

(The total time for init_sim(), which includes setting up an eigenmode source, is 228.4403471946716 s.)

Is there anything that can be done to speed up set_materials_from_geometry for this use case?

oskooi (Collaborator, Author) commented Jun 30, 2020

The performance bottleneck is the subpixel smoothing, which, even though it is parallelized (as mentioned in Features/Subpixel Smoothing), is still constrained by the single chunk in the cell division that contains the most interface pixels.
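Put differently, because each chunk is smoothed in parallel, the wall-clock time for this step is set by the slowest chunk rather than by the average: roughly the maximum per-chunk time, not the mean over chunks. With the numbers from the first run above, that is ≈ 228 s for the slowest chunk versus a mean of ≈ 80 s across the 15 chunks.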

To verify this, we time the call to set_epsilon in structure::set_materials for each process/chunk separately:

set_epsilon(mat, use_anisotropic_averaging, tol, maxeval);

using:

  double start_time = wall_time();
  set_epsilon(mat, use_anisotropic_averaging, tol, maxeval);
  printf("set_epsilon:, %d, %g s\n",my_rank(),wall_time()-start_time);

The times for set_epsilon are nearly identical to those for set_materials_from_geometry:

set_materials_from_geometry:, 0, 0.175681 s
set_epsilon:, 14, 2.16009 s
set_materials_from_geometry:, 14, 2.16597 s
set_epsilon:, 9, 5.51819 s
set_materials_from_geometry:, 9, 5.52412 s
set_epsilon:, 7, 16.0591 s
set_materials_from_geometry:, 7, 16.0649 s
set_epsilon:, 4, 18.0369 s
set_materials_from_geometry:, 4, 18.0427 s
set_epsilon:, 12, 26.4867 s
set_materials_from_geometry:, 12, 26.4926 s
set_epsilon:, 2, 28.8896 s
set_materials_from_geometry:, 2, 28.8955 s
set_epsilon:, 1, 35.96 s
set_materials_from_geometry:, 1, 35.966 s
set_epsilon:, 6, 39.4261 s
set_materials_from_geometry:, 6, 39.4324 s
set_epsilon:, 13, 44.9237 s
set_materials_from_geometry:, 13, 44.9296 s
set_epsilon:, 10, 46.823 s
set_materials_from_geometry:, 10, 46.829 s
set_epsilon:, 3, 50.081 s
set_materials_from_geometry:, 3, 50.087 s
set_epsilon:, 5, 58.6668 s
set_materials_from_geometry:, 5, 58.6726 s
set_epsilon:, 8, 79.2432 s
set_materials_from_geometry:, 8, 79.2491 s
set_epsilon:, 11, 93.5621 s
set_materials_from_geometry:, 11, 93.568 s

Currently, because only the master process outputs its wall-clock time for set_epsilon, it is difficult for the user to notice the performance variability of subpixel smoothing in large parallel jobs:

if (verbosity > 0) master_printf("time for set_epsilon = %g s\n", wall_time() - tstart);

It might be helpful to print the max and min times for set_epsilon across all the chunks rather than just the time for the master process.
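One possible shape for that change (a minimal sketch using raw MPI rather than Meep's internal communication wrappers; report_set_epsilon_time is a hypothetical helper, not an existing Meep function):

  #include <mpi.h>
  #include <stdio.h>

  // Gather the per-process set_epsilon time and report the spread from the
  // master process. Assumes MPI has already been initialized by Meep.
  void report_set_epsilon_time(double t_local) {
    double t_max, t_min;
    MPI_Reduce(&t_local, &t_max, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    MPI_Reduce(&t_local, &t_min, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
      printf("time for set_epsilon: min = %g s, max = %g s\n", t_min, t_max);
  }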

stevengj (Collaborator) commented Jul 1, 2020

I think the issue is that the algorithms we are currently using to test whether a point is in a prism etcetera scale linearly with the number of vertices. There might be faster algorithms, although they may require fancier data structures.
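For reference, here is a minimal sketch of the kind of linear-time test being described — a standard even-odd (ray-crossing) check of a point against a 2d polygon cross-section. It is purely illustrative of the O(n) scaling per point query, not the actual routine used by libctl/Meep:

  #include <vector>

  struct pt { double x, y; };

  // O(n) even-odd crossing test: the cost of each point query grows linearly
  // with the number of polygon vertices n.
  bool point_in_polygon(const std::vector<pt> &v, pt p) {
    bool inside = false;
    for (size_t i = 0, j = v.size() - 1; i < v.size(); j = i++) {
      // Count edges that straddle the horizontal line y = p.y and whose
      // crossing point lies to the right of p.
      if (((v[i].y > p.y) != (v[j].y > p.y)) &&
          (p.x < (v[j].x - v[i].x) * (p.y - v[i].y) / (v[j].y - v[i].y) + v[i].x))
        inside = !inside;
    }
    return inside;
  }

Beating this per-query cost generally requires preprocessing the vertices into something like a slab decomposition or a bounding-volume hierarchy, which is presumably the "fancier data structures" trade-off mentioned above.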

We could put an all_wait() right before that master_printf() line to be sure that the times are synchronized. Update: fixed in 82fbfb0
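For the synchronization point, the suggestion amounts to something like the following (a sketch of the idea, not necessarily the exact change made in 82fbfb0):

  // Wait for every chunk to finish set_epsilon before the master reports the
  // elapsed time, so the printed value reflects the slowest chunk rather than
  // only the master's own work.
  all_wait();
  if (verbosity > 0) master_printf("time for set_epsilon = %g s\n", wall_time() - tstart);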
