Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

openmp for near2far calculation #868

Merged
merged 6 commits into from
May 14, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions configure.ac
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,12 @@ for flg in $CXXFLAGS; do
done
AC_SUBST(ARCHFLAG)

AC_ARG_WITH(openmp, [AC_HELP_STRING([--with-openmp],[use OpenMP directives for parallelism])], enable_openmp=$enableval, with_openmp=no)
if test x"$with_openmp" = "xyes"; then
AC_DEFINE(HAVE_OPENMP,1,[Define to enable OpenMP])
AX_OPENMP([CXXFLAGS="$CXXFLAGS $OPENMP_CXXFLAGS"], [AC_MSG_ERROR([don't know how to enable OpenMP])])
fi

##############################################################################
# More checks

Expand Down
5 changes: 5 additions & 0 deletions doc/docs/Build_From_Source.md
Original file line number Diff line number Diff line change
Expand Up @@ -244,6 +244,11 @@ By default, Meep's `configure` script picks compiler flags to optimize Meep as m
By default, Meep's configure script tries to guess the gcc `-march` flag for the system you are compiling on using `-mtune` instead when `--enable-portable-binary` is specified. If it guesses wrong, or if you want to specify a different architecture, you can pass it here. If you want to omit `-march`/`-mtune` flags entirely, pass `--without-gcc-arch`.

**`--with-openmp`**
This flag enables some experimental support for [OpenMP](https://en.wikipedia.org/wiki/OpenMP) multithreading parallelism on multi-core machines (*instead* of MPI, or in addition to MPI if you have multiple processor cores per MPI process). Currently, only multi-frequency `near2far` calculations are sped up this way, but in the future we [hope to add](https://github.com/NanoComp/meep/issues/228) additional OpenMP parallelism. (When you run Meep, you can first set the `OMP_NUM_THREADS` environment variable to the number of threads you want OpenMP to use.)


### Building From Source

The following instructions are for building parallel PyMeep with all optional features from source on Ubuntu 16.04. The parallel version can still be run serially by running a script with just `python` instead of `mpirun -np 4 python`. If you really don't want to install MPI and parallel HDF5, just replace `libhdf5-openmpi-dev` with `libhdf5-dev`, and remove the `--with-mpi`, `CC=mpicc`, and `CPP=mpicxx` flags. The paths to HDF5 will also need to be adjusted to `/usr/lib/x86_64-linux-gnu/hdf5/serial` and `/usr/include/hdf5/serial`. Note that this script builds with Python 3 by default. If you want to use Python 2, just point the `PYTHON` variable to the appropriate interpreter when calling `autogen.sh` for building Meep, and use `pip` instead of `pip3`.
Expand Down
6 changes: 5 additions & 1 deletion doc/docs/Python_User_Interface.md
Original file line number Diff line number Diff line change
Expand Up @@ -1399,6 +1399,10 @@ Like `output_farfields` but returns a dictionary of numpy arrays instead of writ

Note that far fields have the same units and scaling as the *Fourier transforms* of the fields, and hence cannot be directly compared to time-domain fields. In practice, it is easiest to use the far fields in computations where overall scaling (units) cancel out or are irrelevant, e.g. to compute the fraction of the far fields in one region vs. another region.

(Multi-frequency `get_farfields` and `output_farfields` can be accelerated by
[compiling Meep](Build_From_Source.md#meep) with `--with-openmp` and using the
`OMP_NUM_THREADS` environment variable to specify multiple threads.)

For a scattered-field computation, you often want to separate the scattered and incident fields. Just as is described in [Tutorial/Basics](Python_Tutorials/Basics.md) for flux computations, you can do this by saving the Fourier-transformed incident from a "normalization" run and then load them into another run to be subtracted. This can be done via:

**`save_near2far(filename, near2far)`**
Expand Down Expand Up @@ -1494,7 +1498,7 @@ This feature is only available if Meep is built with [libGDSII](Build_From_Sourc
Returns a list of integer-valued layer indices for the layers present in
the specified GDSII file.

```python
```python
mp.GDSII_layers('python/examples/coupler.gds')
Out[2]: [0, 1, 2, 3, 4, 5, 31, 32]
```
Expand Down
4 changes: 4 additions & 0 deletions doc/docs/Scheme_User_Interface.md
Original file line number Diff line number Diff line change
Expand Up @@ -1167,6 +1167,10 @@ Given an HDF5 file name `fname` (does *not* include the `.h5` suffix), a `volume

Note that far fields have the same units and scaling as the *Fourier transforms* of the fields, and hence cannot be directly compared to time-domain fields. In practice, it is easiest to use the far fields in computations where overall scaling (units) cancel out or are irrelevant, e.g. to compute the fraction of the far fields in one region vs. another region.

(Multi-frequency `output-farfields` can be accelerated by
[compiling Meep](Build_From_Source.md#meep) with `--with-openmp` and using the
`OMP_NUM_THREADS` environment variable to specify multiple threads.)

For a scattered-field computation, you often want to separate the scattered and incident fields. Just as is described in [Tutorial/Basics](Scheme_Tutorials/Basics.md) for flux computations, you can do this by saving the Fourier-transformed incident from a "normalization" run and then load them into another run to be subtracted. This can be done via:

**`(save-near2far filename near2far)`**
Expand Down
123 changes: 123 additions & 0 deletions m4/ax_openmp.m4
Original file line number Diff line number Diff line change
@@ -0,0 +1,123 @@
# ===========================================================================
# https://www.gnu.org/software/autoconf-archive/ax_openmp.html
# ===========================================================================
#
# SYNOPSIS
#
# AX_OPENMP([ACTION-IF-FOUND[, ACTION-IF-NOT-FOUND]])
#
# DESCRIPTION
#
# This macro tries to find out how to compile programs that use OpenMP a
# standard API and set of compiler directives for parallel programming
# (see http://www-unix.mcs/)
#
# On success, it sets the OPENMP_CFLAGS/OPENMP_CXXFLAGS/OPENMP_F77FLAGS
# output variable to the flag (e.g. -omp) used both to compile *and* link
# OpenMP programs in the current language.
#
# NOTE: You are assumed to not only compile your program with these flags,
# but also link it with them as well.
#
# If you want to compile everything with OpenMP, you should set:
#
# CFLAGS="$CFLAGS $OPENMP_CFLAGS"
# #OR# CXXFLAGS="$CXXFLAGS $OPENMP_CXXFLAGS"
# #OR# FFLAGS="$FFLAGS $OPENMP_FFLAGS"
#
# (depending on the selected language).
#
# The user can override the default choice by setting the corresponding
# environment variable (e.g. OPENMP_CFLAGS).
#
# ACTION-IF-FOUND is a list of shell commands to run if an OpenMP flag is
# found, and ACTION-IF-NOT-FOUND is a list of commands to run it if it is
# not found. If ACTION-IF-FOUND is not specified, the default action will
# define HAVE_OPENMP.
#
# LICENSE
#
# Copyright (c) 2008 Steven G. Johnson <stevenj@alum.mit.edu>
# Copyright (c) 2015 John W. Peterson <jwpeterson@gmail.com>
# Copyright (c) 2016 Nick R. Papior <nickpapior@gmail.com>
#
# This program is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the
# Free Software Foundation, either version 3 of the License, or (at your
# option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
# Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program. If not, see <https://www.gnu.org/licenses/>.
#
# As a special exception, the respective Autoconf Macro's copyright owner
# gives unlimited permission to copy, distribute and modify the configure
# scripts that are the output of Autoconf when processing the Macro. You
# need not follow the terms of the GNU General Public License when using
# or distributing such scripts, even though portions of the text of the
# Macro appear in them. The GNU General Public License (GPL) does govern
# all other use of the material that constitutes the Autoconf Macro.
#
# This special exception to the GPL applies to versions of the Autoconf
# Macro released by the Autoconf Archive. When you make and distribute a
# modified version of the Autoconf Macro, you may extend this special
# exception to the GPL to apply to your modified version as well.

#serial 13

AC_DEFUN([AX_OPENMP], [
AC_PREREQ([2.69]) dnl for _AC_LANG_PREFIX
AC_CACHE_CHECK([for OpenMP flag of _AC_LANG compiler], ax_cv_[]_AC_LANG_ABBREV[]_openmp, [save[]_AC_LANG_PREFIX[]FLAGS=$[]_AC_LANG_PREFIX[]FLAGS
ax_cv_[]_AC_LANG_ABBREV[]_openmp=unknown
# Flags to try: -fopenmp (gcc), -mp (SGI & PGI),
# -qopenmp (icc>=15), -openmp (icc),
# -xopenmp (Sun), -omp (Tru64),
# -qsmp=omp (AIX),
# none
ax_openmp_flags="-fopenmp -openmp -qopenmp -mp -xopenmp -omp -qsmp=omp none"
if test "x$OPENMP_[]_AC_LANG_PREFIX[]FLAGS" != x; then
ax_openmp_flags="$OPENMP_[]_AC_LANG_PREFIX[]FLAGS $ax_openmp_flags"
fi
for ax_openmp_flag in $ax_openmp_flags; do
case $ax_openmp_flag in
none) []_AC_LANG_PREFIX[]FLAGS=$save[]_AC_LANG_PREFIX[] ;;
*) []_AC_LANG_PREFIX[]FLAGS="$save[]_AC_LANG_PREFIX[]FLAGS $ax_openmp_flag" ;;
esac
AC_LINK_IFELSE([AC_LANG_SOURCE([[
@%:@include <omp.h>
static void
parallel_fill(int * data, int n)
{
int i;
@%:@pragma omp parallel for
for (i = 0; i < n; ++i)
data[i] = i;
}
int
main()
{
int arr[100000];
omp_set_num_threads(2);
parallel_fill(arr, 100000);
return 0;
}
]])],[ax_cv_[]_AC_LANG_ABBREV[]_openmp=$ax_openmp_flag; break],[])
done
[]_AC_LANG_PREFIX[]FLAGS=$save[]_AC_LANG_PREFIX[]FLAGS
])
if test "x$ax_cv_[]_AC_LANG_ABBREV[]_openmp" = "xunknown"; then
m4_default([$2],:)
else
if test "x$ax_cv_[]_AC_LANG_ABBREV[]_openmp" != "xnone"; then
OPENMP_[]_AC_LANG_PREFIX[]FLAGS=$ax_cv_[]_AC_LANG_ABBREV[]_openmp
fi
m4_default([$1], [AC_DEFINE(HAVE_OPENMP,1,[Define if OpenMP is enabled])])
fi
])dnl AX_OPENMP
17 changes: 10 additions & 7 deletions src/near2far.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -250,12 +250,15 @@ void dft_near2far::farfield_lowlevel(std::complex<double> *EH, const vec &x) {
component c0 = component(f->vc); /* equivalent source component */

vec rshift(f->shift * (0.5 * f->fc->gv.inva));
size_t idx_dft = 0;
LOOP_OVER_IVECS(f->fc->gv, f->is, f->ie, idx) {
IVEC_LOOP_LOC(f->fc->gv, x0);
x0 = f->S.transform(x0, f->sn) + rshift;
for (int i = 0; i < Nfreq; ++i) {
double freq = freq_min + i * dfreq;
#ifdef HAVE_OPENMP
# pragma omp parallel for
#endif
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason, I am unable to compile with OpenMP via --with-openmp on my local machine using this #HAVE_OPENMP macro (i.e., the build is successful but the examples/binary_grating_n2f.py test uses just a single thread/process for the get_farfields calculation regardless of the value for OMP_NUM_THREADS). To fix this problem, I had to revert back to using just #pragma omp parallel for (i.e., removing the #ifdef and #endif lines) and CXX="g++ -fopenmp". Perhaps this is related to the openmp -related changes in configure.ac in this PR?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoop, I forgot to add OPENMP_CXXFLAGS to CXXFLAGS … try it again.

for (int i = 0; i < Nfreq; ++i) {
double freq = freq_min + i * dfreq;
size_t idx_dft = 0;
LOOP_OVER_IVECS(f->fc->gv, f->is, f->ie, idx) {
IVEC_LOOP_LOC(f->fc->gv, x0);
x0 = f->S.transform(x0, f->sn) + rshift;
vec xs(x0);
for (int i0 = -periodic_n[0]; i0 <= periodic_n[0]; ++i0) {
if (periodic_d[0] != NO_DIRECTION)
Expand All @@ -271,8 +274,8 @@ void dft_near2far::farfield_lowlevel(std::complex<double> *EH, const vec &x) {
EH[i * 6 + j] += EH6[j] * cphase;
}
}
idx_dft++;
}
idx_dft++;
}
}
}
Expand Down