Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Omp target offload #1212

Merged
merged 44 commits into from
May 5, 2023
Merged
Show file tree
Hide file tree
Changes from 40 commits
Commits
Show all changes
44 commits
Select commit Hold shift + click to select a range
98e185e
Add GPU-accelerated version of accumulate
jamesp-epcc Feb 18, 2022
fe95922
Map data independently from code offload (not working yet)
jamesp-epcc Mar 3, 2022
af168f2
Implement pixel distortion update on GPU
jamesp-epcc Mar 18, 2022
f014a0c
Prevent crash in GPU distortion update loop
jamesp-epcc Apr 1, 2022
ffe777c
Fix a bug in GPU pixel bounds computation
jamesp-epcc Apr 1, 2022
38da021
Fix null pointer error with certain compilers when some photon data n…
jamesp-epcc Jun 10, 2022
b12e813
Fix various bugs so that output from GPU matches CPU version
jamesp-epcc Jul 22, 2022
6dddbb2
Add finalizeGPU method to clean everything up and copy back result
jamesp-epcc Jul 22, 2022
94c9318
Remove some unused and commented out stuff
jamesp-epcc Jul 22, 2022
3bd5298
Tidy up function arguments and fix diffStepRandom size bug
jamesp-epcc Jul 25, 2022
7acb4c4
Remove some unneeded GPU pointers. Support single precision images on…
jamesp-epcc Aug 1, 2022
c915d0b
Remove GPU-specific bounds data and use same structures as CPU
jamesp-epcc Aug 1, 2022
5a12cbc
Remove separate GPU arrays for boundary points and distortions
jamesp-epcc Aug 1, 2022
be0a14b
Copy data back from GPU in addDelta and subtractDelta
jamesp-epcc Aug 1, 2022
3b237e0
Release GPU memory after completion
jamesp-epcc Aug 19, 2022
1d77aa9
Remove manual memory management from GPU code (still needs tested!)
jamesp-epcc Aug 22, 2022
45bb1da
Remove custom GPU data structures and use same ones as on CPU
jamesp-epcc Aug 29, 2022
863af63
Allow GPU version to build with setup.py
jamesp-epcc Sep 30, 2022
6724bef
Add GPU support to setup script
jamesp-epcc Oct 21, 2022
6bad511
Get GPU sensor model to pass tests
jamesp-epcc Dec 2, 2022
903e5b9
Make resume feature work with GPU
jamesp-epcc Dec 9, 2022
8fadc6c
Tidy up code. Remove redundant non-GPU versions of functions
jamesp-epcc Jan 20, 2023
550af7a
Work around incorrect Clang definition of __CUDA_ARCH__
jamesp-epcc Feb 6, 2023
ae8b6ef
Ensure -lgalsim is passed to linker only once and is before -lfftw3
jamesp-epcc Feb 6, 2023
2e8028e
Wrap all GPU directives in #ifdefs
jamesp-epcc Feb 20, 2023
0423cf1
Fix function ordering, whitespace and comments. Turn off debug mode
jamesp-epcc Mar 17, 2023
c31a68a
Merge random arrays into one to get around Clang runtime argument limit
jamesp-epcc Mar 24, 2023
cdaa15b
Add const getter methods to PhotonArray instead of using const_cast
jamesp-epcc Mar 24, 2023
496530e
Move pixel distortion update back out into separate method (still nee…
jamesp-epcc Mar 31, 2023
62e11ca
Remove hardcoded abs_length_table and use values passed in from Pytho…
jamesp-epcc Mar 31, 2023
1107b33
Use GPU version of updatePixelDistortions in initialisation
jamesp-epcc Mar 31, 2023
cf8ff66
Get rid of separate GPU and CPU versions of updatePixelDistortions
jamesp-epcc Mar 31, 2023
e10f9cd
Fix various issues with comments
jamesp-epcc Apr 10, 2023
64dfa4b
Use simpler method to calculate size of image data arrays
jamesp-epcc Apr 14, 2023
49e7dc9
Call calculateConversionDepth instead of inlining
jamesp-epcc Apr 14, 2023
b7d08de
Add comment explaining that GPU requires raw pointers rather than obj…
jamesp-epcc Apr 14, 2023
895253c
Call Bounds::includes instead of inlining
jamesp-epcc Apr 14, 2023
f5c7b19
Use same version of updatePixelBounds on CPU and GPU
jamesp-epcc Apr 14, 2023
e0882cf
Use unique_ptr<bool[]> to ensure array gets deleted properly
jamesp-epcc Apr 21, 2023
6290a4e
Move updatePixelBounds back to original location and fix formatting
jamesp-epcc Apr 24, 2023
0502f08
Use += operator in updatePixelBounds to make code clearer
jamesp-epcc Apr 28, 2023
4454bf0
Add back old code in insidePixel for documentation purposes and impro…
jamesp-epcc Apr 28, 2023
cd83a3c
Move one-time data initialisation to constructor
jamesp-epcc Apr 28, 2023
39f941e
Define integer min and max functions and use them for clarity
jamesp-epcc May 3, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -149,7 +149,7 @@ jobs:

# Extra packages needed for testing
pip install -U -r test_requirements.txt
pip install -U nose codecov coverage
pip install -U nose coverage
pip install -U matplotlib astroplan

- name: Install CPython-only dependencies
Expand Down
4 changes: 4 additions & 0 deletions include/galsim/Laguerre.h
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,10 @@
#if defined(__GNUC__) && __GNUC__ >= 6
#pragma GCC diagnostic ignored "-Wint-in-bool-context"
#endif

// Clang incorrectly defines __CUDA_ARCH__ in host code when building for
// OpenMP target offload, so we have to undefine it or Eigen gets confused
#undef __CUDA_ARCH__
#include "Eigen/Dense"
using Eigen::VectorXd;
using Eigen::MatrixXd;
Expand Down
6 changes: 6 additions & 0 deletions include/galsim/PhotonArray.h
Original file line number Diff line number Diff line change
Expand Up @@ -90,6 +90,12 @@ namespace galsim {
double* getDXDZArray() { return _dxdz; }
double* getDYDZArray() { return _dydz; }
double* getWavelengthArray() { return _wave; }
const double* getXArray() const { return _x; }
const double* getYArray() const { return _y; }
const double* getFluxArray() const { return _flux; }
const double* getDXDZArray() const { return _dxdz; }
const double* getDYDZArray() const { return _dydz; }
const double* getWavelengthArray() const { return _wave; }
bool hasAllocatedAngles() const { return _dxdz != 0 && _dydz != 0; }
bool hasAllocatedWavelengths() const { return _wave != 0; }
/**
Expand Down
48 changes: 39 additions & 9 deletions include/galsim/Silicon.h
Original file line number Diff line number Diff line change
Expand Up @@ -34,24 +34,35 @@

namespace galsim
{

class PUBLIC_API Silicon
{
public:
Silicon(int numVertices, double numElec, int nx, int ny, int qDist,
double diffStep, double pixelSize, double sensorThickness, double* vertex_data,
const Table& tr_radial_table, Position<double> treeRingCenter,
const Table& abs_length_table, bool transpose);

template <typename T>
bool insidePixel(int ix, int iy, double x, double y, double zconv,
ImageView<T> target, bool* off_edge=0) const;

void scaleBoundsToPoly(int i, int j, int nx, int ny,
~Silicon();

bool insidePixel(int ix, int iy, double x, double y, double zconv,
Bounds<int>& targetBounds, bool* off_edge,
int emptypolySize,
Bounds<double>* pixelInnerBoundsData,
Bounds<double>* pixelOuterBoundsData,
Position<float>* horizontalBoundaryPointsData,
Position<float>* verticalBoundaryPointsData,
Position<double>* emptypolyData) const;

void scaleBoundsToPoly(int i, int j, int nx, int ny,
const Polygon& emptypoly, Polygon& result,
double factor) const;

double calculateConversionDepth(const PhotonArray& photons, int i, double randomNumber) const;
double calculateConversionDepth(bool photonsHasAllocatedWavelengths,
const double* photonsWavelength,
const double* abs_length_table_data,
bool photonsHasAllocatedAngles,
const double* photonsDXDZ,
const double* photonsDYDZ, int i,
double randomNumber) const;

template <typename T>
void updatePixelDistortions(ImageView<T> target);
Expand All @@ -72,6 +83,8 @@ namespace galsim
template <typename T>
void initialize(ImageView<T> target, Position<int> orig_center);

void finalize();

template <typename T>
double accumulate(const PhotonArray& photons, int i1, int i2,
BaseDeviate rng, ImageView<T> target);
Expand Down Expand Up @@ -247,7 +260,11 @@ namespace galsim

void initializeBoundaryPoints(int nx, int ny);

void updatePixelBounds(int nx, int ny, size_t k);
void updatePixelBounds(int nx, int ny, size_t k,
Bounds<double>* pixelInnerBoundsData,
Bounds<double>* pixelOuterBoundsData,
Position<float>* horizontalBoundaryPointsData,
Position<float>* verticalBoundaryPointsData);

Polygon _emptypoly;
mutable std::vector<Polygon> _testpoly;
Expand All @@ -265,6 +282,19 @@ namespace galsim
Table _abs_length_table;
bool _transpose;
ImageAlloc<double> _delta;
std::unique_ptr<bool[]> _changed;

// GPU data
std::vector<double> _abs_length_table_GPU;
std::vector<Position<double> > _emptypolyGPU;
double _abs_length_arg_min, _abs_length_arg_max;
double _abs_length_increment;
int _abs_length_size;

// need to keep a pointer to the last target image's data and its data type
// so we can release it on the GPU later
void* _targetData;
bool _targetIsDouble;
};

PUBLIC_API int SetOMPThreads(int num_threads);
Expand Down
41 changes: 34 additions & 7 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -81,6 +81,9 @@ def all_files_from(dir, ext=''):
'-Wno-shorten-64-to-32','-fvisibility=hidden','-stdlib=libc++'],
'clang w/ manual OpenMP' : ['-O2','-std=c++11','-Xpreprocessor','-fopenmp',
'-Wno-shorten-64-to-32','-fvisibility=hidden','-stdlib=libc++'],
'clang w/ GPU' : ['-O2','-msse2','-std=c++11','-fopenmp','-fopenmp-targets=nvptx64-nvidia-cuda',
'-Wno-openmp-mapping','-Wno-unknown-cuda-version',
'-Wno-shorten-64-to-32','-fvisibility=hidden', '-DGALSIM_USE_GPU'],
'unknown' : [],
}
lopt = {
Expand All @@ -90,6 +93,8 @@ def all_files_from(dir, ext=''):
'clang w/ OpenMP' : ['-stdlib=libc++','-fopenmp'],
'clang w/ Intel OpenMP' : ['-stdlib=libc++','-liomp5'],
'clang w/ manual OpenMP' : ['-stdlib=libc++','-lomp'],
'clang w/ GPU' : ['-fopenmp','-fopenmp-targets=nvptx64-nvidia-cuda',
'-Wno-openmp-mapping','-Wno-unknown-cuda-version'],
'unknown' : [],
}

Expand Down Expand Up @@ -143,7 +148,11 @@ def get_compiler_type(compiler, check_unknown=True, output=False):
# with the openmp flag and see if it works.
if output:
print('Compiler is Clang. Checking if it is a version that supports OpenMP.')
if try_openmp(compiler, 'clang w/ OpenMP'):
if supports_gpu(compiler):
if output:
print("Yay! This version of clang supports GPU!")
return 'clang w/ GPU'
elif try_openmp(compiler, 'clang w/ OpenMP'):
if output:
print("Yay! This version of clang supports OpenMP!")
return 'clang w/ OpenMP'
Expand Down Expand Up @@ -193,6 +202,23 @@ def get_compiler_type(compiler, check_unknown=True, output=False):
else:
return 'unknown'

# Check whether this build of Clang supports offloading to GPU via OpenMP
def supports_gpu(compiler):
# Print out compiler's targets
cc = compiler.compiler_so[0]
if cc == 'ccache':
cc = compiler.compiler_so[1]
cmd = [cc,'-print-targets']
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
lines = p.stdout.readlines()
# Look for 'nvptx' in the output. May need a more general check in future to support
# other GPU architectures
for line in lines:
line = line.decode().strip()
if 'nvptx' in line:
return True
return False

# Check for the fftw3 library in some likely places
def find_fftw_lib(output=False):
import distutils.sysconfig
Expand Down Expand Up @@ -435,7 +461,7 @@ def try_compile(cpp_code, compiler, cflags=[], lflags=[], prepend=None, check_wa
cpp_name = cpp_file.name

# Just get a named temporary file to write to:
with tempfile.NamedTemporaryFile(delete=False, suffix='.os', dir=local_tmp) as o_file:
with tempfile.NamedTemporaryFile(delete=False, suffix='.o', dir=local_tmp) as o_file:
o_name = o_file.name

# Another named temporary file for the executable
Expand Down Expand Up @@ -486,7 +512,8 @@ def try_compile(cpp_code, compiler, cflags=[], lflags=[], prepend=None, check_wa
print('Trying link command:')
print(' '.join(cmd))
print('Output was:')
print(' ',b' '.join(lines).decode())
#print(' ',b' '.join(lines).decode())
print(' ',b' '.join(lines))
returncode = p.returncode
except OSError as e:
if debug:
Expand Down Expand Up @@ -538,7 +565,8 @@ def try_compile(cpp_code, compiler, cflags=[], lflags=[], prepend=None, check_wa
print('Trying link command:')
print(' '.join(cmd))
print('Output was:')
print(' ',b' '.join(lines).decode())
#print(' ',b' '.join(lines).decode())
print(' ',b' '.join(lines))
returncode = p.returncode
except OSError as e:
if debug:
Expand Down Expand Up @@ -823,8 +851,6 @@ def add_dirs(builder, output=False):
if hasattr(builder, 'library_dirs'):
if fftw_libpath != '':
builder.library_dirs.append(fftw_libpath)
builder.libraries.append('galsim') # Make sure galsim comes before fftw3
builder.libraries.append(os.path.split(fftw_lib)[1].split('.')[0][3:])
fftw_include = os.path.join(os.path.split(fftw_libpath)[0], 'include')
if os.path.isfile(os.path.join(fftw_include, 'fftw3.h')):
print('Include directory for fftw3 is ',fftw_include)
Expand Down Expand Up @@ -1276,7 +1302,8 @@ def run_tests(self):
ext=Extension("galsim._galsim",
py_sources,
depends = cpp_sources + headers + inst,
undef_macros = undef_macros)
undef_macros = undef_macros,
extra_link_args = ["-lfftw3"])

build_dep = ['setuptools>=38', 'pybind11>=2.2', 'numpy>=1.17']
run_dep = ['astropy', 'LSSTDESC.Coord']
Expand Down
4 changes: 4 additions & 0 deletions src/RealGalaxy.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,10 @@
#if defined(__GNUC__) && __GNUC__ >= 6
#pragma GCC diagnostic ignored "-Wint-in-bool-context"
#endif

// Clang incorrectly defines __CUDA_ARCH__ in host code when building for
// OpenMP target offload, so we have to undefine it or Eigen gets confused
#undef __CUDA_ARCH__
#include "Eigen/Dense"

#include "RealGalaxy.h"
Expand Down
Loading