Omp target offload #1212

jamesp-epcc · 2023-02-13T15:08:15Z

This is my GPU acceleration work on the sensor model, rebased against the current main branch. Obviously this is a very extensive change, so I'm open to discussing how it's implemented and making changes to better fit the existing code. There is a single unified code base for both GPU and CPU implementations; when offloading is disabled (which can be done via an environment variable or by building with a non-GPU-aware compiler), the offloaded regions act like OpenMP parallel regions, so the loops still take advantage of multiple CPU cores as before. A version of Clang with offloading support is required to build this for GPU.
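The sketch below shows roughly what one offloaded loop in such a unified code base might look like. The function `scaleAndAccumulate` is hypothetical and not taken from this PR. With an offload-capable Clang, the loop runs on the device. It runs on the host instead if offloading is turned off, either through the standard OpenMP environment variable `OMP_TARGET_OFFLOAD=DISABLED` (presumably the variable meant above) or by using a compiler without offload support. In that case the `parallel for` part can still spread iterations across CPU threads. Without `-fopenmp`, the pragma is ignored and the loop runs serially.

```cpp
#include <cassert>

// Hypothetical example: target[i] += scale * source[i] for i in [0, n).
// With offload support this region runs on the GPU; with offloading disabled
// it falls back to the host and behaves like an OpenMP parallel loop.
void scaleAndAccumulate(double* target, const double* source, int n, double scale)
{
    assert(n >= 0);
    // map(to:) copies source to the device; map(tofrom:) copies target there and back.
#pragma omp target teams distribute parallel for map(to: source[0:n]) map(tofrom: target[0:n])
    for (int i = 0; i < n; ++i) {
        target[i] += scale * source[i];
    }
}
```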

jamesp-epcc · 2023-02-16T09:34:31Z

I see the CI is failing. Apparently GCC doesn't like me mapping the this pointer via OpenMP. Unfortunately that mapping is needed to build a working GPU version with Clang. I had thought it should be harmless to have the offload directives there even for CPU builds, but it might be necessary to wrap them in #ifdefs so that the compiler only sees them when actually building for GPU.

jamesp-epcc · 2023-02-20T14:51:32Z

I've wrapped the GPU directives in #ifdefs and the checks now pass. The original code already had similar checks in place for the OpenMP parallel directives so it's not too big a departure from how we previously did things. I still need to investigate whether actually building and running the GPU version in CI is feasible.
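The guarding pattern described here can be sketched as follows. The macro name `GALSIM_GPU_OFFLOAD` is hypothetical, because the thread does not show which macro the PR actually uses. When the macro is defined, the loop gets a `target` directive. Otherwise it gets the plain `parallel for` that a CPU build would use, guarded by the standard `_OPENMP` macro so that non-OpenMP builds still compile cleanly.

```cpp
#include <cassert>

// GALSIM_GPU_OFFLOAD is a hypothetical macro, defined only for GPU builds.
// The preprocessor runs first, so exactly one pragma (or none) ends up
// immediately before the for loop, as OpenMP requires.
void zeroFill(double* data, int n)
{
    assert(n >= 0);
#if defined(GALSIM_GPU_OFFLOAD)
#pragma omp target teams distribute parallel for map(from: data[0:n])
#elif defined(_OPENMP)
#pragma omp parallel for
#endif
    for (int i = 0; i < n; ++i) {
        data[i] = 0.0;
    }
}
```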

rmjarvis

This PR is pretty hard to review right now. The src/Silicon.cpp file in particular.

You moved some of the functions around, which means they don't show up as a useful diff currently. There is a full function removed, and then a very similar function added in a different location. You also removed some inline comments, which makes git think blocks of code that are otherwise identical are actually different. There are also some new functions, which are just old functions with some slight changes to data structures, not actually (I think!) any new code.

Please reorganize this so that Silicon.cpp shows up on GitHub as a reasonable diff, so I can review what the real changes are. Thanks!

rmjarvis · 2023-03-13T21:38:39Z

setup.py

@@ -47,7 +47,7 @@
    raise

 # Turn this on for more verbose debugging output about compile attempts.
-debug = False
+debug = True


This needs to be turned back off.

rmjarvis · 2023-03-13T21:54:50Z

src/Silicon.cpp

    }
+
+    bool Silicon::insidePixel(int ix, int iy, double x, double y, double zconv,


Can you please move this and searchNeighbors back up to where they used to be, so git will show me what the differences actually are? Putting them down here means I have to manually compare the two versions to check that nothing substantive changed. Likewise make sure addDelta and subtractDelta show up here as regular diffs, since they're also in different places now.

rmjarvis · 2023-03-13T21:56:45Z

src/Silicon.cpp

    }

    template <typename T>
    double Silicon::accumulate(const PhotonArray& photons, int i1, int i2,
-                               BaseDeviate rng, ImageView<T> target)
+			       BaseDeviate rng, ImageView<T> target)


This looks like you maybe have tabs in the new version? There shouldn't be any whitespace change here.

rmjarvis · 2023-03-13T21:58:42Z

src/Silicon.cpp

+	    185528756957.328400, 182815356489.945160, 263157894736.842072,
+	    398406374501.992065, 558659217877.094971, 469483568075.117371,
+	    833333333333.333374, 917431192660.550415, 1058201058201.058228
+	};


I don't think this is right. We pass the abs_length table from Python. This shouldn't be hard-coded into the C++ layer.

rmjarvis · 2023-03-13T22:06:59Z

src/Silicon.cpp

+                if (tableIdx < 0) tableIdx = 0;
+		int tableIdx1 = tableIdx + 1;
+                if (tableIdx > 239) tableIdx = 239;
+                if (tableIdx1 > 239) tableIdx1 = 239;


Too many special numbers here: 255.0, 5.0, 239. These all need to be computed from the input abs_length_table you get from Python. Don't assume these values will stay the same forever.
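One way to avoid hard-coded constants is to compute the start, spacing, and last valid index from the table itself before the loop. Only those scalars and a raw pointer to the values then need to go into the offloaded region. The sketch below makes two assumptions: the table's arguments are evenly spaced, and linear interpolation is acceptable. `UniformTableView`, `makeView`, and `lookup` are hypothetical names, not GalSim's `Table` API. The clamping plays the same role as the `tableIdx`/`tableIdx1` checks in the diff above, but the upper bound comes from the table size rather than the literal 239.

```cpp
#include <cassert>
#include <vector>

// Hypothetical flat view of an evenly spaced lookup table, built once on the host.
// The view does not own the values; the source vector must outlive it.
struct UniformTableView {
    double x0;             // first argument value
    double dx;             // spacing between arguments
    int n;                 // number of entries
    const double* values;  // raw pointer, suitable for mapping to a device
};

UniformTableView makeView(const std::vector<double>& args, const std::vector<double>& values)
{
    assert(args.size() >= 2 && args.size() == values.size());
    return UniformTableView{args.front(), args[1] - args[0],
                            static_cast<int>(args.size()), values.data()};
}

// Linear interpolation, clamped to the table's range; no magic numbers.
double lookup(const UniformTableView& t, double x)
{
    double f = (x - t.x0) / t.dx;       // fractional index
    if (f < 0.0) f = 0.0;
    if (f > t.n - 1) f = t.n - 1;
    int idx = static_cast<int>(f);
    if (idx > t.n - 2) idx = t.n - 2;   // keep idx + 1 in range
    double frac = f - idx;
    return t.values[idx] + frac * (t.values[idx + 1] - t.values[idx]);
}
```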

rmjarvis · 2023-03-13T22:14:13Z

src/Silicon.cpp

+	double* photonsDXDZ = photonsMutable.getDXDZArray();
+	double* photonsDYDZ = photonsMutable.getDYDZArray();
+	double* photonsFlux = photonsMutable.getFluxArray();
+	double* photonsWavelength = photonsMutable.getWavelengthArray();


Do you really need pointers to mutable values? If not, you should just add const versions of these in the PhotonArray class that returns const double* pointers rather than using const_cast.
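The suggested alternative to const_cast is to give each getter a const overload, so that read-only callers receive a `const double*`. The class below is a hypothetical, cut-down stand-in for `PhotonArray` that shows only the flux array. It is not GalSim's actual declaration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for PhotonArray, showing only the flux array.
class PhotonArraySketch {
public:
    explicit PhotonArraySketch(int n) : _flux(n, 1.0) {}

    // Mutable access for code that writes photon data.
    double* getFluxArray() { return _flux.data(); }
    // Read-only access: selected automatically when the object is const.
    const double* getFluxArray() const { return _flux.data(); }

    int size() const { return static_cast<int>(_flux.size()); }

private:
    std::vector<double> _flux;
};

// A reader that only needs const access; no const_cast required.
double totalFlux(const PhotonArraySketch& photons)
{
    const double* flux = photons.getFluxArray();
    assert(flux != nullptr || photons.size() == 0);
    double sum = 0.0;
    for (int i = 0; i < photons.size(); ++i) sum += flux[i];
    return sum;
}
```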

rmjarvis · 2023-03-13T22:18:45Z

src/Silicon.cpp

-                y0 += dydz * dz_pixel; // dy in pixels
-            }
-            xdbg<<" => "<<x0<<','<<y0;
-            // This is the reverse of depth. zconv is how far above the substrate the e- converts.


You got rid of most of these inline comments in your refactoring. Please add them back.

rmjarvis · 2023-03-13T22:31:30Z

include/galsim/Silicon.h

+                                  Bounds<double>* pixelInnerBoundsData,
+                                  Bounds<double>* pixelOuterBoundsData,
+                                  Position<float>* horizontalBoundaryPointsData,
+                                  Position<float>* verticalBoundaryPointsData);


Let's not have two ways to do this please. Rewrite the existing updatePixelBounds to work with the GPU. Don't repeat everything in a second function.

This is not fixed yet. You still have both of these functions.

rmjarvis · 2023-03-13T22:35:59Z

src/Silicon.cpp

+        for (int i = 0; i < imageDataSize; i++) {
+            targetData[i] += deltaData[i];
+            deltaData[i] = 0.0;
+	}


I assume this should be a call to addDelta, right? Looks like an oversight in the previous code, but it's more important now that addDelta isn't just a one-liner.
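Putting this loop in a single helper keeps the CPU and GPU paths in sync. The sketch below assumes a free-function signature, `addAndResetDelta`, which is hypothetical. GalSim's addDelta is a Silicon member, and its real signature is not shown here. The helper adds the accumulated deltas into the image data and resets them in one pass, which is the same work the inline loop above does.

```cpp
#include <cassert>

// Hypothetical helper: targetData[i] += deltaData[i]; then deltaData[i] = 0.
void addAndResetDelta(double* targetData, double* deltaData, int imageDataSize)
{
    assert(imageDataSize >= 0);
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (int i = 0; i < imageDataSize; i++) {
        targetData[i] += deltaData[i];
        deltaData[i] = 0.0;
    }
}
```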

jamesp-epcc · 2023-03-31T14:56:25Z

I have addressed Mike's feedback and rebased against the current main branch.

The function re-ordering, the whitespace changes and the loss of comments were an artefact of me adding GPU versions of the functions initially alongside the originals, then later renaming them and removing the originals, but hopefully all those issues are resolved now.
The values from _abs_length_table are now taken from the table passed in from the Python layer rather than being hardcoded (this was meant to be a temporary change but I forgot about it, apologies).
Const getter methods have been added to PhotonArray instead of const_casting it.
The separate CPU version of updatePixelDistortions has now been removed. By making a few changes I was able to use the GPU-enabled version for all purposes instead.
The pixel distortion update has been moved back out of the update method, as it was originally. addDelta is not being used in update, but I have added a comment explaining why.

rmjarvis · 2023-04-05T00:53:45Z

include/galsim/Silicon.h

@@ -265,6 +278,19 @@ namespace galsim
        Table _abs_length_table;
        bool _transpose;
        ImageAlloc<double> _delta;
+        std::shared_ptr<bool> _changed;


I'm pretty sure this is wrong. This used to be a local vector, and _changed.get() is still being used as a bare C array. This might explain the crashes you were seeing. After fixing this, you could probably try to put back the min/max usage, which IMO is more readable than the two-step approach you switched to.

I moved _changed to be a member variable rather than local to avoid the overhead of allocating it on the GPU every time update is called. The reason it's now a shared_ptr to a bare array instead of a vector is that bool vectors have an optimised implementation that packs multiple bools into each byte, so you can't get a pointer to the raw data, which the GPU requires.
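The constraint described here can be shown in a few lines. `std::vector<bool>` is a space-optimised specialisation with no `data()` member, so it cannot supply a contiguous `bool*` for a map clause. A heap array owned by `std::unique_ptr<bool[]>`, the type adopted in a later commit on this PR, exposes one through `get()`. It also frees the array with `delete[]`. By contrast, a `std::shared_ptr<bool>` that owns `new bool[n]` would release it with plain `delete`. The class and its members (`ChangedFlags`, `mark`, `count`) are hypothetical.

```cpp
#include <cassert>
#include <memory>

// Hypothetical owner of a per-pixel "changed" flag array, allocated once.
class ChangedFlags {
public:
    // new bool[n]() value-initialises every flag to false.
    explicit ChangedFlags(int n) : _n(n), _flags(new bool[n]()) {}

    // Raw, contiguous pointer: usable in a map clause or a device loop.
    // std::vector<bool> offers no equivalent data() pointer.
    bool* data() { return _flags.get(); }

    void mark(int i) { assert(i >= 0 && i < _n); _flags[i] = true; }

    int count() const {
        int c = 0;
        for (int i = 0; i < _n; ++i) c += _flags[i] ? 1 : 0;
        return c;
    }

private:
    int _n;
    std::unique_ptr<bool[]> _flags;  // freed with delete[] automatically
};
```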

I agree that std::min and std::max would be more readable, but surprisingly they don't work in the GPU-offloaded code (I was using them initially, but they caused weird bugs that took a long time to track down). We could implement custom, GPU-friendly min and max functions rather than writing out the algorithm, though.
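Below is a sketch of the GPU-friendly helpers proposed here. A later commit on this PR adds integer min and max functions, but the thread does not show their names or where they live, so `imin`, `imax`, and `clampIndex` are hypothetical. Placing the helpers inside a `declare target` block makes them callable from offloaded regions. The `_OPENMP` guard hides the pragmas from non-OpenMP builds.

```cpp
#include <cassert>

// Hypothetical integer min/max helpers that can also be compiled for the device.
#ifdef _OPENMP
#pragma omp declare target
#endif
inline int imin(int a, int b) { return a < b ? a : b; }
inline int imax(int a, int b) { return a > b ? a : b; }
#ifdef _OPENMP
#pragma omp end declare target
#endif

// Example use: clamp an index into [lo, hi] in one readable expression.
inline int clampIndex(int i, int lo, int hi)
{
    assert(lo <= hi);
    return imin(imax(i, lo), hi);
}
```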

rmjarvis · 2023-04-05T00:55:26Z

src/Silicon.cpp

-                if (y < ny) changed[(x * ny) + y] = true; // pixel above
-                if (y > 0)  changed[(x * ny) + (y - 1)] = true; // pixel below
+                if (y < ny) changedData[(x * ny) + y] = true; // pixel above
+                if (y > 0)  changedData[(x * ny) + (y - 1)] = true; // pixel below


Here is where you use changedData, but this is a pointer that you got from _changed.get(), and _changed is just a single shared_ptr, not a vector. So this looks like undefined behavior.

_changed.get() returns the raw bool pointer held by the shared pointer, so this works as intended. See above for an explanation of why this is no longer a vector.

rmjarvis · 2023-04-05T00:57:28Z

include/galsim/Silicon.h

+                                  Bounds<double>* pixelInnerBoundsData,
+                                  Bounds<double>* pixelOuterBoundsData,
+                                  Position<float>* horizontalBoundaryPointsData,
+                                  Position<float>* verticalBoundaryPointsData);


This is not fixed yet. You still have both of these functions.

jamesp-epcc · 2023-04-17T13:56:45Z

The CI tests are failing because they are unable to install codecov. However, I think this is unrelated to my changes (see home-assistant/core#91283).

rmjarvis · 2023-04-17T15:51:13Z

Just remove codecov from the ci script.

We actually haven't been using it for a while, since they now recommend a bash uploader. And they recently removed codecov from pypi, so pip can't find it anymore. Probably to help discourage people from using the obsolete uploader.

rmjarvis

OK, I think I'm mostly happy with this now. Just a couple minor comments.

But I do still want Josh to take a look at some of the OpenMP GPU directives to see if he has any comments there.

rmjarvis · 2023-04-27T16:55:13Z

src/Silicon.cpp

+	    _emptypolyGPU[i].x = _emptypoly[i].x;
+	    _emptypolyGPU[i].y = _emptypoly[i].y;
+	}
+        Position<double>* emptypolyData = _emptypolyGPU.data();


I think my above comment still applies. I also suspect most of this (after setting targetData and deltaData) belongs in the constructor, not in initialize, since I don't think these values change for each target image.
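Doing one-time set-up in the constructor fits well with OpenMP's unstructured data mapping. `target enter data` copies a read-only buffer to the device once, and `target exit data` releases it. Offloaded regions that run in between find the data already present and do not copy it again. The class `DeviceBuffer` below is a hypothetical sketch of that idea, not the PR's code. Without device support, the directives have no device to act on (or are ignored without `-fopenmp`), and the data simply stays on the host.

```cpp
#include <cassert>
#include <vector>

// Hypothetical: keep a read-only array resident on the device for the
// object's lifetime instead of re-mapping it on every call.
class DeviceBuffer {
public:
    explicit DeviceBuffer(const std::vector<double>& values)
        : _values(values), _n(static_cast<int>(values.size()))
    {
        assert(_n > 0);
        double* p = _values.data();
        int n = _n;
#pragma omp target enter data map(to: p[0:n])
    }

    ~DeviceBuffer()
    {
        double* p = _values.data();
        int n = _n;
#pragma omp target exit data map(delete: p[0:n])
    }

    // Copying would unmap the same storage twice, so forbid it.
    DeviceBuffer(const DeviceBuffer&) = delete;
    DeviceBuffer& operator=(const DeviceBuffer&) = delete;

    // Writes values[i] * factor into out. p is already present on the
    // device, so only out is transferred (back to the host).
    void scaled(double factor, double* out)
    {
        double* p = _values.data();
        int n = _n;
#pragma omp target teams distribute parallel for map(to: p[0:n]) map(from: out[0:n])
        for (int i = 0; i < n; ++i) out[i] = p[i] * factor;
    }

private:
    std::vector<double> _values;
    int _n;
};
```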

jmeyers314

Looks good to me. I left a couple small comments.

jamesp-epcc added the optimization/performance (related to the speed and/or memory consumption of some aspect of the code) and desc (of possible interest to LSST DESC members looking for a project) labels Feb 13, 2023

rmjarvis requested changes Mar 13, 2023

rmjarvis added this to the v2.5 milestone Mar 28, 2023

jamesp-epcc added 24 commits March 31, 2023 15:49

Add GPU-accelerated version of accumulate

98e185e

Map data independently from code offload (not working yet)

fe95922

Implement pixel distortion update on GPU

af168f2

Prevent crash in GPU distortion update loop

f014a0c

Fix a bug in GPU pixel bounds computation

ffe777c

Fix null pointer error with certain compilers when some photon data not allocated

38da021

Fix various bugs so that output from GPU matches CPU version

b12e813

Add finalizeGPU method to clean everything up and copy back result

6dddbb2

Remove some unused and commented out stuff

94c9318

Tidy up function arguments and fix diffStepRandom size bug

3bd5298

Remove some unneeded GPU pointers. Support single precision images on GPU

7acb4c4

Remove GPU-specific bounds data and use same structures as CPU

c915d0b

Remove separate GPU arrays for boundary points and distortions

5a12cbc

Copy data back from GPU in addDelta and subtractDelta

be0a14b

Release GPU memory after completion

3b237e0

Remove manual memory management from GPU code (still needs tested!)

1d77aa9

Remove custom GPU data structures and use same ones as on CPU

45bb1da

Allow GPU version to build with setup.py

863af63

Add GPU support to setup script

6724bef

Get GPU sensor model to pass tests

6bad511

Make resume feature work with GPU

903e5b9

Tidy up code. Remove redundant non-GPU versions of functions

8fadc6c

Work around incorrect Clang definition of __CUDA_ARCH__

550af7a

Ensure -lgalsim is passed to linker only once and is before -lfftw3

ae8b6ef

jamesp-epcc added 7 commits March 31, 2023 15:49

Fix function ordering, whitespace and comments. Turn off debug mode

0423cf1

Merge random arrays into one to get around Clang runtime argument limit

c31a68a

Add const getter methods to PhotonArray instead of using const_cast

cdaa15b

Move pixel distortion update back out into separate method (still need to merge CPU and GPU versions)

496530e

Remove hardcoded abs_length_table and use values passed in from Python layer

62e11ca

Use GPU version of updatePixelDistortions in initialisation

1107b33

Get rid of separate GPU and CPU versions of updatePixelDistortions

cf8ff66

jamesp-epcc force-pushed the omp_target_offload branch from 1201b1c to cf8ff66 Compare March 31, 2023 14:49

rmjarvis requested changes Apr 5, 2023

jamesp-epcc added 6 commits April 10, 2023 14:19

Fix various issues with comments

e10f9cd

Use simpler method to calculate size of image data arrays

64dfa4b

Call calculateConversionDepth instead of inlining

49e7dc9

Add comment explaining that GPU requires raw pointers rather than objects

b7d08de

Call Bounds::includes instead of inlining

895253c

Use same version of updatePixelBounds on CPU and GPU

f5c7b19

rmjarvis reviewed Apr 17, 2023

src/Silicon.cpp (outdated, resolved)

Use unique_ptr<bool[]> to ensure array gets deleted properly

e0882cf

Do not install codecov in CI script

rmjarvis reviewed Apr 21, 2023

src/Silicon.cpp (outdated, resolved)

Move updatePixelBounds back to original location and fix formatting

6290a4e

rmjarvis requested changes Apr 27, 2023

jamesp-epcc added 3 commits April 28, 2023 10:45

Use += operator in updatePixelBounds to make code clearer

0502f08

Add back old code in insidePixel for documentation purposes and improve comments; remove redundant _testpoly vector

4454bf0

Move one-time data initialisation to constructor

cd83a3c

rmjarvis approved these changes Apr 28, 2023

jmeyers314 approved these changes May 2, 2023

src/Silicon.cpp (outdated, resolved)

src/Silicon.cpp (outdated, resolved)

Define integer min and max functions and use them for clarity

39f941e

Rename pixelInnerBoundsSize variable to pixelBoundsSize

rmjarvis merged commit 6d9d0e9 into GalSim-developers:main May 5, 2023
