-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Omp target offload #1212
Omp target offload #1212
Conversation
I see the CI is failing. Apparently gcc doesn't like me mapping |
I've wrapped the GPU directives in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is pretty hard to review right now. The src/Silicon.cpp file in particular.
You moved some of the functions around, which means they don't show up as a useful diff currently. There is a full function removed, and then a very similar function added in a different location. You also removed some inline comments, which makes git think blocks of code that are otherwise identical are actually different. There are also some new functions, which are just old functions with some slight changes to data structures, not actually (I think!) any new code.
Please reorganize this so that the Silicon.cpp file show on github as a reasonable diff, so I can review what the real changes are. Thanks!
setup.py
Outdated
@@ -47,7 +47,7 @@ | |||
raise | |||
|
|||
# Turn this on for more verbose debugging output about compile attempts. | |||
debug = False | |||
debug = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be turned back off.
src/Silicon.cpp
Outdated
} | ||
|
||
bool Silicon::insidePixel(int ix, int iy, double x, double y, double zconv, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please move this and searchNeighbors back up to where they used to be, so git will show me what the differences actually are? Putting them down here means I have to manually compare the two versions to check that nothing substantive changed. Likewise make sure addDelta and subtractDelta show up here as regular diffs, since they're also in different places now.
src/Silicon.cpp
Outdated
} | ||
|
||
template <typename T> | ||
double Silicon::accumulate(const PhotonArray& photons, int i1, int i2, | ||
BaseDeviate rng, ImageView<T> target) | ||
BaseDeviate rng, ImageView<T> target) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like you maybe have tabs in the new version? There shouldn't be any whitespace change here.
src/Silicon.cpp
Outdated
185528756957.328400, 182815356489.945160, 263157894736.842072, | ||
398406374501.992065, 558659217877.094971, 469483568075.117371, | ||
833333333333.333374, 917431192660.550415, 1058201058201.058228 | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is right. We pass the abs_length table from python. This shouldn't be hard-coded into the C++ layer.
src/Silicon.cpp
Outdated
if (tableIdx < 0) tableIdx = 0; | ||
int tableIdx1 = tableIdx + 1; | ||
if (tableIdx > 239) tableIdx = 239; | ||
if (tableIdx1 > 239) tableIdx1 = 239; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too many special numbers here. 255.0, 5.0, 239. These all need to be computed from the input abs_length_table you get from python. Don't assume these are necessarily the values forever.
src/Silicon.cpp
Outdated
double* photonsDXDZ = photonsMutable.getDXDZArray(); | ||
double* photonsDYDZ = photonsMutable.getDYDZArray(); | ||
double* photonsFlux = photonsMutable.getFluxArray(); | ||
double* photonsWavelength = photonsMutable.getWavelengthArray(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really need pointers to mutable values? If not, you should just add const versions of these in the PhotonArray class that returns const double*
pointers rather than using const_cast.
src/Silicon.cpp
Outdated
y0 += dydz * dz_pixel; // dy in pixels | ||
} | ||
xdbg<<" => "<<x0<<','<<y0; | ||
// This is the reverse of depth. zconv is how far above the substrate the e- converts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You got rid of most of these inline comments in your refactoring. Please add them back.
include/galsim/Silicon.h
Outdated
Bounds<double>* pixelInnerBoundsData, | ||
Bounds<double>* pixelOuterBoundsData, | ||
Position<float>* horizontalBoundaryPointsData, | ||
Position<float>* verticalBoundaryPointsData); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not have two ways to do this please. Rewrite the existing updatePixelBounds to work with the GPU. Don't repeat everything in a second function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not fixed yet. You still have both of these functions.
src/Silicon.cpp
Outdated
for (int i = 0; i < imageDataSize; i++) { | ||
targetData[i] += deltaData[i]; | ||
deltaData[i] = 0.0; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this should be a call to addDelta
, right? Looks like an oversight in the previous code, but it's more important now that addDelta isn't just a one-liner.
…d to merge CPU and GPU versions)
1201b1c
to
cf8ff66
Compare
I have addressed Mike's feedback and rebased against the current main branch.
|
include/galsim/Silicon.h
Outdated
@@ -265,6 +278,19 @@ namespace galsim | |||
Table _abs_length_table; | |||
bool _transpose; | |||
ImageAlloc<double> _delta; | |||
std::shared_ptr<bool> _changed; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure this is wrong. This used to be a local vector. And _changed.get() is still being used as a bare C array. This might explain the crashes you were seeing. So after fixing this, probably could try to put back the min/max usage, which IMO is more readable than the two step you switched to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved _changed to be a member variable rather than local to avoid the overhead of allocating it on the GPU every time update is called. The reason it's now a shared_ptr to a bare array instead of a vector is that bool vectors have an optimised implementation that packs multiple bools into each byte, so you can't get a pointer to the raw data, which the GPU requires.
I agree that std::min and std::max would be more readable, but surprisingly the GPU doesn't support them (I was using them initially but they caused weird bugs that took a long time to track down). We could implement custom, GPU-friendly min and max functions rather than writing out the algorithm though.
if (y < ny) changed[(x * ny) + y] = true; // pixel above | ||
if (y > 0) changed[(x * ny) + (y - 1)] = true; // pixel below | ||
if (y < ny) changedData[(x * ny) + y] = true; // pixel above | ||
if (y > 0) changedData[(x * ny) + (y - 1)] = true; // pixel below |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is where you use changeData, but this is a pointer that you got from _changed.get(), and _changed is just a single shared_ptr, not a vector. So this looks like undefined behavior here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_changed.get() will return the raw bool pointer from inside the shared pointer, so this works as intended. See above for explanation of why this is no longer a vector.
include/galsim/Silicon.h
Outdated
Bounds<double>* pixelInnerBoundsData, | ||
Bounds<double>* pixelOuterBoundsData, | ||
Position<float>* horizontalBoundaryPointsData, | ||
Position<float>* verticalBoundaryPointsData); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not fixed yet. You still have both of these functions.
The CI tests are failing as they are unable to install |
Just remove codecov from the ci script. We actually haven't been using it for a while, since they now recommend a bash uploader. And they recently removed codecov from pypi, so pip can't find it anymore. Probably to help discourage people from using the obsolete uploader. |
Do not install codecov in CI script
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think I'm mostly happy with this now. Just a couple minor comments.
But I do still want Josh to take a look at some of the OpenMP GPU directives to see if he has any comments there.
_emptypolyGPU[i].x = _emptypoly[i].x; | ||
_emptypolyGPU[i].y = _emptypoly[i].y; | ||
} | ||
Position<double>* emptypolyData = _emptypolyGPU.data(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my above comment still applies. I also suspect most of this (after setting targetData and deltaData) belongs in the constructor. Not in initialize
, since I think they are not going to change for each target image.
…ve comments Remove redundant _testpoly vector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I left a couple small comments.
Rename pixelInnerBoundsSize variable to pixelBoundsSize
This is my GPU acceleration work on the sensor model, rebased against the current main branch. Obviously this is a very extensive change, so I'm open to discussing how it's implemented and making changes to better fit the existing code. There is a single unified code base for both GPU and CPU implementations; when offloading is disabled (which can be done via an environment variable or by building with a non-GPU-aware compiler), the offloaded regions act like OpenMP parallel regions, so the loops still take advantage of multiple CPU cores as before. A version of Clang with offloading support is required to build this for GPU.