Warn the user when choice of grid size can cause a CUDA error #340

JonathanMaes · 2024-10-17T12:20:02Z

This PR adds warnings for two situations where the choice of grid size could result in a CUDA error.

Simulations failing to run: panic: CUDA_ERROR_INVALID_VALUE #284
When the number of cells along an axis has a prime factor >127, the CUDA_ERROR_INVALID_VALUE error occurs because of the inner workings of the cuFFT algorithm (see @jplauzie's reply).

The new warning (example below) is already raised when the grid is not 7-smooth, i.e. when there is a prime factor greater than 7. This includes the >127 case, while also raising awareness about the recommendation to use a 7-smooth grid.
```
// WARNING: y-axis is not 7-smooth. It has 501 cells, with prime
//          factors [3 167], at least one of which is greater than 7.
//          This may reduce performance or cause a CUDA_ERROR_INVALID_VALUE error.
```
panic: CURAND_STATUS_LENGTH_NOT_MULTIPLE issue when grid size odd and temperature finite #314
When the temperature is nonzero and the grid contains an odd number of cells, the CURAND_STATUS_LENGTH_NOT_MULTIPLE error occurs. This is explained in the curandGenerateNormal documentation:

> Normally distributed results are generated from pseudorandom generators with a Box-Muller transform, and so require n to be even.

The new warning (example below) is raised the first time the random thermal field is updated, if the total number of cells is odd (i.e. every axis has an odd number of cells).
```
// WARNING: nonzero temperature requires an even amount of grid cells,
//          but all axes have an odd number of cells: [625 625 1].
//          This may cause a CURAND_STATUS_LENGTH_NOT_MULTIPLE error.
```
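
The condition can be sketched as follows. This is an illustrative example under stated assumptions, not the code from this PR: `cellCount` and `oddGridWarning` are hypothetical names, and the temperature is simplified to a single `float64` value. The product of the three axis sizes is odd exactly when every axis is odd, which is why the warning mentions all axes.

```go
package main

import "fmt"

// cellCount returns the total number of cells in the grid.
func cellCount(size [3]int) int {
	return size[0] * size[1] * size[2]
}

// oddGridWarning returns a warning when the temperature is nonzero and the
// total number of cells is odd, or an empty string otherwise.
// Hypothetical helper: the scalar temperature is a simplifying assumption.
func oddGridWarning(size [3]int, temperature float64) string {
	if temperature == 0 || cellCount(size)%2 == 0 {
		return ""
	}
	return fmt.Sprintf("// WARNING: nonzero temperature with an odd number of cells: %v.", size)
}

func main() {
	grids := [][3]int{
		{625, 625, 1}, // all axes odd: odd total
		{625, 624, 1}, // one even axis: even total
	}
	for _, size := range grids {
		if w := oddGridWarning(size, 300); w != "" {
			fmt.Println(w)
		}
	}
}
```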

These warnings are printed during program execution, so they may get buried in the output. Raising an error instead seems premature, since the CUDA error has not yet occurred at that point. Printing the warning at the very end of the output would be another option, but that seems hard to implement.

JonathanMaes added 2 commits October 17, 2024 11:50

Add warning when grid is not 7-smooth

0664f46

Add warning when grid is odd and temperature is nonzero

f903b68

JonathanMaes requested a review from JLeliaert October 17, 2024 12:20

MathieuMoalic added a commit to MathieuMoalic/amumax that referenced this pull request Oct 18, 2024

Warn the user when choice of grid size can cause a CUDA error mumax/3…

5a04a0e

…#340

JonathanMaes mentioned this pull request Nov 4, 2024

mumax3.11 #338

Open

2 tasks

JonathanMaes merged commit 01bda93 into 3.11 Nov 4, 2024

JonathanMaes deleted the feature/warnPrimeFactors branch November 4, 2024 10:01

MathieuMoalic added a commit to MathieuMoalic/amumax that referenced this pull request Nov 20, 2024

Warn the user when choice of grid size can cause a CUDA error mumax/3…

8d26c07

…#340
