This repository has been archived by the owner on Mar 21, 2024. It is now read-only.
NVBug: 3075796 [PyTorch] temp_storage_bytes overflows in InclusiveScan for size_cub value close to int32 max #221
Somewhat related to NVIDIA/cccl#744, but this one is weird because it's happening for values < INT_MAX.
@ptrblck Looks like this is caused by an overflow here: https://github.com/NVIDIA/cub/blob/main/cub/device/dispatch/dispatch_scan.cuh#L299

I'll fix this as part of NVIDIA/thrust#249, since that has some related fixes. @brycelelbach you were right :)
alliepiper added a commit to alliepiper/cub that referenced this issue on Feb 10, 2021:
Users have been reporting that device algorithms return invalid `temp_storage_bytes` values when `num_items` is close to -- but not over -- INT32_MAX. This is caused by an overflow in the numerator of the pattern `num_tiles = (num_items + items_per_tile - 1) / items_per_tile`. The new function implements the same calculation but protects against overflow. Fixes NVIDIA#221. Bug 3075796
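The overflow described in this commit message can be illustrated with a minimal sketch. The helper names `div_ceil_no_overflow` and `naive_numerator_overflows`, and the tile size of 1920 used in the usage note, are assumptions made for illustration; they are not taken from the CUB sources or from the actual fix.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (name is an assumption, not the CUB API):
// computes ceil(n / d) for n >= 0 and d > 0 without ever forming
// the intermediate value n + d - 1, so the result cannot overflow.
template <typename T>
constexpr T div_ceil_no_overflow(T n, T d)
{
  return static_cast<T>(n / d + (n % d != 0 ? 1 : 0));
}

// Hypothetical diagnostic: reports whether the numerator of the naive
// pattern (n + d - 1) / d would exceed INT32_MAX. The sum is evaluated
// in 64-bit arithmetic so that the check itself is well defined.
constexpr bool naive_numerator_overflows(std::int32_t n, std::int32_t d)
{
  return std::int64_t{n} + std::int64_t{d} - 1 >
         std::int64_t{std::numeric_limits<std::int32_t>::max()};
}
```

For example, with `num_items = 2147483647 - 1000` and an assumed `items_per_tile` of 1920, `naive_numerator_overflows` returns `true`, while `div_ceil_no_overflow` still yields the expected tile count of 1118481. With `num_items = 2147483647 - 10000` and the same assumed tile size, the naive numerator still fits in 32 bits, which is consistent with failures appearing only for values close to the limit.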
alliepiper added a commit to alliepiper/thrust that referenced this issue on Feb 10, 2021:
The expression `(n + d - 1) / d` can overflow the numerator. The new method avoids that. See NVIDIA/cub#221 for reference.
alliepiper added a commit to alliepiper/thrust that referenced this issue on Feb 16, 2021:
The expression `(n + d - 1) / d` can overflow the numerator. The new method avoids that. See NVIDIA/cub#221 for reference.
Reported by upstream.
We've created this code snippet to reproduce it in our current container:
Output for different values of `size_cub`:

Based on this, it seems `temp_storage_bytes` overflows for a `size_cub` value between `[2147483647 - 10000, 2147483647 - 1000]`.