NVBug: 3075796 [PyTorch] temp_storage_bytes overflows in InclusiveScan for size_cub value close to int32 max #221

alliepiper · 2020-10-19T15:44:55Z

Reported by upstream.

We've created this code snippet to reproduce it in our current container:

```python
import torch
from torch.utils import cpp_extension

cuda_source = """
#include <iostream>
#include <cub/device/device_scan.cuh>

void my_fun(void)
{
    int size_cub = 2147483647-100000;
    auto self = torch::ones({size_cub});
    auto result = self.clone();
    size_t temp_storage_bytes = 0;
    // With d_temp_storage == nullptr, InclusiveScan only writes the required
    // temporary storage size to temp_storage_bytes; no scan is performed.
    cub::DeviceScan::InclusiveScan(
        nullptr, temp_storage_bytes,
        self.data_ptr<float>(), result.data_ptr<float>(),
        [] __host__ __device__ (const float& a, const float& b) { return (b < a) ? b : a; },  // min
        size_cub);
    std::cout << "temp_storage_bytes " << temp_storage_bytes << std::endl;
}
"""

cpp_source = """
    void my_fun(void);
"""

module = cpp_extension.load_inline(
    name="cuda_test_extension",
    cpp_sources=cpp_source,
    cuda_sources=cuda_source,
    functions="my_fun",
    extra_cuda_cflags=["--extended-lambda"],  # needed for the __host__ __device__ lambda
    verbose=True,
)

module.my_fun()

print('done')
```

Output for different values of `size_cub`:

| `size_cub` | `temp_storage_bytes` |
|---|---|
| 2147483647 | 18446744073700604415 |
| 2147483647 - 10 | 18446744073700604415 |
| 2147483647 - 100 | 18446744073700604415 |
| 2147483647 - 1000 | 18446744073700604415 |
| 2147483647 - 10000 | 8948479 |
| 2147483647 - 100000 | 8947967 |

Based on this, `temp_storage_bytes` appears to overflow once `size_cub` exceeds a threshold somewhere between 2147483647 - 10000 and 2147483647 - 1000.


alliepiper · 2020-10-19T16:00:31Z

Somewhat related to NVIDIA/cccl#744, but this one is weird because it's happening for values < INT_MAX.

alliepiper · 2021-02-09T22:02:55Z

@ptrblck Looks like this is caused by an overflow here: https://github.com/NVIDIA/cub/blob/main/cub/device/dispatch/dispatch_scan.cuh#L299

I'll fix this as part of NVIDIA/thrust#249, since that has some related fixes.

@brycelelbach you were right :)
(different location but same bug)

Users have been reporting that device algorithms return invalid `temp_storage_bytes` values when `num_items` is close to, but not over, INT32_MAX. This is caused by an overflow in the numerator of the pattern `num_tiles = (num_items + items_per_tile - 1) / items_per_tile`. The new `cub::DivideAndRoundUp` function implements the same calculation but protects against overflow. Fixes NVIDIA#221. Bug 3075796

The expression `(n + d - 1) / d` can overflow the numerator. The new method avoids that. See NVIDIA/cub#221 for reference.


alliepiper added the nvbug Has an associated internal NVIDIA NVBug. label Oct 19, 2020

alliepiper added this to the 1.11.1 milestone Oct 19, 2020

alliepiper added the type: bug: functional Does not work as intended. label Oct 21, 2020

ptrblck mentioned this issue Feb 8, 2021: Tensor.nonzero tries to allocate huge amount of memory for tensors on GPU with num_elements close to INT_MAX pytorch/pytorch#51872 (Open)

brycelelbach mentioned this issue Feb 8, 2021: Enable more warning flags. #249 (Merged)

alliepiper self-assigned this Feb 9, 2021

alliepiper added a commit to alliepiper/thrust that referenced this issue Feb 10, 2021

Use new cub::DivideAndRoundUp util to avoid overflow errors.

53b6dcb

The expression `(n + d - 1) / d` can overflow the numerator. The new method avoids that. See NVIDIA/cub#221 for reference.

alliepiper linked a pull request Feb 10, 2021 that will close this issue: Enable more warning flags. #249 (Merged)

alliepiper added a commit to alliepiper/thrust that referenced this issue Feb 10, 2021

Use new cub::DivideAndRoundUp util to avoid overflow errors.

a802fbc


alliepiper added a commit to alliepiper/thrust that referenced this issue Feb 10, 2021

Use new cub::DivideAndRoundUp util to avoid overflow errors.

000fa48


alliepiper closed this as completed in #249 Feb 16, 2021

alliepiper added a commit to alliepiper/thrust that referenced this issue Feb 16, 2021

Use new cub::DivideAndRoundUp util to avoid overflow errors.

8f876ba


trxcllnt mentioned this issue Nov 8, 2023: [BUG] inclusive_scan_by_key OOM on >= INT_MAX elements NVIDIA/cccl#766 (Open)

alliepiper mentioned this issue Jun 18, 2021: RuntimeError: nonzero is not supported for tensors with more than INT_MAX elements for torch.masked_select pytorch/pytorch#60267 (Closed)
