This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

NVBug: 3075796 [PyTorch] temp_storage_bytes overflows in InclusiveScan for size_cub value close to int32 max #221

Closed
alliepiper opened this issue Oct 19, 2020 · 2 comments · Fixed by #249
Labels: nvbug (has an associated internal NVIDIA NVBug), type: bug: functional (does not work as intended)


@alliepiper
Collaborator

Reported by upstream.

We've created this code snippet to reproduce it in our current container:

import torch
import torch.nn as nn
from torch.utils import cpp_extension

cuda_source = """
#include <iostream>
#include <cub/device/device_scan.cuh>

void my_fun(void)
{
    int size_cub = 2147483647-100000;
    auto self = torch::ones({size_cub});
    auto result = self.clone();
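    size_t temp_storage_bytes = 0;
    // Passing nullptr as the temp storage pointer only queries the required
    // size; CUB writes it to temp_storage_bytes and does not run the scan.
    cub::DeviceScan::InclusiveScan(
        nullptr,
        temp_storage_bytes,
        self.data_ptr<float>(),
        result.data_ptr<float>(),
        [] __host__ __device__ (const float& a, const float& b) { return (b < a) ? b : a; },
        size_cub);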
    std::cout << "temp_storage_bytes " << temp_storage_bytes << std::endl;
    return;
}
"""

cpp_source = """
    void my_fun(void);
"""

module = torch.utils.cpp_extension.load_inline(
    name="cuda_test_extension",
    cpp_sources=cpp_source,
    cuda_sources=cuda_source,
    functions="my_fun",
    extra_cuda_cflags=["--extended-lambda"],
    verbose=True,
)

module.my_fun()

print('done')

Output for different values of size_cub:

2147483647
temp_storage_bytes 18446744073700604415

2147483647 - 10
temp_storage_bytes 18446744073700604415

2147483647 - 100
temp_storage_bytes 18446744073700604415

2147483647 - 1000
temp_storage_bytes 18446744073700604415

2147483647 - 10000
temp_storage_bytes 8948479

2147483647 - 100000
temp_storage_bytes 8947967

Based on this, temp_storage_bytes starts to overflow at some size_cub value in the range (2147483647 - 10000, 2147483647 - 1000].
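
As a quick sanity check on the numbers above, the huge value sits just below 2^64, which is what a small negative byte count looks like after conversion to an unsigned 64-bit `size_t`. The sketch below only redoes that arithmetic; the constant names are illustrative and not taken from CUB.

```cpp
#include <cstdint>
#include <limits>

// Value printed by the reproducer for size_cub >= 2147483647 - 1000.
constexpr std::uint64_t reported_bytes = 18446744073700604415ull;

// Distance from the reported value up to 2^64, computed in unsigned
// arithmetic (well defined: it wraps modulo 2^64).
constexpr std::uint64_t distance_to_wrap =
    std::numeric_limits<std::uint64_t>::max() - reported_bytes + 1;

// The reported value equals 2^64 - 8947201, i.e. the 64-bit two's
// complement bit pattern of -8947201.
static_assert(distance_to_wrap == 8947201ull,
              "reported value is 2^64 - 8947201");
```

The magnitude 8947201 is close to the valid results (8947967 and 8948479), which is consistent with a tile count that went negative rather than a size that is truly that large.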

@alliepiper alliepiper added the nvbug Has an associated internal NVIDIA NVBug. label Oct 19, 2020
@alliepiper alliepiper added this to the 1.11.1 milestone Oct 19, 2020
@alliepiper
Collaborator Author

Somewhat related to NVIDIA/cccl#744, but this one is weird because it's happening for values < INT_MAX.

@alliepiper
Collaborator Author

@ptrblck Looks like this is caused by an overflow here: https://github.com/NVIDIA/cub/blob/main/cub/device/dispatch/dispatch_scan.cuh#L299

I'll fix this as part of NVIDIA/thrust#249, since that has some related fixes.

@brycelelbach you were right :)
(different location but same bug)

alliepiper added a commit to alliepiper/cub that referenced this issue Feb 10, 2021
Users have been reporting that device algorithms return invalid
`temp_storage_bytes` values when `num_items` is close to -- but
not over -- INT32_MAX.

This is caused by an overflow in the numerator of the pattern
`num_tiles = (num_items + items_per_tile - 1) / items_per_tile`.

The new function implements the same calculation but protects against
overflow.

Fixes NVIDIA#221.
Bug 3075796
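
A minimal sketch of the idea in this commit message, assuming 32-bit signed `num_items`: round up using the quotient and remainder so that `num_items + items_per_tile - 1` is never formed. The helper names `div_round_up` and `naive_numerator_overflows` are hypothetical and are not necessarily what the linked commit uses.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: ceil(n / d) for n >= 0 and d > 0.
// It never computes n + d - 1, so it cannot overflow for valid inputs.
constexpr std::int32_t div_round_up(std::int32_t n, std::int32_t d)
{
    return n / d + (n % d != 0 ? 1 : 0);
}

// Reports whether the naive numerator n + d - 1 would exceed INT32_MAX.
// The sum is evaluated in 64 bits so the check itself cannot overflow.
constexpr bool naive_numerator_overflows(std::int32_t n, std::int32_t d)
{
    return static_cast<std::int64_t>(n) + d - 1 >
           std::numeric_limits<std::int32_t>::max();
}
```

With an assumed `items_per_tile` of 2048 (an illustrative value, not the actual tuning), `naive_numerator_overflows` is true for `num_items = 2147483647 - 1000` and false for `2147483647 - 10000`, while `div_round_up` stays well defined in both cases.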
alliepiper added a commit to alliepiper/thrust that referenced this issue Feb 10, 2021
The expression `(n + d - 1) / d` can overflow the numerator. The
new method avoids that.

See NVIDIA/cub#221 for reference.
@alliepiper alliepiper linked a pull request Feb 10, 2021 that will close this issue