[RFE] Support linking multiple Thrust versions: Add hooks that wrap the `thrust::` namespace in a custom namespace #1401

nv-dlasalle · 2021-03-19T00:11:21Z

Problem

CUB can be placed inside a custom namespace via CUB_NS_PREFIX and CUB_NS_POSTFIX, so that multiple shared libraries can each use their own copy of it (and thus different versions can safely coexist). Otherwise, the static variables it uses for caching can cause problems (e.g., https://github.com/NVIDIA/cub/blob/main/cub/util_device.cuh#L212).

Thrust, however, depends on CUB and requires it not to be in another namespace, so Thrust users cannot define CUB_NS_PREFIX. This means that if two libraries use two different versions of Thrust (or CUB), issues with the caching variables inside CUB can occur.

Possible solutions

One solution would be to add THRUST_NS_PREFIX and THRUST_NS_POSTFIX, allowing each library to place the version of Thrust it compiles against within its own namespace, and to either use the version of CUB in the global namespace or, by defining CUB_NS_PREFIX as well, use the version of CUB within the same namespace.

Another solution would be to let users define something like THRUST_CUB_NS to tell Thrust which namespace to look for CUB in:

#define CUB_NS_PREFIX namespace foobar {
#define CUB_NS_POSTFIX }
...
#define THRUST_CUB_NS foobar
#include "thrust/sort.h"

alliepiper · 2021-03-19T18:28:43Z

Updated the title to mention the work that needs to be done.

I think we can nicely solve this with these macro sets:

THRUST_CUB_WRAPPED_NAMESPACE
- Set this to a namespace name that will wrap thrust:: AND cub::.
- Preferred method for most use cases.
THRUST_WRAPPED_NAMESPACE / CUB_WRAPPED_NAMESPACE
- Set these to a namespace name that will wrap thrust:: OR cub::.
- Available for odd use cases.
- Overrides THRUST_CUB_WRAPPED_NAMESPACE.
[THRUST|CUB]_NS_[PREFIX|POSTFIX]
- Implementation details, but may be overridden for backwards compat.
- Overrides any overlapping *_WRAPPED_NAMESPACE definitions.
- Macros containing the actual code that implements the namespace wrappings.
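
The override order in this proposal can be sketched with plain preprocessor logic. This is only an illustration under stated assumptions: the `DEMO_` macros and the `demo_thrust` / `demo_cub` namespaces are hypothetical stand-ins, not the real Thrust/CUB macros or their actual implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical user settings. The DEMO_ names are illustrative stand-ins,
// not the real Thrust/CUB macros.
#define DEMO_THRUST_CUB_WRAPPED_NAMESPACE outer  // wrap both libraries...
#define DEMO_CUB_WRAPPED_NAMESPACE cub_only      // ...but override the CUB side

// The per-library macros fall back to the combined macro when unset.
#if !defined(DEMO_THRUST_WRAPPED_NAMESPACE) && defined(DEMO_THRUST_CUB_WRAPPED_NAMESPACE)
#define DEMO_THRUST_WRAPPED_NAMESPACE DEMO_THRUST_CUB_WRAPPED_NAMESPACE
#endif
#if !defined(DEMO_CUB_WRAPPED_NAMESPACE) && defined(DEMO_THRUST_CUB_WRAPPED_NAMESPACE)
#define DEMO_CUB_WRAPPED_NAMESPACE DEMO_THRUST_CUB_WRAPPED_NAMESPACE
#endif

// PREFIX/POSTFIX are derived last, and only if the user has not defined them.
#ifndef DEMO_THRUST_NS_PREFIX
#ifdef DEMO_THRUST_WRAPPED_NAMESPACE
#define DEMO_THRUST_NS_PREFIX namespace DEMO_THRUST_WRAPPED_NAMESPACE {
#define DEMO_THRUST_NS_POSTFIX }
#else
#define DEMO_THRUST_NS_PREFIX
#define DEMO_THRUST_NS_POSTFIX
#endif
#endif

#ifndef DEMO_CUB_NS_PREFIX
#ifdef DEMO_CUB_WRAPPED_NAMESPACE
#define DEMO_CUB_NS_PREFIX namespace DEMO_CUB_WRAPPED_NAMESPACE {
#define DEMO_CUB_NS_POSTFIX }
#else
#define DEMO_CUB_NS_PREFIX
#define DEMO_CUB_NS_POSTFIX
#endif
#endif

// What a library header would do: open the wrapper, declare its own
// namespace, then close the wrapper.
DEMO_THRUST_NS_PREFIX
namespace demo_thrust { inline std::string name() { return "demo_thrust"; } }
DEMO_THRUST_NS_POSTFIX

DEMO_CUB_NS_PREFIX
namespace demo_cub { inline std::string name() { return "demo_cub"; } }
DEMO_CUB_NS_POSTFIX

// These qualified names only compile if the wrappers resolved as described:
// the combined macro applies to demo_thrust, the specific one to demo_cub.
inline std::string resolved_names() {
  return outer::demo_thrust::name() + "," + cub_only::demo_cub::name();
}
```

With these settings, `assert(resolved_names() == "demo_thrust,demo_cub");` holds, and referring to `outer::demo_cub` would fail to compile because the more specific macro took precedence.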

@nv-dlasalle Does this sound reasonable for your needs?

nv-dlasalle · 2021-03-19T23:53:20Z

@allisonvacanti This sounds like it would work perfectly for us. Thanks!

VoVAllen · 2021-03-24T10:42:46Z

PyTorch is experiencing the same issue (pytorch/pytorch#54245). CUB should probably avoid using static variables for caching in template functions.

alliepiper · 2021-03-24T14:40:46Z

Some context:

Those caches were added a while back to avoid overhead from some expensive CUDA API calls. Users were seeing a significant impact from these calls under certain workloads, and the caches were necessary for good performance in some critical applications.

I agree that using statics in a header is a fragile solution and, well, generally not a good idea. But we don't really have a lot of other options -- Thrust/CUB are header-only, so we can't place the cache in a library component. C++17 inline variables may provide a nicer workaround eventually, but we can't rely on them yet.

For now, the namespace workaround will be the preferred solution, but I just wanted to share that we're aware of the issue and want to move to a more robust solution when one becomes available.

VoVAllen · 2021-03-24T16:04:59Z

Thanks! I found that using a static variable inside a non-template, non-inline function should be fine. In that case, gcc won't emit the symbol as UNIQUE. And if every library is loaded with RTLD_LOCAL and has no UNIQUE symbols, each can only see its own static variables, which avoids the conflict.

However, a UNIQUE symbol overrides the RTLD_LOCAL setting, so a library loaded later won't instantiate its own static variable. This causes the conflict.
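
A minimal sketch of the two cases being contrasted, assuming GCC on a GNU/ELF toolchain with unique-symbol support. All function and type names here are hypothetical and only mimic the shape of a header-only cache; they are not CUB code.

```cpp
#include <cassert>

// Hypothetical stand-in for a header-only cache. Because this is a template,
// every TU may instantiate it, so the static local has to be merged across
// TUs. GCC on GNU/ELF targets typically gives it STB_GNU_UNIQUE binding for
// that purpose.
template <typename Tag>
int& cached_value_in_template() {
  static int cache = -1;  // -1 means "not yet populated"
  return cache;
}

// Non-template, non-inline function: the static local is emitted as an
// ordinary local object that belongs to this TU only.
int& cached_value_in_plain_function() {
  static int cache = -1;
  return cache;
}

struct DemoTag {};

// Populate both caches the way a "query once, then reuse" helper would.
int populate_and_read() {
  if (cached_value_in_template<DemoTag>() < 0) {
    cached_value_in_template<DemoTag>() = 4;
  }
  if (cached_value_in_plain_function() < 0) {
    cached_value_in_plain_function() = 4;
  }
  return cached_value_in_template<DemoTag>() + cached_value_in_plain_function();
}
```

Within a single program both variants behave the same: `populate_and_read()` returns 8 on every call. The difference only shows up in the symbol table. Building this into a shared object with g++ and running `nm -C` on it should list the template's `cache` with symbol type `u` (unique global) and the plain function's `cache` with a lowercase local type, provided the toolchain emits GNU unique symbols.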

ngimel · 2021-03-24T16:40:31Z

If each library indeed saw the same static variable, that would be fine - the values that are cached are supposed to be the same for all libraries. But in the case of pytorch/pytorch#52663 and pytorch/pytorch#54245, a new static is allocated but its constructor is not called, so there are 0 devices instead of the correctly cached number of devices (the same would happen for other cached attributes).
That said, separate namespaces for multiple thrust/cub versions sound good.

VoVAllen · 2021-03-25T06:40:40Z

Separate namespaces could solve this issue, and that is also the solution the DGL team has adopted (dmlc/dgl#2758). However, I would still like to share my investigation here so that people can understand the root cause and avoid similar issues in the future. It took us about a week to figure out the root cause.

I believe the cause is the UNIQUE symbol, but I found few resources explaining it. I'm not an expert in C++, so my statements here could be wrong.

Some solutions I found

- Pass -Xcompiler=-fno-gnu-unique to nvcc, which forces the compiler not to create any UNIQUE symbols.
- Mark the symbol and the function as hidden when exporting symbols, which also changes the symbol binding from UNIQUE to LOCAL (I haven't tried this yet).
- Declare the function static, which has a similar effect to the second solution.
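
The second and third options could look roughly like this at the source level. This is an untested-in-production sketch: the `DEMO_HIDDEN` macro and the function names are hypothetical, the visibility attribute is specific to GCC-compatible compilers, and whether the resulting symbols actually lose UNIQUE binding should be confirmed with `nm` on the built library.

```cpp
#include <cassert>

// GCC/Clang-specific attribute behind a hypothetical macro name.
#if defined(__GNUC__)
#define DEMO_HIDDEN __attribute__((visibility("hidden")))
#else
#define DEMO_HIDDEN
#endif

// Option 2: hidden visibility on the template function. The intent is that
// the function and its static local are not exported from the shared
// library, so another library cannot bind to them.
template <typename Tag>
DEMO_HIDDEN int& hidden_cache() {
  static int cache = -1;  // -1 means "not yet populated"
  return cache;
}

// Option 3: internal linkage. Each TU that instantiates this template gets
// its own private copy of the function and of its static local.
template <typename Tag>
static int& internal_cache() {
  static int cache = -1;
  return cache;
}

struct DemoTag {};

// Write distinct values into the two caches and read them back.
static int read_both_after_populating() {
  hidden_cache<DemoTag>() = 3;
  internal_cache<DemoTag>() = 5;
  return hidden_cache<DemoTag>() * 10 + internal_cache<DemoTag>();
}
```

Here `read_both_after_populating()` returns 35, showing that both variants still work as caches inside one TU. The trade-off of the `static` variant is that each TU holds its own copy, so a value cached in one TU is not visible in another.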

Using a template function with a static variable results in some unusual behavior. I hope my investigation here can help people find a better solution.

Reference:

A simple gist for the UNIQUE symbol: dmlc/dgl#2758

alliepiper · 2021-06-21T16:51:02Z

#1464 and NVIDIA/cub#326 provide the new namespace customization hooks. With those applied, defining THRUST_CUB_WRAPPED_NAMESPACE="foo" before including any Thrust/CUB headers will move the thrust:: and cub:: namespaces to foo::thrust:: and foo::cub::. By specifying different namespaces for different dynamic libraries, collisions and ambiguity can be avoided.
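
To illustrate why distinct wrapper namespaces avoid the collision, here is a single-file simulation that does not use Thrust or CUB at all. `DEMO_HEADER_BODY`, `demo_cub`, `lib_a`, and `lib_b` are hypothetical names; in a real build each library would instead set its own wrapper namespace before including the actual headers.

```cpp
#include <cassert>

// Stand-in for a header-only library body that holds a cached value.
// Every name in this macro is hypothetical.
#define DEMO_HEADER_BODY                \
  namespace demo_cub {                  \
  inline int& device_count_cache() {    \
    static int cache = -1;              \
    return cache;                       \
  }                                     \
  }

// "Library A" and "library B" each wrap their copy in their own namespace,
// so the two caches become different entities with different symbol names.
namespace lib_a { DEMO_HEADER_BODY }
namespace lib_b { DEMO_HEADER_BODY }

// Write different values through each copy and check they do not alias.
inline bool caches_are_independent() {
  lib_a::demo_cub::device_count_cache() = 2;
  lib_b::demo_cub::device_count_cache() = 8;
  return &lib_a::demo_cub::device_count_cache() !=
             &lib_b::demo_cub::device_count_cache() &&
         lib_a::demo_cub::device_count_cache() == 2 &&
         lib_b::demo_cub::device_count_cache() == 8;
}
```

`caches_are_independent()` returns true: since the enclosing namespace is part of each symbol's name, the loader has nothing to merge between the two copies, whatever their binding.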

Alternatively, if THRUST_CUB_USE_ANON_NAMESPACE is defined, all of thrust:: and cub:: will be placed in anonymous namespaces. Similar to the suggestion of using static functions, this should also address the issue, though it will likely bloat binary size, as symbols can no longer be reused by different TUs in each library.

I'm wrapping up testing and reviews, but these are passing initial tests. If anyone gets a chance to try these out and see if they fix their dynamic linking problems, please let me know.

alliepiper · 2021-07-19T19:35:03Z

The fix for this has landed. Define THRUST_CUB_WRAPPED_NAMESPACE to a unique name for each library linked together, and all of thrust:: and cub:: will be placed in the requested outer namespace. This will avoid symbol collisions between libraries.

For more info:

Thrust implementation: https://github.com/NVIDIA/thrust/blob/main/thrust/detail/config/namespace.h
CUB implementation: https://github.com/NVIDIA/cub/blob/main/cub/util_namespace.cuh

alliepiper · 2021-07-19T19:35:49Z

(Note that the anonymous namespace macros have been removed. These were interacting badly with nvcc, and will not be implemented in the foreseeable future.)

nv-dlasalle mentioned this issue Mar 19, 2021

[Bugfix] Wrap cub with CUB_NS_PREFIX and remove dependency on Thrust to linking issues with Torch 1.8 dmlc/dgl#2758

Merged

5 tasks

alliepiper changed the title ~~Having multiple shared libraries that are compiled with different version of thrust (and cub) causes initialization issues~~ [RFE] Support linking multiple Thrust versions: Add hooks that wrap the thrust:: namespace in a custom namespace Mar 19, 2021

alliepiper added type: enhancement New feature or request. good first issue Good for newcomers. labels Mar 22, 2021

alliepiper added this to the 1.13.0 milestone Mar 22, 2021

VoVAllen mentioned this issue Mar 24, 2021

invalid device ordinal in pytorch1.8+cuda11.1 pytorch/pytorch#54245

Closed

danpovey mentioned this issue Mar 26, 2021

Crash in CUDA 11 k2-fsa/k2#698

Closed

ngimel mentioned this issue Mar 29, 2021

code linking to libtorch cannot use thrust/cub functions pytorch/pytorch#52663

Open

brycelelbach added the P2: nice to have Desired, but not necessary. label Mar 29, 2021

brycelelbach modified the milestones: 1.13.0, 1.14.0 Mar 29, 2021

BarclayII mentioned this issue Apr 1, 2021

When will cuda 11.2 be supported? dmlc/dgl#2790

Closed

ppwwyyxx mentioned this issue Apr 3, 2021

CUDA error: device-side assert triggered(torch1.8.1+cuda11.1) pytorch/pytorch#55027

Closed

evelkey mentioned this issue Jun 1, 2021

parallel_for's throw_on_error results in terminate #1448

Closed

alliepiper self-assigned this Jun 4, 2021

alliepiper added P0: must have Absolutely necessary. Critical issue, major blocker, etc. and removed P2: nice to have Desired, but not necessary. labels Jun 4, 2021

alliepiper mentioned this issue Jun 17, 2021

Add ability to place Thrust in a custom namespace. #1464

Merged

alliepiper removed the good first issue Good for newcomers. label Jun 19, 2021

alliepiper closed this as completed in #1464 Jul 19, 2021

grimoire mentioned this issue Nov 4, 2021

tensorrt test failed open-mmlab/mmcv#1454

Closed

twmht mentioned this issue Nov 17, 2021

Compile cub/thrust with no unique symbol cupy/cupy#6106

Merged

johnstairs mentioned this issue Dec 21, 2021

Compile all GPU code in toolboxes into a single library hansenms/gadgetron#8

Merged

hansenms mentioned this issue Dec 23, 2021

Reorganization of Docker image building gadgetron/gadgetron#1036

Merged

This was referenced Feb 9, 2022

Avoid using thrust:: directly, use THRUST_NS_QUALIFIER:: instead pytorch/pytorch#72582

Open

Fix building for CUDA 11.6 k2-fsa/k2#917

Merged

sleeepyjack mentioned this issue Jul 19, 2022

[FEA] Support linking multiple versions of cuco in the same project NVIDIA/cuCollections#189

Open

alliepiper mentioned this issue Jul 25, 2022

[BUG] cudaErrorInvalidDeviceFunction: invalid device function #1737

Closed

jrhemstad mentioned this issue Jul 27, 2022

[BUG] libcudf must customize the Thrust/CUB namespace rapidsai/cudf#11368

Closed

elstehle mentioned this issue Aug 9, 2022

Dispatch mechanism may break when any two libraries that use CUB and/thrust have been compiled for different set of GPU architectures NVIDIA/cub#545

Closed

lilohuang mentioned this issue Oct 31, 2022

[BUG] Need to wrap cub:: and thrust:: namespace in nvcomp namespace NVIDIA/nvcomp#72

Closed

daniellengyel mentioned this issue Aug 9, 2023

[QST] Thrust::system:system_error rapidsai/raft#1721

Closed
