cuda 9.0 "error: more than one operator "==" matches these operands" #797
Comments
Before you build try `export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"`
Hey @csarofeen, that did the trick! On a side note, this appears to disable the half operators in the CUDA code. Will this impact the performance of half variables when run on the device?
It will, for the better.
@csarofeen this did it for me as well, thank you!
I had the same issue; running the suggested export before building fixed it. I was using Ubuntu 16.04.
@csarofeen If it's better to disable the half operators, then what are they used for? Why are they included in the CUDA code? And what kind of performance boost are we talking about here?
CUDA 9 added half operators in the CUDA half header. Half operations in Torch predate that, so equivalent operators already existed in Torch. The flag keeps the half definition from the CUDA header while not compiling the header's operators.
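To make that concrete, here is a rough, paraphrased sketch of the guard inside cuda_fp16.hpp. This is my own simplification, not the verbatim NVIDIA header, but the real header wraps its half operators in the same kind of conditional, so defining `__CUDA_NO_HALF_OPERATORS__` compiles them out and leaves only Torch's versions:

```cpp
// Paraphrased sketch of the guard in cuda_fp16.hpp (CUDA 9.x); the real header
// defines many more half (and, behind __CUDA_NO_HALF2_OPERATORS__, half2)
// operators the same way.
#if !defined(__CUDA_NO_HALF_OPERATORS__)
__device__ __forceinline__ bool operator==(const __half &lh, const __half &rh)
{
    return __heq(lh, rh);   // device-side half comparison intrinsic
}
__device__ __forceinline__ bool operator!=(const __half &lh, const __half &rh)
{
    return __hne(lh, rh);
}
#endif  /* !defined(__CUDA_NO_HALF_OPERATORS__) */
```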
@csarofeen Do you have any other performance tips for CUDA and/or cuDNN with Torch7? I ask because I've noticed that CUDA 9.0 and cuDNN v7 have even worse performance than CUDA 8.0 and cuDNN v5: jcjohnson/neural-style#429
Same issue here, but `export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"` didn't work for me. How can I solve that?
@sfzyk Could you please explain all the steps you took to install CUDA, NCCL, cuDNN, and PyTorch, and paste some of the error output here? It is very hard to assist when the only information provided is "didn't work".
Installing Torch 7 on Ubuntu 16.04 caused this error. What worked for me:
1. Uninstall the CUDA Toolkit: run the uninstallation script provided in the bin directory of the toolkit (by default /usr/local/cuda-9.1/bin).
2. Install CUDA 8.0 (download: https://developer.nvidia.com/cuda-80-ga2-download-archive): (1) install the 8.0 deb, (2) install Patch 2.
3. Install Torch. If the earlier error occurs, run `sudo ./clean.sh` and `source ~/.bashrc` first.
@csarofeen I tried that. Now I'm getting warnings like this:
I had the same problem and it was driving me nuts. This did not work:
This did work:
Hope that helps.
@thompa2 |
With CUDA 9.2 you need: `export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF2_OPERATORS__"`
@ricpruss I did |
Are you still getting the errors on the operator overloads?
Same here with CUDA 9.0 and cuDNN 7.
@ricpruss I get these errors:
@thompa2 AMAZING! Thank you! After hours of searching for a solution, this worked. (Well, I'm at 20% now, which is further than I've been able to get until now.) What a pain!
I lied, it failed again... but this time at 20%. That's progress, isn't it? I'm not sure; I'm thinking of giving up. I've got a MacBook Pro (13-inch, 2017). This is the error message at 20%:
9 warnings generated.
/Users/fredlemieux/torch/extra/cutorch/lib/THC/THCTensorRandom.cuh(95): error: specified alignment (4) is different from alignment (2) specified on a previous declaration
/Users/fredlemieux/torch/extra/cutorch/lib/THC/THCTensorRandom.cuh(156): error: specified alignment (4) is different from alignment (2) specified on a previous declaration
/Users/fredlemieux/torch/extra/cutorch/lib/THC/THCTensorRandom.cuh(95): error: specified alignment (8) is different from alignment (2) specified on a previous declaration
/Users/fredlemieux/torch/extra/cutorch/lib/THC/THCTensorRandom.cuh(156): error: specified alignment (8) is different from alignment (2) specified on a previous declaration
5 errors detected in the compilation of "/tmp/tmpxft_00011634_00000000-11_THCTensorRandom.compute_61.cpp1.ii".
make[2]: *** [lib/THC/CMakeFiles/THC.dir/THC_generated_THCTensorRandom.cu.o] Error 1
Error: Build error: Failed building.
Same error here |
It works on Ubuntu 18.04 too. |
@ricpruss I know this has been a while, but for some reason I need to compile Torch with CUDA 9.2. I remember I tried your suggestion.
How did you resolve this? |
I get the same issue on Windows 10, PyTorch 1.1.0, VS 2017 with the version 15.4 toolset. Does anyone have a good fix?
E:/Program Files/Python35/lib/site-packages/torch/include\THC/THCNumerics.cuh(190): error: more than one operator "<" matches these operands:
E:/Program Files/Python35/lib/site-packages/torch/include\THC/THCNumerics.cuh(191): error: more than one operator "<=" matches these operands:
E:/Program Files/Python35/lib/site-packages/torch/include\THC/THCNumerics.cuh(192): error: more than one operator ">" matches these operands:
E:/Program Files/Python35/lib/site-packages/torch/include\THC/THCNumerics.cuh(193): error: more than one operator ">=" matches these operands:
E:/Program Files/Python35/lib/site-packages/torch/include\THC/THCNumerics.cuh(194): error: more than one operator "==" matches these operands:
E:/Program Files/Python35/lib/site-packages/torch/include\THC/THCNumerics.cuh(196): error: more than one operator "!=" matches these operands:
I got the same error on Ubuntu 18.04 with CUDA 10.1 and cuDNN 7.5.
@tjusxh Me too. Have you solved it? |
@csarofeen you said earlier that performance will improve by disabling the CUDA built-in half operators. Can you explain how?
This didn't work for me; try this:
Windows users have to use:
This comes from PyTorch CMake files:
That is why a normal PyTorch build doesn't hit this error.
On the NVIDIA NGC Docker image, which targets several GPUs: for DeepSpeed we have to set the arch list starting from the Volta architecture: TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0" DS_BUILD_OPS=1 DS_BUILD_FUSED_LAMB=1 DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 DS_BUILD_SPARSE_ATTN=1 DS_BUILD_TRANSFORMER=1 DS_BUILD_UTILS=1 python3 setup.py install
While attempting to build Torch from master with cutorch, using CUDA 9.0.103-1 on Ubuntu 16.04, I hit an error caused by multiple overloads of the "==" and "!=" operators.
Below is an example of the error I receive.
I was able to track down the two operator overloads. One is in
https://github.com/torch/cutorch/blob/master/lib/THC/THCTensorTypeUtils.cuh#L176
and the other is in /usr/local/cuda-9.0/targets/ppc64le-linux/include/cuda_fp16.hpp.
The operator in cuda_fp16.hpp is provided by the CUDA package, but it only covers `__device__` code and not `__host__` code. So we still need to overload "==" for halfs on the host side; however, the code currently in cutorch fails at compile time. It looks like @csarofeen worked on the initial port of cutorch to CUDA 9.0; I'm not sure if he can provide some help on what's going on here.
Is there any additional information you need from me? Thanks in advance!!
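For reference, a minimal, self-contained sketch that reproduces the ambiguity described above. This is an illustration only, not code taken from cutorch or the NVIDIA header; the by-value overload below just mirrors the spirit of the cutorch definition linked earlier, and the build commands in the comments are what I would expect to happen on CUDA 9.x:

```cpp
// repro.cu: minimal illustration of the "more than one operator" ambiguity.
//   nvcc -arch=sm_60 -c repro.cu                              (expected to fail on CUDA 9.x)
//   nvcc -arch=sm_60 -D__CUDA_NO_HALF_OPERATORS__ -c repro.cu (expected to compile)
#include <cuda_fp16.h>

// A project-side comparison covering host as well as device code, taking
// halfs by value (similar in spirit to the cutorch overload linked above).
__host__ __device__ inline bool operator==(__half a, __half b) {
#ifdef __CUDA_ARCH__
    return __heq(a, b);                        // device path: half intrinsic
#else
    return __half2float(a) == __half2float(b); // host path: compare as float
#endif
}

__global__ void compareKernel(const __half *x, const __half *y, bool *out) {
    // On CUDA 9.x without the flag, nvcc sees two viable candidates here:
    // the __device__ operator== from cuda_fp16.hpp and the overload above,
    // hence the error: more than one operator "==" matches these operands.
    *out = (*x == *y);
}
```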