
detectron2 build on Windows 10, CUDA 10.1 works #2

Open · apiszcz opened this issue Nov 27, 2019 · 38 comments

apiszcz commented Nov 27, 2019

Thank you for this work and research; let's hope Facebook adopts the changes for the build.
The build and initial tests are working on Windows 10.

MichaelBarz commented Apr 24, 2020

Hi. Thanks a lot for your effort. I just tested the compilation using CUDA 10.0 instead of 10.1. That works too; just use Visual Studio 2017 instead of 2019 (the Community edition also works).

apiszcz (Author) commented Apr 24, 2020 via email

tunai commented Apr 29, 2020

Hi,
First of all, thank you all for your efforts. I tried this today and unfortunately ran into several problems.

An obvious one: my argument_spec.h file was not in "...Lib\site-packages\torch\include\torch\csrc\jit", but rather in "...Lib\site-packages\torch\include\torch\csrc\jit\runtime". Maybe that's the problem?

Two other things: "conda install pytorch torchvision cudatoolkit=10.2 -c pytorch" now installs PyTorch 1.5 and torchvision 0.6, which might also be problematic. My CUDA toolkit is 10.2.

I am re-installing everything and trying to follow the versions mentioned. I will report back soon.

tunai commented Apr 29, 2020

Update: I was able to install it and run the demo.

A couple of considerations:

  1. When installing pytorch 1.3.1 with torchvision 0.4.2 via conda install pytorch==1.3.1 torchvision==0.4.2 -c pytorch, I found the argument_spec.h file in its expected place.

  2. I installed it with CUDA 10.2, so it also works with the latest CUDA.

  3. My Pillow package (7.0.0) broke torchvision 0.4.1 (see pytorch/vision#1712, "New Pillow version (7.0.0) breaks torchvision": ImportError: cannot import name 'PILLOW_VERSION' from 'PIL'). Fix: install an older version with pip install "pillow<7"; a quick check is sketched after this list.

  4. I used Visual Studio 2019 Community.
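For anyone hitting the Pillow issue in item 3, here is a quick import check (my own sketch, not from this thread) that shows whether the installed Pillow still exposes the constant torchvision 0.4.x expects:

# run inside the environment used for detectron2
import PIL
print(PIL.__version__)              # should be < 7.0.0 for torchvision 0.4.x
from PIL import PILLOW_VERSION      # raises ImportError on Pillow >= 7.0.0
import torchvision                  # fails with the same ImportError if Pillow is too new
print(torchvision.__version__)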

apiszcz (Author) commented Apr 29, 2020

FYI:
I am running d2 v0.1.1 on Windows 10 and Server 2019 with pytorch 1.4.0 / CUDA 10.1.
Thanks for the note on 10.2; however, it sounds like pytorch 1.5 may need some work.
Testing with pytorch 1.5 now; so far OK.

tunai commented Apr 29, 2020

That's very interesting! Did you change the content of argument_spec.h inside the "...\runtime" folder? That was the biggest difference I noticed from the usual path when I did it.

I also forgot to mention (I do not know whether it was necessary) that I changed the content of the ROIAlign_cuda.cu and ROIAlignRotated_cuda.cu files in the d2 folder as well.

apiszcz (Author) commented Apr 29, 2020

No changes. I compiled inside the Visual Studio x64 native tools window.

@phuhung273

Successfully built and ran the official demo:

  • Windows 10
  • VS 2017 Community. VS 2019 should work too because we only need the VS 2017 build tools
  • CUDA 10.1
  • Torch 1.4

Note: this repo is currently at v0.1 (latest: v0.1.1) and therefore cannot run the official demo.
Fix: clone the official repo in the "build detectron2" step:
git clone https://github.com/facebookresearch/detectron2.git
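Once whichever clone you build is installed, a quick way to confirm which detectron2 actually ends up on your path (my own sketch; assumes the package defines __version__):

import detectron2
print(detectron2.__version__)   # the official demo needs a newer release than this repo's v0.1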

@SkeletonOne

@apiszcz Hi, have you solved the problem? I cannot find argument_spec.h either.

@veer5551

Hi team,
Trying to set up detectron2:
PyTorch - 1.5
CUDA - 10.1
Windows - 10
VS - 2017

Getting the following error:

(pytorch_gpu2) PS C:\Users\msjmf59\Documents\Projects\Auto_Labelling\Models\Pytorch\conansherry\detectron2> python setup.py install
running install
running bdist_egg
running egg_info
creating detectron2.egg-info
writing detectron2.egg-info\PKG-INFO
writing dependency_links to detectron2.egg-info\dependency_links.txt
writing requirements to detectron2.egg-info\requires.txt
writing top-level names to detectron2.egg-info\top_level.txt
writing manifest file 'detectron2.egg-info\SOURCES.txt'
C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\lib\site-packages\torch\utils\cpp_extension.py:304: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'detectron2.egg-info\SOURCES.txt'
writing manifest file 'detectron2.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_py
running build_ext
building 'detectron2._C' extension
C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.11.25503\bin\HostX64\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -DWITH_CUDA -IC:\Users\msjmf59\Documents\Projects\Auto_Labelling\Models\Pytorch\conansherry\detectron2\detectron2\layers\csrc -IC:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\lib\site-packages\torch\include -IC:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\lib\site-packages\torch\include\TH -IC:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\include" -IC:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\include -IC:\Python37\include -IC:\Python37\include "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt" -IC:\PROGRA~1\IBM\SQLLIB\INCLUDE -IC:\PROGRA~1\IBM\SQLLIB\LIB /EHsc /TpC:\Users\msjmf59\Documents\Projects\Auto_Labelling\Models\Pytorch\conansherry\detectron2\detectron2\layers\csrc\vision.cpp /Fobuild\temp.win-amd64-3.7\Release\Users\msjmf59\Documents\Projects\Auto_Labelling\Models\Pytorch\conansherry\detectron2\detectron2\layers\csrc\vision.obj /MD /wd4819 /EHsc -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
vision.cpp
C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\lib\site-packages\torch\include\torch/extension.h(4): fatal error C1083: Cannot open include file: 'torch/all.h': No such file or directory
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Professional\\VC\\Tools\\MSVC\\14.11.25503\\bin\\HostX64\\x64\\cl.exe' failed with exit status 2

Hey @apiszcz,
Could you help build this with PyTorch 1.5 and VS 2017?

Thanks a lot!

apiszcz (Author) commented May 29, 2020

You need to be in the x64 Native Tools shell window.
The path will be different for your setup:

"C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvars64.bat"
cd detectron2
python setup.py build develop
python setup.py clean --all install clean --all
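After the build finishes, a minimal sanity check (my own sketch, not part of the steps above) that the compiled extension is importable from the same environment:

import torch
print(torch.__version__, torch.cuda.is_available())

import detectron2
from detectron2 import _C   # ImportError here means the C++/CUDA extension did not build
print("detectron2._C imported OK")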

@veer5551

Hey @apiszcz,
I ran the commands in the native x64 shell.

They gave the following error:

(pytorch_gpu2) C:\Users\msjmf59\Documents\Projects\Auto_Labelling\Models\Pytorch\conansherry\detectron2>python setup.py build develop
running build
running build_py
running build_ext
C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\lib\site-packages\torch\utils\cpp_extension.py:304: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\lib\site-packages\torch\utils\cpp_extension.py:237: UserWarning: Error checking compiler version for cl: [WinError 740] The requested operation requires elevation
  warnings.warn('Error checking compiler version for {}: {}'.format(compiler, error))
building 'detectron2._C' extension
C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.11.25503\bin\HostX64\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -DWITH_CUDA -IC:\Users\msjmf59\Documents\Projects\Auto_Labelling\Models\Pytorch\conansherry\detectron2\detectron2\layers\csrc -IC:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\lib\site-packages\torch\include -IC:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\lib\site-packages\torch\include\TH -IC:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\include" -IC:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\include -IC:\Python37\include -IC:\Python37\include "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt" -IC:\PROGRA~1\IBM\SQLLIB\INCLUDE -IC:\PROGRA~1\IBM\SQLLIB\LIB /EHsc /TpC:\Users\msjmf59\Documents\Projects\Auto_Labelling\Models\Pytorch\conansherry\detectron2\detectron2\layers\csrc\vision.cpp /Fobuild\temp.win-amd64-3.7\Release\Users\msjmf59\Documents\Projects\Auto_Labelling\Models\Pytorch\conansherry\detectron2\detectron2\layers\csrc\vision.obj /MD /wd4819 /EHsc -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Professional\\VC\\Tools\\MSVC\\14.11.25503\\bin\\HostX64\\x64\\cl.exe' failed: Invalid argument

I have checked "run as admin" in the cl.exe properties too.
Same error! Please have a look.

Thanks a lot!

apiszcz (Author) commented May 29, 2020

That should work. Did you try VS Community 2019?
There is a sequence to installing everything:

  1. VS
  2. NVIDIA
  3. CUDA

If you don't install them in that order you will have problems.

@veer5551

I had VS 2017 installed previously.
Then I installed CUDA 10.1.
Then, as mentioned in the readme, I updated VS 2017 with the 14.11 toolset.
Might that be causing the issues?

If yes, do I need to do clean installs of everything again?

Thanks a lot!

apiszcz (Author) commented May 29, 2020 via email

@veer5551

Hey @apiszcz,
Thanks a lot!
I will try a clean installation of everything again and report back!

Thanks a lot once again!

@veer5551

Hey @apiszcz,

PyTorch - 1.5
CUDA - 10.1

Now running into the following error:

(pytorch_gpu2) C:\Users\msjmf59\Documents\Projects\Auto_Labelling\Models\Pytorch\conansherry\detectron2>python setup.py build develop
running build
running build_py
running build_ext
C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\lib\site-packages\torch\utils\cpp_extension.py:304: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
building 'detectron2._C' extension
C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.11.25503\bin\HostX64\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -DWITH_CUDA -IC:\Users\msjmf59\Documents\Projects\Auto_Labelling\Models\Pytorch\conansherry\detectron2\detectron2\layers\csrc -IC:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\lib\site-packages\torch\include -IC:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\lib\site-packages\torch\include\TH -IC:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\include" -IC:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\include -IC:\Python37\include -IC:\Python37\include "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt" -IC:\PROGRA~1\IBM\SQLLIB\INCLUDE -IC:\PROGRA~1\IBM\SQLLIB\LIB /EHsc /TpC:\Users\msjmf59\Documents\Projects\Auto_Labelling\Models\Pytorch\conansherry\detectron2\detectron2\layers\csrc\vision.cpp /Fobuild\temp.win-amd64-3.7\Release\Users\msjmf59\Documents\Projects\Auto_Labelling\Models\Pytorch\conansherry\detectron2\detectron2\layers\csrc\vision.obj /MD /wd4819 /EHsc -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
vision.cpp
C:\Users\msjmf59\Documents\VirtualEnvironments\pytorch_gpu2\lib\site-packages\torch\include\torch\csrc\api\include\torch/cuda.h(5): fatal error C1083: Cannot open include file: 'cstddef': No such file or directory
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Professional\\VC\\Tools\\MSVC\\14.11.25503\\bin\\HostX64\\x64\\cl.exe' failed with exit status 2

Running in the native x64 shell.
Could you please have a look?

Thanks a lot!

apiszcz (Author) commented May 30, 2020

I thought you were going to try VS 2019 Community, but no big deal.
Without logging in to see the issue, it appears you did not remove all prior CUDA and cuDNN packages (my guess).
cstddef appears to be a CUDA or cuDNN header.

@veer5551

Hey @apiszcz,

Installed VS 2019 (hopefully it will not affect other projects; not checked yet).
However, now I am facing the same issue as #12.

log:

C:\Users\msjmf59\Documents\Projects\Auto_Labelling\Models\Pytorch\cns\detectron2\detectron2\layers\csrc\deformable\deform_conv.h(136): error: identifier "AT_CHECK" is undefined

C:\Users\msjmf59\Documents\Projects\Auto_Labelling\Models\Pytorch\cns\detectron2\detectron2\layers\csrc\deformable\deform_conv.h(184): error: identifier "AT_CHECK" is undefined

C:\Users\msjmf59\Documents\Projects\Auto_Labelling\Models\Pytorch\cns\detectron2\detectron2\layers\csrc\deformable\deform_conv.h(234): error: identifier "AT_CHECK" is undefined

C:\Users\msjmf59\Documents\Projects\Auto_Labelling\Models\Pytorch\cns\detectron2\detectron2\layers\csrc\deformable\deform_conv.h(284): error: identifier "AT_CHECK" is undefined

C:\Users\msjmf59\Documents\Projects\Auto_Labelling\Models\Pytorch\cns\detectron2\detectron2\layers\csrc\deformable\deform_conv.h(341): error: identifier "AT_CHECK" is undefined

C:/Users/msjmf59/Documents/Projects/Auto_Labelling/Models/Pytorch/cns/detectron2/detectron2/layers/csrc/deformable/deform_conv_cuda.cu(155): error: identifier "AT_CHECK" is undefined

C:/Users/msjmf59/Documents/Projects/Auto_Labelling/Models/Pytorch/cns/detectron2/detectron2/layers/csrc/deformable/deform_conv_cuda.cu(338): error: identifier "AT_CHECK" is undefined

C:/Users/msjmf59/Documents/Projects/Auto_Labelling/Models/Pytorch/cns/detectron2/detectron2/layers/csrc/deformable/deform_conv_cuda.cu(503): error: identifier "AT_CHECK" is undefined

C:/Users/msjmf59/Documents/Projects/Auto_Labelling/Models/Pytorch/cns/detectron2/detectron2/layers/csrc/deformable/deform_conv_cuda.cu(696): error: identifier "AT_CHECK" is undefined

C:/Users/msjmf59/Documents/Projects/Auto_Labelling/Models/Pytorch/cns/detectron2/detectron2/layers/csrc/deformable/deform_conv_cuda.cu(823): error: identifier "AT_CHECK" is undefined

C:/Users/msjmf59/Documents/Projects/Auto_Labelling/Models/Pytorch/cns/detectron2/detectron2/layers/csrc/deformable/deform_conv_cuda.cu(953): error: identifier "AT_CHECK" is undefined

11 errors detected in the compilation of "C:/Users/msjmf59/AppData/Local/Temp/tmpxft_00000a80_00000000-10_deform_conv_cuda.cpp1.ii".
deform_conv_cuda.cu
ninja: build stopped: subcommand failed.

sys info:

sys.platform           win32
Python                 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 14:57:15) [MSC v.1915 64 bit (AMD64)]
Numpy                  1.18.4
detectron2._C          failed to import
DETECTRON2_ENV_MODULE  <not set>
PyTorch                1.5.0+cu101
PyTorch Debug Build    False
torchvision            0.6.0+cu101
CUDA available         True
GPU 0                  Quadro P1000
CUDA_HOME              C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1
NVCC                   Not Available
Pillow                 7.1.2
---------------------  ----------------------------------------------------------------------------
PyTorch built with:
  - C++ Version: 199711
  - MSVC 191627039
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191125 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  - OpenMP 200203
  - CPU capability usage: AVX2
  - CUDA Runtime 10.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  - CuDNN 7.6.4
  - Magma 2.5.2

Hopefully I am getting closer to building it successfully.

Thanks a lot for your help!
Really Appreciate it!

apiszcz (Author) commented May 31, 2020 via email

@veer5551

Yes, I did that too!
I had previously built the pipeline for TensorFlow with the same CUDA versions.
I need to check whether it still works after the reinstallation of CUDA and cuDNN.
Hopefully it does, else there would be a lot of mess again!

Thanks!

apiszcz (Author) commented May 31, 2020 via email

@veer5551

Hey @apiszcz,
Thanks! That's what I was looking for!

Built detectron2 successfully and ran the demo! It works fine!
Now I just need to check whether the other pipelines still work :)

Thanks a lot for your help!!
Really appreciate it!

17sarf commented May 31, 2020

Thanks to all the comments/suggestions made here, I was able to install detectron2 (without too much hassle).

PyTorch version: 1.4.0
Torchvision: 0.5.0

For some reason I can't seem to download torchvision 0.4.x using the pip or conda commands.

Running the nvidia-smi command, it says that I am running CUDA Version: 11.0, which I assume really means 10.1, because that is what I attempted to install.

However, I am getting the following error when I attempt to train the model:

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("kaist_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")  # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.001  # pick a good LR
cfg.SOLVER.MAX_ITER = 3000    # 300 iterations seems good enough for this toy dataset; you may need to train longer for a practical dataset
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128   # faster, and good enough for this toy dataset (default: 512)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only has one class (ballon)
cfg.MODEL.MASK_ON = False
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg) 
trainer.resume_or_load(resume=False)
trainer.train()
[05/31 23:44:11 d2.engine.train_loop]: Starting training from iteration 0
ERROR [05/31 23:44:16 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "c:\users\sarfraz\detectron2\detectron2\engine\train_loop.py", line 132, in train
    self.run_step()
  File "c:\users\sarfraz\detectron2\detectron2\engine\train_loop.py", line 215, in run_step
    loss_dict = self.model(data)
  File "C:\Users\sarfraz\miniconda3\envs\torch_env\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "c:\users\sarfraz\detectron2\detectron2\modeling\meta_arch\rcnn.py", line 117, in forward
    proposals, proposal_losses = self.proposal_generator(images, features, gt_instances)
  File "C:\Users\sarfraz\miniconda3\envs\torch_env\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "c:\users\sarfraz\detectron2\detectron2\modeling\proposal_generator\rpn.py", line 363, in forward
    proposals = self.predict_proposals(
  File "C:\Users\sarfraz\miniconda3\envs\torch_env\lib\site-packages\torch\autograd\grad_mode.py", line 49, in decorate_no_grad
    return func(*args, **kwargs)
  File "c:\users\sarfraz\detectron2\detectron2\modeling\proposal_generator\rpn.py", line 389, in predict_proposals
    return find_top_rpn_proposals(
  File "c:\users\sarfraz\detectron2\detectron2\modeling\proposal_generator\proposal_utils.py", line 109, in find_top_rpn_proposals
    keep = batched_nms(boxes.tensor, scores_per_img, lvl, nms_thresh)
  File "c:\users\sarfraz\detectron2\detectron2\layers\nms.py", line 19, in batched_nms
    return box_ops.batched_nms(boxes, scores, idxs, iou_threshold)
  File "C:\Users\sarfraz\miniconda3\envs\torch_env\lib\site-packages\torchvision\ops\boxes.py", line 76, in batched_nms
    keep = nms(boxes_for_nms, scores, iou_threshold)
  File "C:\Users\sarfraz\miniconda3\envs\torch_env\lib\site-packages\torchvision\ops\boxes.py", line 36, in nms
    return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
  File "C:\Users\sarfraz\miniconda3\envs\torch_env\lib\site-packages\torch\_ops.py", line 61, in __getattr__
    op = torch._C._jit_get_operation(qualified_op_name)
RuntimeError: No such operator torchvision::nms
[05/31 23:44:16 d2.engine.hooks]: Total training time: 0:00:04 (0:00:00 on hooks)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-13-fd2bd846076f> in <module>
     18 trainer = DefaultTrainer(cfg)
     19 trainer.resume_or_load(resume=False)
---> 20 trainer.train()

c:\users\sarfraz\detectron2\detectron2\engine\defaults.py in train(self)
    400             OrderedDict of results, if evaluation is enabled. Otherwise None.
    401         """
--> 402         super().train(self.start_iter, self.max_iter)
    403         if len(self.cfg.TEST.EXPECTED_RESULTS) and comm.is_main_process():
    404             assert hasattr(

c:\users\sarfraz\detectron2\detectron2\engine\train_loop.py in train(self, start_iter, max_iter)
    130                 for self.iter in range(start_iter, max_iter):
    131                     self.before_step()
--> 132                     self.run_step()
    133                     self.after_step()
    134             except Exception:

c:\users\sarfraz\detectron2\detectron2\engine\train_loop.py in run_step(self)
    213         If you want to do something with the losses, you can wrap the model.
    214         """
--> 215         loss_dict = self.model(data)
    216         losses = sum(loss_dict.values())
    217         self._detect_anomaly(losses, loss_dict)

~\miniconda3\envs\torch_env\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

c:\users\sarfraz\detectron2\detectron2\modeling\meta_arch\rcnn.py in forward(self, batched_inputs)
    115 
    116         if self.proposal_generator:
--> 117             proposals, proposal_losses = self.proposal_generator(images, features, gt_instances)
    118         else:
    119             assert "proposals" in batched_inputs[0]

~\miniconda3\envs\torch_env\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

c:\users\sarfraz\detectron2\detectron2\modeling\proposal_generator\rpn.py in forward(self, images, features, gt_instances)
    361             losses = {}
    362 
--> 363         proposals = self.predict_proposals(
    364             anchors, pred_objectness_logits, pred_anchor_deltas, images.image_sizes
    365         )

~\miniconda3\envs\torch_env\lib\site-packages\torch\autograd\grad_mode.py in decorate_no_grad(*args, **kwargs)
     47         def decorate_no_grad(*args, **kwargs):
     48             with self:
---> 49                 return func(*args, **kwargs)
     50         return decorate_no_grad
     51 

c:\users\sarfraz\detectron2\detectron2\modeling\proposal_generator\rpn.py in predict_proposals(self, anchors, pred_objectness_logits, pred_anchor_deltas, image_sizes)
    387         # are also network responses, so is approximate.
    388         pred_proposals = self._decode_proposals(anchors, pred_anchor_deltas)
--> 389         return find_top_rpn_proposals(
    390             pred_proposals,
    391             pred_objectness_logits,

c:\users\sarfraz\detectron2\detectron2\modeling\proposal_generator\proposal_utils.py in find_top_rpn_proposals(proposals, pred_objectness_logits, image_sizes, nms_thresh, pre_nms_topk, post_nms_topk, min_box_side_len, training)
    107             boxes, scores_per_img, lvl = boxes[keep], scores_per_img[keep], lvl[keep]
    108 
--> 109         keep = batched_nms(boxes.tensor, scores_per_img, lvl, nms_thresh)
    110         # In Detectron1, there was different behavior during training vs. testing.
    111         # (https://github.com/facebookresearch/Detectron/issues/459)

c:\users\sarfraz\detectron2\detectron2\layers\nms.py in batched_nms(boxes, scores, idxs, iou_threshold)
     17     # Investigate after having a fully-cuda NMS op.
     18     if len(boxes) < 40000:
---> 19         return box_ops.batched_nms(boxes, scores, idxs, iou_threshold)
     20 
     21     result_mask = scores.new_zeros(scores.size(), dtype=torch.bool)

~\miniconda3\envs\torch_env\lib\site-packages\torchvision\ops\boxes.py in batched_nms(boxes, scores, idxs, iou_threshold)
     74     offsets = idxs.to(boxes) * (max_coordinate + 1)
     75     boxes_for_nms = boxes + offsets[:, None]
---> 76     keep = nms(boxes_for_nms, scores, iou_threshold)
     77     return keep
     78 

~\miniconda3\envs\torch_env\lib\site-packages\torchvision\ops\boxes.py in nms(boxes, scores, iou_threshold)
     34         by NMS, sorted in decreasing order of scores
     35     """
---> 36     return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
     37 
     38 

~\miniconda3\envs\torch_env\lib\site-packages\torch\_ops.py in __getattr__(self, op_name)
     59         # for overloads and raise an exception if there are more than one.
     60         qualified_op_name = '{}::{}'.format(self.name, op_name)
---> 61         op = torch._C._jit_get_operation(qualified_op_name)
     62         # let the script frontend know that op is identical to the builtin op
     63         # with qualified_op_name

RuntimeError: No such operator torchvision::nms
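A small diagnostic for the "No such operator torchvision::nms" error (my own sketch; the usual cause is a torch/torchvision pair whose builds do not match, e.g. a CPU-only torchvision next to a CUDA torch):

import torch
import torchvision

print(torch.__version__, torch.version.cuda, torch.cuda.is_available())
print(torchvision.__version__)

# if torchvision's compiled ops registered correctly, this runs without the RuntimeError
boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0], [1.0, 1.0, 11.0, 11.0]])
scores = torch.tensor([0.9, 0.8])
print(torchvision.ops.nms(boxes, scores, 0.5))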

veer5551 commented Jun 2, 2020

Hey @17sarf,

For the first part, downloading torchvision 0.4.x:
you can download the whl file corresponding to your CUDA version from here:
https://download.pytorch.org/whl/torch_stable.html.
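If it helps, this prints the tags to look for in the wheel filename on that page (my own sketch; the filenames encode the Python tag, the CUDA version, and the platform, e.g. cp37, cu101, win_amd64):

import sys
import platform
import torch

print("cp{}{}".format(sys.version_info.major, sys.version_info.minor))   # Python tag, e.g. cp37
print(torch.version.cuda)     # CUDA version the installed torch wheel targets, e.g. 10.1
print(platform.machine())     # AMD64 on 64-bit Windows -> win_amd64 wheels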

I can't help you much with the error part though, since I am a newbie in this domain!

Thanks!

17sarf commented Jun 2, 2020

@veer5551 thanks for the suggestion, but I managed to download it by creating a conda Python 3.6-based environment. I'm not sure if that was the issue, but it all seems to be working now. I have pytorch=1.4 and torchvision=0.5.0 and detectron=1.3 installed.

solarflarefx commented Jun 8, 2020

@17sarf When you say you installed detectron=1.3, are you referring to the latest release (https://github.com/facebookresearch/detectron2/releases)?

It looks to me that the code in this particular repository is for v0.1. Did you instead use the official repository and build from source?

@veer5551 did you get your install working? What is your final setup?
OS: Windows 10 64-bit
CUDA: 10.1
Pytorch: 1.5
Detectron2: 0.1.3
Which version of torchvision are you using?

17sarf commented Jun 8, 2020

@solarflarefx yes, I used the official repository.

solarflarefx commented Jun 9, 2020

@17sarf Got it, thanks. Another question: which command prompt are you using to build detectron2? The Developer Command Prompt for VS 2019? The Anaconda prompt?

I am currently getting this error:

5 errors detected in the compilation of "C:/Users/Windows/AppData/Local/Temp/tmpxft_00002f20_00000000-10_nms_rotated_cuda.cpp1.ii".
error: command 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\nvcc.exe' failed with exit status 1

I did change the two PyTorch files cast.h and argument_spec.h

Did you have to make any changes to the .cu files?

My environment:
OS: Windows 10 x64
Python: 3.6
Pytorch: 1.4
Torchvision: 0.5.0
CUDA: 10.1
cuDNN: 7.6.5
Detectron2: 0.1.3

veer5551 commented Jun 10, 2020

Hey @solarflarefx,

Yes, my installation was successful and detectron2 is working on my machine.
I used the native x64 (VS 2019) shell to build detectron2.

No changes to any files; I used the official code base from Facebook for detectron2,
as mentioned here: #2 (comment)

Here is the system info:
OS - Windows 10
VS - 2019
Python - 3.7.1
PyTorch - 1.5
torchvision - 0.6
CUDA - 10.1
cuDNN - 7.6.4 for CUDA 10.1

@solarflarefx

@veer5551 Thanks for your reply. Any idea what could be causing the nvcc.exe error I am getting above?

17sarf commented Jun 10, 2020

@solarflarefx I used the Anaconda prompt to install detectron2. I hope you got it working now.

@veer5551

@solarflarefx,
I can't help much with the error part since I am a newbie user!
You are using the official codebase for detectron2 from Facebook, right?

Hope you got it working!!

junyango commented Jun 29, 2020

Hey @solarflarefx,

Yes, my installation was successful and detectron2 is working on my machine.
I used the native x64 (VS 2019) shell to build detectron2.

No changes to any files; I used the official code base from Facebook for detectron2,
as mentioned here: #2 (comment)

Here is the system info:
OS - Windows 10
VS - 2019
Python - 3.7.1
PyTorch - 1.5
torchvision - 0.6
CUDA - 10.1
cuDNN - 7.6.4 for CUDA 10.1

Hey. By using the official repo, wouldn't there be some issues building pycocotools? The latest repo requires pycocotools>=2.0.1, but the current pycocotools builds only work for v2.0. I hit an error about an invalid identifier, localtime_r, in detectron2/detectron2/layers/csrc/cocoeval/cocoeval.cpp, line 389.

My system info is:
Python 3.7.7
PyTorch 1.5.1 for CUDA 10.1
Torchvision 0.6.1 for CUDA 10.1
CUDA 10.1
cuDNN 7.6.5 for CUDA 10.1


  • I was able to build detectron2 v0.1 using this repo with the given settings mentioned in the repo. However, I am now attempting to build the latest version from facebookresearch's repo.

@yinghuang

In detectron2\detectron2\layers\csrc\cocoeval\cocoeval.cpp:

  1. add #include <time.h>
  2. change localtime_r(&rawtime, &local_time); to localtime_s(&local_time, &rawtime);

junyango commented Jul 8, 2020

In detectron2\detectron2\layers\csrc\cocoeval\cocoeval.cpp:

  1. add #include <time.h>
  2. change localtime_r(&rawtime, &local_time); to localtime_s(&local_time, &rawtime);

Thanks for the reply, that worked! But now I face the problem of 5 errors detected in the compilation of "C:/Users/user/AppData/Local/Temp/tmpxft_00004348_00000000-10_nms_rotated_cuda.cpp1.ii".

@SutarZeev

In detectron2\detectron2\layers\csrc\cocoeval\cocoeval.cpp:

  1. add #include <time.h>
  2. change localtime_r(&rawtime, &local_time); to localtime_s(&local_time, &rawtime);

Thanks for the reply, that worked! But now I face the problem of 5 errors detected in the compilation of "C:/Users/user/AppData/Local/Temp/tmpxft_00004348_00000000-10_nms_rotated_cuda.cpp1.ii".

The actual error is:
detectron2/detectron2/layers/csrc/nms_rotated/nms_rotated_cuda.cu(14): error: name must be a namespace name

This is caused by "box_iou_rotated_utils.h" not being included.
A quick & dirty fix would be to just add:

#define WITH_HIP

before Line 11:

#ifdef WITH_HIP

in the detectron2\detectron2\layers\csrc\nms_rotated\nms_rotated_cuda.cu file.

Note:
The build also worked for me when defining "WITH_CUDA" instead of "WITH_HIP", but on my system I got a RuntimeError when I actually tried to use detectron2 with that:

CUDA error: the launch timed out and was terminated

shihlun1208 commented Aug 27, 2020

I have Windows 10, torch 1.4, torchvision 0.5.0, CUDA 10, cuDNN 7.6.4, Python 3.6.
Doing it from the x64 native command prompt I still got the error. Any suggestions?

I'm wondering if the compiler version causes this problem, but I still have not found the solution.

Update: I upgraded the build tools from 2015 to 2017 and it compiled successfully.
