Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DONOTMERGE] deform_conv2d with headers #3051

Closed
wants to merge 12 commits into from

Conversation

datumbox
Copy link
Contributor

@datumbox datumbox commented Nov 26, 2020

Proof of concept for #2956

Builds upon #2998

  • Moves C++ code from header files to cpp files. Each cpp file now has a separate header file with "public" functions.
  • Remove unnecessary repeated includes.
  • Use a similar naming convention as in PyTorch for the kernel implementations in C++ and CUDA.
  • Remove the _param postfix from the variables (DeformConv2d specific detail).
  • Register the operators in their own files instead all of them in vision.cpp.

@datumbox datumbox force-pushed the refactoring/headers_ops branch 4 times, most recently from 77cbf7c to 868de6c Compare November 26, 2020 19:10
- Create header files for kernel implementation and remove definitions from vision_*.h files.
- Eliminate unnecessary headers and ensure all cpp include their headers.
@datumbox datumbox force-pushed the refactoring/headers_ops branch from 868de6c to 9026df0 Compare November 26, 2020 19:20
@datumbox datumbox force-pushed the refactoring/headers_ops branch from 76e7140 to f3f8469 Compare November 26, 2020 22:03
@datumbox datumbox requested a review from fmassa November 26, 2020 22:20
@datumbox datumbox force-pushed the refactoring/headers_ops branch from e1ecdcd to 6f87104 Compare November 27, 2020 16:53
@datumbox
Copy link
Contributor Author

datumbox commented Nov 27, 2020

Concerning registering the operators in the Operator files (instead in vision.cpp), I checked how PyTorch does this and noticed that they use the DECLARE_DISPATCH macro. Here is an example of how the operator is registered in various places of the implementation:

I'm not sure whether it's possible for us to use the above macro but I understand that we can't just move the TORCH_LIBRARY and TORCH_LIBRARY_IMPL declarations on multiple places. So it's worth checking out what alternatives we have.

@ezyang
Copy link
Contributor

ezyang commented Nov 30, 2020

DECLARE_DISPATCH serves a different function; it's for making sure we can dispatch to differently vectorized implementations (we build CPU kernels multiple times at different levels of vectorization, then dispatch to them depending on what CPU capability is supported at runtime). This functionality hasn't been integrated into the dispatcher proper so it has to be done separately.

I'm not sure whether it's possible for us to use the above macro but I understand that we can't just move the TORCH_LIBRARY and TORCH_LIBRARY_IMPL declarations on multiple places. So it's worth checking out what alternatives we have.

I'm not sure what problem you're referring to here. TORCH_LIBRARY_IMPL is explicitly designed to be usable from multiple files. The model is:

  1. Define schemas for all operators in torchvision in one file
  2. For each implementation (can be separate file), have a TORCH_LIBRARY_IMPL that registers it

You can see an example in aten/src/ATen/native/quantized/library.cpp

@datumbox datumbox force-pushed the refactoring/headers_ops branch from 6f87104 to f6782a0 Compare November 30, 2020 17:16
@datumbox datumbox force-pushed the refactoring/headers_ops branch from f6782a0 to da80ce1 Compare November 30, 2020 17:20
@datumbox
Copy link
Contributor Author

datumbox commented Nov 30, 2020

You can see an example in aten/src/ATen/native/quantized/library.cpp

@ezyang Thanks that's useful.

Based on the example, the registration of the forward/backward methods continues happening in a single file (library.cpp in your example) using macro TORCH_LIBRARY but the TORCH_LIBRARY_IMPL calls move in the place where the methods are defined.

In our case then, I understand that:

  • the TORCH_LIBRARY calls will remain in the csrc/vision.cpp file.
  • the CPU TORCH_LIBRARY_IMPL calls will go in the csrc/cpu/deform_conv2d_kernel.cpp file.
  • the CUDA TORCH_LIBRARY_IMPL calls will go in the csrc/cuda/deform_conv2d_kernel.cu file .
  • the Autocast/Autograd TORCH_LIBRARY_IMPL calls will go in the csrc/deform_conv2d.cpp file.

Edit: I implemented the above on df0e8a7 but unfortunately it fails on windows:

[10/14] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\nvcc --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -DWITH_CUDA -Dtorchvision_EXPORTS -IC:\Users\circleci\project\torchvision\csrc -IC:\Users\circleci\project\env\lib\site-packages\torch\include -IC:\Users\circleci\project\env\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\circleci\project\env\lib\site-packages\torch\include\TH -IC:\Users\circleci\project\env\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\include" -IC:\Users\circleci\project\env\include -IC:\Users\circleci\project\env\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt" -c C:\Users\circleci\project\torchvision\csrc\cuda\deform_conv2d_kernel.cu -o C:\Users\circleci\project\build\temp.win-amd64-3.8\Release\Users\circleci\project\torchvision\csrc\cuda\deform_conv2d_kernel.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_75,code=sm_75
FAILED: C:/Users/circleci/project/build/temp.win-amd64-3.8/Release/Users/circleci/project/torchvision/csrc/cuda/deform_conv2d_kernel.obj 
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\nvcc --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -DWITH_CUDA -Dtorchvision_EXPORTS -IC:\Users\circleci\project\torchvision\csrc -IC:\Users\circleci\project\env\lib\site-packages\torch\include -IC:\Users\circleci\project\env\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\circleci\project\env\lib\site-packages\torch\include\TH -IC:\Users\circleci\project\env\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\include" -IC:\Users\circleci\project\env\include -IC:\Users\circleci\project\env\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt" -c C:\Users\circleci\project\torchvision\csrc\cuda\deform_conv2d_kernel.cu -o C:\Users\circleci\project\build\temp.win-amd64-3.8\Release\Users\circleci\project\torchvision\csrc\cuda\deform_conv2d_kernel.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_75,code=sm_75
C:/Users/circleci/project/env/lib/site-packages/torch/include\c10/util/ThreadLocalDebugInfo.h(12): warning: modifier is ignored on an enum specifier

C:/Users/circleci/project/env/lib/site-packages/torch/include\ATen/record_function.h(19): warning: modifier is ignored on an enum specifier

C:/Users/circleci/project/env/lib/site-packages/torch/include\torch/csrc/jit/ir/ir.h(1349): error: member "torch::jit::ProfileOptionalOp::Kind" may not be initialized

C:/Users/circleci/project/env/lib/site-packages/torch/include\torch/csrc/j
it/serialization/pickler.h(211): error: member "torch::jit::Pickler::kBufferSize" may not be initialized

C:/Users/circleci/project/env/lib/site-packages/torch/include\torch/csrc/jit/serialization/pickler.h(228): error: expression must have a constant value
C:/Users/circleci/project/env/lib/site-packages/torch/include\torch/csrc/jit/serialization/pickler.h(228): note: the value of member "torch::jit::Pickler::kBufferSize"
(211): here cannot be used as a constant

C:/Users/circleci/project/env/lib/site-packages/torch/include\torch/csrc/autograd/profiler_legacy.h(93): warning: modifier is ignored on an enum specifier

C:/Users/circleci/project/env/lib/site-packages/torch/include\torch/csrc/autograd/profiler_legacy.h(356): warning: modifier is ignored on an enum specifier

C:/Users/circleci/project/env/lib/site-packages/torch/include\torch/csrc/autograd/profiler_kineto.h(16): warning: modifier is ignored on an enum specifier

3 errors detected in the compilation of "C:/Users/circleci/AppData/Local/Temp/tmpxft_00000a54_00000000-7_deform_conv2d_kernel.cpp1.ii".
deform_conv2d_kernel.cu

All other platforms work fine (ignore macOS_py3.7 which currently has an unrelated issue). If you have any recommendation please let me know.

@datumbox datumbox force-pushed the refactoring/headers_ops branch 3 times, most recently from f3c5a2e to 085ab41 Compare November 30, 2020 19:17
@datumbox datumbox force-pushed the refactoring/headers_ops branch from 085ab41 to df0e8a7 Compare November 30, 2020 21:33
@ezyang
Copy link
Contributor

ezyang commented Dec 1, 2020

This looks like the library is causing more headers to be included, and the flags you are compiling under trip an error under the flags. To unblock, I'd find the flag controlling "C:/Users/circleci/project/env/lib/site-packages/torch/include\torch/csrc/jit/ir/ir.h(1349): error: member "torch::jit::ProfileOptionalOp::Kind" may not be initialized
" and suppress this error in vision (it should be somewhere in pytorch cmake). The proper fix is to fix torch/csrc/jit/ir/ir.h to fix the warning.

@datumbox
Copy link
Contributor Author

datumbox commented Dec 1, 2020

@ezyang Thanks for the pointer. We decided to go ahead with the rest of the changes without including the registration move but I'll do the investigation once the main refactoring is completed.

@datumbox datumbox closed this Dec 2, 2020
@datumbox
Copy link
Contributor Author

datumbox commented Dec 2, 2020

@ezyang

I traced the original problem back to the use of constexpr instead of const that causes issues under windows. There are two problematic files in PyTorch: ir.h and pickler.h. This issue has appeared before and was patched in different places:

I think the permanent fix is do replace constexpr with CONSTEXPR_EXCEPT_WIN_CUDA similar to this PR. I'll try sending a PR to PyTorch later this week but since this is a blocker for me, I wanted to explore your other suggestion of using flags on TorchVision's cmake conf. I'm not sure which flag is responsible for turning off the member ”X” may not be initialized warning. I've seen some solutions in an other PR but I'm not certain if this is advised.

If you have any pointers will be highly appreciated.

@ezyang
Copy link
Contributor

ezyang commented Dec 2, 2020

I was hoping the workaround would be simple, guess it's not obvious. Just send in the upstream fix and ping on work chat, I can expedite its review.

@datumbox
Copy link
Contributor Author

datumbox commented Dec 4, 2020

It seems that the issue with pickler.h is resolved by pytorch/pytorch#48717 but the ir.h requires additional work. We now fail with:

C:/Users/circleci/project/env/lib/site-packages/torch/include\torch/csrc/jit/ir/ir.h(1329): error: a member of type "const c10::Symbol" cannot have an in-class initializer

C:/Users/circleci/project/env/lib/site-packages/torch/include\torch/csrc/jit/ir/ir.h(1349): error: a member of type "const c10::Symbol" cannot have an in-class initializer

I'm going to try to move the initializer outside of the class on a separate PR but I would really appreciate some input.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants