Cannot load scripted Keypoint-RCNN model in C++ #2736

lysukhin · 2020-10-01T11:40:34Z

🐛 Bug

Hello!

I am trying to load a Keypoint-RCNN model into C++ via TorchScript scripting.
Scripting in python itself works ok (with no errors), however loading to C++ throws an error:

terminate called after throwing an instance of 'torch::jit::ErrorReport'
  what():
Unknown builtin op: torchvision::nms.
Could not find any similar ops to torchvision::nms. This op may not exist or may not be currently supported in TorchScript.
:
  File "/home/d.lysukhin/distr/anaconda3/envs/nightly/lib/python3.8/site-packages/torchvision/ops/boxes.py", line 39
        by NMS, sorted in decreasing order of scores
    """
    return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
           ~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
Serialized   File "code/__torch__/torchvision/ops/boxes.py", line 91
    scores: Tensor,
    iou_threshold: float) -> Tensor:
  _42 = ops.torchvision.nms(boxes, scores, iou_threshold)
        ~~~~~~~~~~~~~~~~~~~ <--- HERE
  return _42
'nms' is being compiled since it was called from 'batched_nms'
  File "/home/d.lysukhin/distr/anaconda3/envs/nightly/lib/python3.8/site-packages/torchvision/ops/boxes.py", line 85
        offsets = idxs.to(boxes) * (max_coordinate + torch.tensor(1).to(boxes))
        boxes_for_nms = boxes + offsets[:, None]
        keep = nms(boxes_for_nms, scores, iou_threshold)
               ~~~ <--- HERE
        return keep
Serialized   File "code/__torch__/torchvision/ops/boxes.py", line 50
    _18 = torch.slice(offsets, 0, 0, 9223372036854775807, 1)
    boxes_for_nms = torch.add(boxes, torch.unsqueeze(_18, 1), alpha=1)
    keep = __torch__.torchvision.ops.boxes.nms(boxes_for_nms, scores, iou_threshold, )
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    _11 = keep
  return _11
'batched_nms' is being compiled since it was called from 'RegionProposalNetwork.filter_proposals'
Serialized   File "code/__torch__/torchvision/models/detection/rpn.py", line 64
    _11 = __torch__.torchvision.ops.boxes.clip_boxes_to_image
    _12 = __torch__.torchvision.ops.boxes.remove_small_boxes
    _13 = __torch__.torchvision.ops.boxes.batched_nms
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    num_images = (torch.size(proposals))[0]
    device = ops.prim.device(proposals)
'RegionProposalNetwork.filter_proposals' is being compiled since it was called from 'RegionProposalNetwork.forward'
  File "/home/d.lysukhin/distr/anaconda3/envs/nightly/lib/python3.8/site-packages/torchvision/models/detection/rpn.py", line 493
        proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)
        proposals = proposals.view(num_images, -1, 4)
        boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level)
                        ~~~~~~~~~~~~~~~~~~~~~ <--- HERE

        losses = {}
Serialized   File "code/__torch__/torchvision/models/detection/rpn.py", line 37
    proposals = (self.box_coder).decode(torch.detach(pred_bbox_deltas0), anchors, )
    proposals0 = torch.view(proposals, [num_images, -1, 4])
    _8 = (self).filter_proposals(proposals0, objectness0, images.image_sizes, num_anchors_per_level, )
                                                                              ~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    boxes, scores, = _8
    losses = annotate(Dict[str, Tensor], {})

Aborted

And yes, I am using nightly builds (torch ver. 1.7.0.dev20200929, torchvision ver. 0.8.0.dev20200929) (as offered here: https://discuss.pytorch.org/t/torchvision-ops-nms-in-torchscript/89286/2).
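The trace shows the serialized code calling `ops.torchvision.nms`, an op from outside the core `aten`/`prim` namespaces, so whichever process loads the file has to have that op registered. A minimal sketch for listing such ops in a saved file follows. It assumes the file written by `torch.jit.save` is a zip archive whose `code/` directory holds the serialized source (as the `code/__torch__/...` paths in the trace suggest). The helper name `custom_ops` and the set of built-in namespaces are illustrative, not part of any PyTorch API.

```python
import re
import sys
import zipfile

# Assumption: ops in these namespaces ship with libtorch itself (not exhaustive).
BUILTIN_NAMESPACES = {"aten", "prim"}

# Matches calls such as `ops.torchvision.nms(` in the serialized source.
OP_CALL = re.compile(r"\bops\.(\w+)\.(\w+)\s*\(")


def custom_ops(archive_path):
    """Return sorted 'namespace::op' names referenced by the serialized code."""
    found = set()
    with zipfile.ZipFile(archive_path) as archive:
        for name in archive.namelist():
            # Only look at serialized source files under a code/ directory.
            if "/code/" not in f"/{name}" or not name.endswith(".py"):
                continue
            source = archive.read(name).decode("utf-8", errors="replace")
            for namespace, op in OP_CALL.findall(source):
                if namespace not in BUILTIN_NAMESPACES:
                    found.add(f"{namespace}::{op}")
    return sorted(found)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        for op in custom_ops(sys.argv[1]):
            print(op)
```

Run against the saved model (for example `python list_ops.py kprcnn.pt`, where the script name is hypothetical), any op it prints has to be provided by a library loaded into the C++ program before `torch::jit::load` is called.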

To Reproduce

Steps to reproduce the behavior:

Script the torchvision Keypoint R-CNN implementation and save it:

import torch
import torchvision
from torchvision.models.detection.keypoint_rcnn import keypointrcnn_resnet50_fpn

print("torch ver.", torch.__version__) # yields 'torch ver. 1.7.0.dev20200929'
print("torchvision ver.", torchvision.__version__) # yields 'torchvision ver. 0.8.0.dev20200929'

def main():
    model = keypointrcnn_resnet50_fpn()
    scripted_module = torch.jit.script(model)
    
    with open("kprcnn.pt", "wb") as fp:
        torch.jit.save(scripted_module, fp)
        
if __name__ == "__main__":
    main()

Write C++ example (from here as well: https://pytorch.org/tutorials/advanced/cpp_export.html):

#include <torch/script.h> // One-stop header.

#include <iostream>
#include <memory>

int main(int argc, const char* argv[]) {
  if (argc != 2) {
    std::cerr << "usage: example-app <path-to-exported-script-module>\n";
    return -1;
  }


  torch::jit::script::Module module;
  try {
    // Deserialize the ScriptModule from a file using torch::jit::load().
    module = torch::jit::load(argv[1]);
  }
  catch (const c10::Error& e) {
    std::cerr << "error loading the model\n";
    return -1;
  }

  std::cout << "ok\n";
}

Write CMakeLists.txt as proposed in https://pytorch.org/tutorials/advanced/cpp_export.html:

cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(torchscript-example)

find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")

add_executable(torchscript-example torchscript-example.cpp)
target_link_libraries(torchscript-example "${TORCH_LIBRARIES}")
set_property(TARGET torchscript-example PROPERTY CXX_STANDARD 14)

Build

mkdir build; cd build; cmake -DCMAKE_PREFIX_PATH=`python -c 'import torch;print(torch.utils.cmake_prefix_path)'` ..; cmake --build . --config Release

Run
./torchscript-example ../kprcnn.pt

Environment

PyTorch / torchvision Version (e.g., 1.0 / 0.4.0): 1.7.0.dev20200929 / 0.8.0.dev20200929
OS (e.g., Linux): Ubuntu 18.04
How you installed PyTorch / torchvision (conda, pip, source): conda
Build command you used (if compiling from source):
Python version: 3.8.5
CUDA/cuDNN version: 10.2 / 8.0.4.30
GPU models and configuration: 2080 Ti
Any other relevant information:

Additional context

The same behaviour occurs with the stable release.
Changing the model to torchvision.models.resnet50() works just fine.


vfdev-5 · 2020-10-01T11:46:41Z

Thanks for the report! A Faster R-CNN ResNet-50 tracing test was added recently: https://github.com/pytorch/vision/tree/master/test/tracing/frcnn

Could you please check whether it works for you with Faster R-CNN as in the test, and whether it fails with Keypoint R-CNN?

cc @andfoy

lysukhin · 2020-10-01T11:49:39Z

@vfdev-5 Hello!
I changed the model to torchvision.models.detection.fasterrcnn_resnet50_fpn as you proposed, ~~and it worked just fine (no errors thrown).~~ and got the error again.

fmassa · 2020-10-01T12:26:35Z

@lysukhin Can you try compiling the example in https://github.com/pytorch/vision/tree/master/test/tracing/frcnn (using the CMakeLists from there) and report back?
We run that test on CI, so this is working on our side.

Also, it looks like the example had to include RoIAlign and NMS too; could you try that as well?

vision/test/tracing/frcnn/test_frcnn_tracing.cpp

Lines 4 to 6 in e70c91a

#include <torchvision/ROIAlign.h>
#include <torchvision/cpu/vision_cpu.h>
#include <torchvision/nms.h>

lysukhin · 2020-10-01T14:23:41Z

@fmassa Sure, but I'm not able to locate TorchVisionConfig.cmake that seems to be required for compiling. Does it come with conda installation?

vfdev-5 · 2020-10-01T14:30:15Z

@lysukhin I think you have to build torchvision C++ API as here : https://github.com/pytorch/vision#c-api

lysukhin · 2020-10-01T16:53:26Z

@vfdev-5 @fmassa Well, I was able to build the torchvision C++ API from source (after a couple of rounds of reinstalling cmake and so on).

But using the provided CMakeLists.txt and code from here, as you suggested (https://github.com/pytorch/vision/tree/master/test/tracing/frcnn), still results in an Unknown builtin op: torchvision::nms error :(

There might be something that I'm missing here!
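One thing worth ruling out at this point (a possible cause, not something confirmed in this thread) is that the built executable does not actually depend on libtorchvision at run time. If so, the code that registers `torchvision::nms` would never be loaded. The sketch below parses the output of `ldd` to check for that. It assumes a Linux system with `ldd` on the PATH, and the helper names are illustrative.

```python
import subprocess
import sys


def shared_libraries(ldd_output):
    """Extract the library names (first column) from the text printed by ldd."""
    names = []
    for line in ldd_output.splitlines():
        parts = line.split()
        if parts:
            names.append(parts[0])
    return names


def links_against(ldd_output, needle="torchvision"):
    """Return True if any listed library name contains `needle`."""
    return any(needle in name for name in shared_libraries(ldd_output))


if __name__ == "__main__":
    if len(sys.argv) == 2:
        # Assumption: the argument is the path to the built executable.
        result = subprocess.run(
            ["ldd", sys.argv[1]], capture_output=True, text=True, check=True
        )
        print("torchvision linked:", links_against(result.stdout))
```

If this reports `False` for `./test_frcnn_tracing`, the linker may have dropped the dependency because no symbol from the library was referenced. That could also explain why adding or changing an `#include` affects the outcome.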

fmassa · 2020-10-01T18:19:24Z

@lysukhin here is what our CI does:

vision/.circleci/config.yml.in

Lines 603 to 606 in a98e17e

- run:
    name: Setup conda
    command: .circleci/unittest/linux/scripts/setup_env.sh
- run: packaging/build_cmake.sh

The env setup is in this file, which basically sets up conda and installs the dependencies.

The rest of the CI exports include paths and generates the CMake files, and then the execution is done in

vision/packaging/build_cmake.sh

Lines 60 to 83 in a98e17e

# Install torchvision locally
python setup.py develop

# Trace, compile and run project that uses Faster-RCNN
pushd test/tracing/frcnn
mkdir build

# Trace model
python trace_model.py
cp fasterrcnn_resnet50_fpn.pt build

cd build
cmake .. -DTorch_DIR=$TORCH_PATH/share/cmake/Torch -DWITH_CUDA=$CMAKE_USE_CUDA
if [[ "$OSTYPE" == "msys" ]]; then
    "$script_dir/windows/internal/vc_env_helper.bat" "$script_dir/windows/internal/build_frcnn.bat"
    mv fasterrcnn_resnet50_fpn.pt Release
    cd Release
    export PATH=$(cygpath "C:/Program Files (x86)/torchvision/bin"):$(cygpath $TORCH_PATH)/lib:$PATH
else
    make
fi

# Run traced program
./test_frcnn_tracing

Please let us know what parts have been missing for you, so that we can improve our examples / tutorials on how to get this working.

cc @andfoy

andfoy · 2020-10-01T19:11:18Z

@lysukhin, don't forget also to install libtorchvision via make install, otherwise you'll get the TorchVisionConfig.cmake error

vision/packaging/build_cmake.sh

Line 48 in a98e17e

make install
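To confirm that `make install` actually put `TorchVisionConfig.cmake` somewhere CMake can find it, a small search helps. This is a sketch under stated assumptions: the default search root `/usr/local` is the usual CMake install prefix on Linux, and the helper name `find_config_dirs` is illustrative.

```python
import sys
from pathlib import Path

CONFIG_NAME = "TorchVisionConfig.cmake"


def find_config_dirs(prefixes):
    """Return the directories under each prefix that contain the config file."""
    hits = []
    for prefix in prefixes:
        root = Path(prefix)
        if not root.is_dir():
            continue  # skip prefixes that do not exist
        hits.extend(sorted(p.parent for p in root.rglob(CONFIG_NAME)))
    return hits


if __name__ == "__main__":
    # Assumption: /usr/local is where `make install` placed the files.
    prefixes = sys.argv[1:] or ["/usr/local"]
    for directory in find_config_dirs(prefixes):
        print(directory)
```

If a directory is printed but `find_package(TorchVision REQUIRED)` still fails, passing that directory explicitly as `-DTorchVision_DIR=<directory>` follows CMake's standard `<Package>_DIR` convention. If nothing is printed, the install step likely did not run or used a different prefix.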

fmassa · 2020-10-01T19:25:26Z

@andfoy might be good to better document how to get Faster R-CNN to run on torchscript C++. Maybe an example or tutorial would be helpful to consider?

andfoy · 2020-10-01T19:39:23Z

Maybe an example or tutorial would be helpful to consider?

We should definitely add a tutorial so that it is easier for users to use.

lysukhin · 2020-10-02T12:12:47Z

@fmassa
Thanks for your advice. I tried to follow the scripts you provided and came up with the following sequence (NB: I removed my previous conda installation to rule out interference between environments):

# Getting conda
wget https://repo.anaconda.com/archive/Anaconda3-2020.07-Linux-x86_64.sh
bash Anaconda3-2020.07-Linux-x86_64.sh

# Getting torchvision
git clone https://github.com/pytorch/vision.git
cd vision

# Getting deps
conda create --name nightly
conda activate nightly
conda env update --file .circleci/unittest/linux/scripts/environment.yml --prune

conda install -y pytorch cudatoolkit=10.2 -c pytorch-nightly
conda install -yq libpng jpeg
TORCH_PATH=$(dirname $(python -c "import torch; print(torch.__file__)")); echo $TORCH_PATH

# Building TorchVision C++ API
mkdir cpp_build; cd cpp_build

cmake .. -DTorch_DIR=$TORCH_PATH/share/cmake/Torch -DWITH_CUDA=on -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc
make -j 8
sudo make install

# Installing torchvision
cd ..
python setup.py develop

# Running R-CNN test
cd test/tracing/frcnn
mkdir build

python trace_model.py
cp fasterrcnn_resnet50_fpn.pt build; cd build

cmake .. -DTorch_DIR=$TORCH_PATH/share/cmake/Torch -DWITH_CUDA=on
make
./test_frcnn_tracing

But still I'm getting this annoying thing:

Loading model
Other error:
Unknown builtin op: torchvision::nms.
Could not find any similar ops to torchvision::nms. This op may not exist or may not be currently supported in TorchScript.
:
  File "/home/d.lysukhin/distr/vision/torchvision/ops/boxes.py", line 40
            by NMS, sorted in decreasing order of scores
        """
    return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
           ~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
Serialized   File "code/__torch__/torchvision/ops/boxes.py", line 91
    scores: Tensor,
    iou_threshold: float) -> Tensor:
  _42 = ops.torchvision.nms(boxes, scores, iou_threshold)
        ~~~~~~~~~~~~~~~~~~~ <--- HERE
  return _42
'nms' is being compiled since it was called from 'batched_nms'
  File "/home/d.lysukhin/distr/vision/torchvision/ops/boxes.py", line 86
        offsets = idxs.to(boxes) * (max_coordinate + torch.tensor(1).to(boxes))
        boxes_for_nms = boxes + offsets[:, None]
        keep = nms(boxes_for_nms, scores, iou_threshold)
               ~~~ <--- HERE
        return keep
Serialized   File "code/__torch__/torchvision/ops/boxes.py", line 50
    _18 = torch.slice(offsets, 0, 0, 9223372036854775807, 1)
    boxes_for_nms = torch.add(boxes, torch.unsqueeze(_18, 1), alpha=1)
    keep = __torch__.torchvision.ops.boxes.nms(boxes_for_nms, scores, iou_threshold, )
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    _11 = keep
  return _11
'batched_nms' is being compiled since it was called from 'RegionProposalNetwork.filter_proposals'
Serialized   File "code/__torch__/torchvision/models/detection/rpn.py", line 64
    _11 = __torch__.torchvision.ops.boxes.clip_boxes_to_image
    _12 = __torch__.torchvision.ops.boxes.remove_small_boxes
    _13 = __torch__.torchvision.ops.boxes.batched_nms
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    num_images = (torch.size(proposals))[0]
    device = ops.prim.device(proposals)
'RegionProposalNetwork.filter_proposals' is being compiled since it was called from 'RegionProposalNetwork.forward'
  File "/home/d.lysukhin/distr/vision/torchvision/models/detection/rpn.py", line 494
        proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)
        proposals = proposals.view(num_images, -1, 4)
        boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level)
                        ~~~~~~~~~~~~~~~~~~~~~ <--- HERE

        losses = {}
Serialized   File "code/__torch__/torchvision/models/detection/rpn.py", line 37
    proposals = (self.box_coder).decode(torch.detach(pred_bbox_deltas0), anchors, )
    proposals0 = torch.view(proposals, [num_images, -1, 4])
    _8 = (self).filter_proposals(proposals0, objectness0, images.image_sizes, num_anchors_per_level, )
                                                                              ~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    boxes, scores, = _8
    losses = annotate(Dict[str, Tensor], {})

fmassa · 2020-10-02T12:36:52Z

@andfoy can you help @lysukhin solve this issue? We definitely need better instructions on how to get torchvision custom ops running in torchscript on C++

lysukhin · 2020-10-08T08:51:47Z

Any updates on this? @andfoy

zhiqwang · 2020-10-08T09:13:39Z

I changed this line

vision/test/tracing/frcnn/test_frcnn_tracing.cpp

Line 4 in 6756ed0

#include <torchvision/ROIAlign.h>

to

#include <torchvision/ROIPool.h>

This addresses the issue (only for the Faster R-CNN model; I did not test the Keypoint R-CNN model). But as @fmassa mentioned in #2679 (review), the Faster R-CNN model uses ROIAlign instead, so it's weird. Maybe you can try this?

fmassa · 2020-10-30T09:11:53Z

I believe this has been fixed with #2798 which is available in torchvision from master (but not in the 0.8.1 release).

Please let us know if this still doesn't work for you even if you are using torchvision from master.

vfdev-5 added the topic: object detection label Oct 1, 2020

vfdev-5 added the module: c++ frontend label Oct 1, 2020

fmassa assigned andfoy Oct 2, 2020

bmanga mentioned this issue Oct 13, 2020

Port all C++ ops to use the dispatcher #2796

Closed

6 tasks

zhiqwang mentioned this issue Oct 28, 2020

Serious problem !!! C++ torchvision does not register nms in Torchscript #2915

Closed

fmassa closed this as completed Oct 30, 2020
