Extract features and bounding boxes predictions in a few lines of Python code.
This repo is a cleaned version of [airsplay] (https://github.com/airsplay/py-bottom-up-attention) repo. For more details refer to that repo.
The detectron2 system with exactly the same model and weight as the Caffe VG Faster R-CNN provided in bottom-up-attetion.
git clone https://github.com/michelecafagna26/faster-rcnn-bottom-up-py.git
cd faster-rcnn-bottom-up-py
# Install python libraries
pip install -r requirements.txt
pip install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
# Install detectron2
python setup.py build develop
# or if you are on macOS
# MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py build develop
# or, as an alternative to `setup.py`, do
# pip install [--editable] .
from wrappers import FasterRCNNBottomUp
import cv2
IMG_FILE = "data/images/COCO_train2014_000000084002.jpg"
cfg_file = "configs/VG-Detection/faster_rcnn_R_101_C4_caffemaxpool_wrapper.yaml"
vg_objects = "data/genome/1600-400-20/objects_vocab.txt"
im = cv2.imread(IMG_FILE)
model = FasterRCNNBottomUp(cfg_file, object_txt = vg_objects, MAX_BOXES=150, MIN_BOXES=150)
instances, boxes = model([im], return_features=True)
To access the predicted object class:
class_id = instances[0].pred_classes
model.classes['thing_classes'][class_id]
- The default weight is same to the 'alternative pretrained model' in the original github here, which is trained with 36 bbxes. If you want to use the original detetion trained with 10~100 bbxes, please use the following weight:
http://nlp.cs.unc.edu/models/faster_rcnn_from_caffe_attr_original.pkl
- The coordinate generated from the code is (x_left_corner, y_top_corner, x_right_corner, y_bottom_corner). Here is a visualization. Suppose the
box = [x0, y0, x1, y1]
, it annotates an RoI of:0------------------------------------- | | y0 box[1] |-----------| | | | | | | | Object | | y1 box[3] |-----------| | | | H----------x0 box[0]-----x1 box[2]---- 0 W
- If the link breaks, please use this Google Drive: https://drive.google.com/drive/folders/1ICBed8F9JaayAshptGMiGtRj78esg3m4?usp=sharing.
- The orignal CAFFE implementation https://github.com/peteanderson80/bottom-up-attention, and its docker image.
- bottom-up-attention.pytorch maintained by MIL-LAB.
- original repo airsplay
- The Caffe2PyTorch conversion code (not released here) is based on Ruotian Luo's PyTorch-ResNet project.
- The project also refers to Ross Girshick's old py-faster-rcnn on its way.
Detectron2:
@misc{wu2019detectron2,
author = {Yuxin Wu and Alexander Kirillov and Francisco Massa and
Wan-Yen Lo and Ross Girshick},
title = {Detectron2},
howpublished = {\url{https://github.com/facebookresearch/detectron2}},
year = {2019}
}
Bottom-up Attention:
@inproceedings{Anderson2017up-down,
author = {Peter Anderson and Xiaodong He and Chris Buehler and Damien Teney and Mark Johnson and Stephen Gould and Lei Zhang},
title = {Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering},
booktitle={CVPR},
year = {2018}
}