add abstract and sketch to the CONFIGS/algorithm/README.md (open-mmlab#6654)

* Fix mosaic repr typo (open-mmlab#6523)

* Include mmflow in readme (open-mmlab#6545)

* Include mmflow in readme

* Include mmflow in README_zh-CN

* Add mmflow url into the document menu in docs/conf.py and docs_zh-CN/conf.py.

* Make OHEM work with seesaw loss (open-mmlab#6514)

* [Enhance] Support file_client in Datasets and evaluating panoptic results on Ceph (open-mmlab#6489)

* first version

* Replace with our api

* Add copyright

* Move the runtime error to multi_core interface

* Add docstring

* Fix comments

* Add comments

* Add unit test for pq_compute_single_core

* Fix MMDetection model to ONNX command (open-mmlab#6558)

* Update README.md (open-mmlab#6567)

* [Feature] Support custom persistent_workers (open-mmlab#6435)

* Fix aug test error when the number of prediction bboxes is 0 (open-mmlab#6398)

* Fix aug test error when the number of prediction bboxes is 0

* test

* test

* fix lint

* Support custom pin_memory and persistent_workers

* fix comment

* fix docstr

* remove pin_memory

* Fix SSD512 config error (open-mmlab#6574)

* Fix mosaic repr typo (open-mmlab#6523)

* Include mmflow in readme (open-mmlab#6545)

* Include mmflow in readme

* Include mmflow in README_zh-CN

* Add mmflow url into the document menu in docs/conf.py and docs_zh-CN/conf.py.

* Make OHEM work with seesaw loss (open-mmlab#6514)

* Fix ssd512 config error

Co-authored-by: Kyungmin Lee <30465912+lkm2835@users.noreply.github.com>
Co-authored-by: Czm369 <40661020+Czm369@users.noreply.github.com>
Co-authored-by: ohwi <supebulous@gmail.com>

* Catch symlink failure on Windows (open-mmlab#6482)

* Catch symlink failure on Windows

Signed-off-by: del-zhenwu <dele.zhenwu@gmail.com>

* Set copy mode on Windows

Signed-off-by: del-zhenwu <dele.zhenwu@gmail.com>

* Fix lint

Signed-off-by: del-zhenwu <dele.zhenwu@gmail.com>

* Fix logic error

Signed-off-by: del-zhenwu <dele.zhenwu@gmail.com>

* [Feature] Support Label Assignment Distillation (LAD) (open-mmlab#6342)

* add LAD

* inherit LAD from KnowledgeDistillationSingleStageDetector

* add configs/lad/lad_r101_paa_r50_fpn_coco_1x.py

* update LAD readme

* update configs/lad/README.md

* try not to use abbreviations for variable names

* add unittest for lad_head

* update test_lad_head

* remove main in tests/test_models/test_dense_heads/test_lad_head.py

* [Fix] Avoid infinite GPU waiting in dist training (open-mmlab#6501)

* [open-mmlab#6495] fix infinite GPU waiting in dist training

* print log_vars keys in assertion msg

* linting issue

* Support to collect the best models (open-mmlab#6560)

* Fix mosaic repr typo (open-mmlab#6523)

* Include mmflow in readme (open-mmlab#6545)

* Include mmflow in readme

* Include mmflow in README_zh-CN

* Add mmflow url into the document menu in docs/conf.py and docs_zh-CN/conf.py.

* Make OHEM work with seesaw loss (open-mmlab#6514)

* update

* support gather best model

Co-authored-by: Kyungmin Lee <30465912+lkm2835@users.noreply.github.com>
Co-authored-by: Czm369 <40661020+Czm369@users.noreply.github.com>
Co-authored-by: ohwi <supebulous@gmail.com>

* [Enhance]: Optimize augmentation pipeline to speed up training. (open-mmlab#6442)

* Refactor YOLOX (open-mmlab#6443)

* Fix aug test error when the number of prediction bboxes is 0 (open-mmlab#6398)

* Fix aug test error when the number of prediction bboxes is 0

* test

* test

* fix lint

* Support custom pin_memory and persistent_workers

* [Docs] Chinese version of robustness_benchmarking.md (open-mmlab#6375)

* Chinese version of robustness_benchmarking.md

* Update docs_zh-CN/robustness_benchmarking.md

Co-authored-by: RangiLyu <lyuchqi@gmail.com>

* Update docs_zh-CN/robustness_benchmarking.md

Co-authored-by: RangiLyu <lyuchqi@gmail.com>

* Update docs_zh-CN/robustness_benchmarking.md

Co-authored-by: RangiLyu <lyuchqi@gmail.com>

* Update docs_zh-CN/robustness_benchmarking.md

Co-authored-by: RangiLyu <lyuchqi@gmail.com>

* Update docs_zh-CN/robustness_benchmarking.md

Co-authored-by: RangiLyu <lyuchqi@gmail.com>

* Update docs_zh-CN/robustness_benchmarking.md

Co-authored-by: RangiLyu <lyuchqi@gmail.com>

* Update robustness_benchmarking.md

* Update robustness_benchmarking.md

* Update robustness_benchmarking.md

* Update robustness_benchmarking.md

* Update robustness_benchmarking.md

* Update robustness_benchmarking.md

Co-authored-by: RangiLyu <lyuchqi@gmail.com>

* update yolox_s

* update yolox_s

* support dynamic eval interval

* fix some error

* support ceph

* fix none error

* fix batch error

* replace resize

* fix comment

* fix docstr

* Update the link of checkpoints (open-mmlab#6460)

* [Feature]: Support plot confusion matrix. (open-mmlab#6344)

* remove pin_memory

* update

* fix unittest

* update cfg

* fix error

* add unittest

* [Fix] Fix SpatialReductionAttention in PVT. (open-mmlab#6488)

* [Fix] Fix SpatialReductionAttention in PVT

* Add warning

* Save coco summarize print information to logger (open-mmlab#6505)

* Fix type error in 2_new_data_mode (open-mmlab#6469)

* Always map location to cpu when load checkpoint (open-mmlab#6405)

* configs: update groie README (open-mmlab#6401)

Signed-off-by: Leonardo Rossi <leonardo.rossi@unipr.it>

* [Fix] fix config path in docs (open-mmlab#6396)

* [Enhance] Set a random seed when the user does not set a seed. (open-mmlab#6457)

* fix random seed bug

* add comment

* enhance random seed

* rename

Co-authored-by: Haobo Yuan <yuanhaobo@whu.edu.cn>

* [BugFixed] fix wrong trunc_normal_init use (open-mmlab#6432)

* fix wrong trunc_normal_init use

* fix wrong trunc_normal_init use

* fix open-mmlab#6446

Co-authored-by: Uno Wu <st9007a@gmail.com>
Co-authored-by: Leonardo Rossi <leonardo.rossi@unipr.it>
Co-authored-by: BigDong <yudongwang@tju.edu.cn>
Co-authored-by: Haian Huang(深度眸) <1286304229@qq.com>
Co-authored-by: Haobo Yuan <yuanhaobo@whu.edu.cn>
Co-authored-by: Shusheng Yang <shusheng.yang@qq.com>

* bump version to v2.18.1 (open-mmlab#6510)

* bump version to v2.18.1

* Update changelog.md

* add some comment

* fix some comment

* update readme

* fix lint

* add reduce mean

* update

* update readme

* update params

Co-authored-by: Cedric Luo <luochunhua1996@outlook.com>
Co-authored-by: RangiLyu <lyuchqi@gmail.com>
Co-authored-by: Guangchen Lin <347630870@qq.com>
Co-authored-by: Andrea Panizza <8233615+AndreaPi@users.noreply.github.com>
Co-authored-by: Uno Wu <st9007a@gmail.com>
Co-authored-by: Leonardo Rossi <leonardo.rossi@unipr.it>
Co-authored-by: BigDong <yudongwang@tju.edu.cn>
Co-authored-by: Haobo Yuan <yuanhaobo@whu.edu.cn>
Co-authored-by: Shusheng Yang <shusheng.yang@qq.com>

* [Refactor] Remove some code in `mmdet/apis/train.py` (open-mmlab#6576)

* remove some code about custom hooks in apis/train.py

* files were modified by yapf

* Fix lad repeatedly output warning message (open-mmlab#6584)

* drop support for pytorch 1.3.1. (open-mmlab#6589)

* drop support for pytorch 1.3.1.

* update ci

* update ci

* Skip CI on ignoring paths (open-mmlab#6578)

* [test] Skip CI on ignoring paths

Signed-off-by: del-zhenwu <dele.zhenwu@gmail.com>

* [skip ci] fix build.yaml

Signed-off-by: del-zhenwu <dele.zhenwu@gmail.com>

* split lint in workflows

Signed-off-by: del-zhenwu <dele.zhenwu@gmail.com>

* split lint in workflows

Signed-off-by: del-zhenwu <dele.zhenwu@gmail.com>

* split lint in workflows

Signed-off-by: del-zhenwu <dele.zhenwu@gmail.com>

* [Refactor]: refactor configs of FP16 models (open-mmlab#6592)

* refactor configs of fp16

* update

* update

* update

* Ignoring paths (open-mmlab#6615)

Signed-off-by: del-zhenwu <dele.zhenwu@gmail.com>

* fix FP16 CI (open-mmlab#6616)

* Bump to 2.19.0 (open-mmlab#6633)

* bump to 2.19.0

* resolve comments

* fix typo

* fix

* fix yolox cuda unittest (open-mmlab#6636)

* Support circleci (open-mmlab#6630)

* add ci

* rename

* fix mmtrack unittest

* update mmcv version

* add abstract and sketch to the CONFIGS/algorithm/README.md

* add some ## Introduction

* revert

Co-authored-by: Kyungmin Lee <30465912+lkm2835@users.noreply.github.com>
Co-authored-by: ohwi <supebulous@gmail.com>
Co-authored-by: Guangchen Lin <347630870@qq.com>
Co-authored-by: Rishit Dagli <rishit.dagli@gmail.com>
Co-authored-by: RangiLyu <lyuchqi@gmail.com>
Co-authored-by: Haian Huang(深度眸) <1286304229@qq.com>
Co-authored-by: del-zhenwu <dele.zhenwu@gmail.com>
Co-authored-by: Thuy Ng <thuypn9a4@gmail.com>
Co-authored-by: Han Zhang <623606860@qq.com>
Co-authored-by: Cedric Luo <luochunhua1996@outlook.com>
Co-authored-by: Andrea Panizza <8233615+AndreaPi@users.noreply.github.com>
Co-authored-by: Uno Wu <st9007a@gmail.com>
Co-authored-by: Leonardo Rossi <leonardo.rossi@unipr.it>
Co-authored-by: BigDong <yudongwang@tju.edu.cn>
Co-authored-by: Haobo Yuan <yuanhaobo@whu.edu.cn>
Co-authored-by: Shusheng Yang <shusheng.yang@qq.com>
Co-authored-by: Wenwei Zhang <40779233+ZwwWayne@users.noreply.github.com>
Co-authored-by: Shilong Zhang <61961338+jshilong@users.noreply.github.com>
19 people committed Jul 18, 2022
1 parent 9611e43 commit b13b5f3
Showing 74 changed files with 1,180 additions and 105 deletions.
16 changes: 16 additions & 0 deletions configs/albu_example/README.md
@@ -1,5 +1,21 @@
# Albu Example

## Abstract

<!-- [ABSTRACT] -->

Data augmentation is a commonly used technique for increasing both the size and the diversity of labeled training sets by leveraging input transformations that preserve output labels. In the computer vision domain, image augmentations have become a common implicit regularization technique to combat overfitting in deep convolutional neural networks and are ubiquitously used to improve performance. While most deep learning frameworks implement basic image transformations, the list is typically limited to some variations and combinations of flipping, rotating, scaling, and cropping. Moreover, the image processing speed varies in existing tools for image augmentation. We present Albumentations, a fast and flexible library for image augmentations with a wide variety of image transform operations available, which is also an easy-to-use wrapper around other augmentation libraries. We provide examples of image augmentations for different computer vision tasks and show that Albumentations is faster than other commonly used image augmentation tools on most of the commonly used image transformations.
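
A minimal usage sketch, assuming the `albumentations` package is installed; the transform names follow its public API, while the toy image and box values are purely illustrative:

```python
import albumentations as A
import numpy as np

# A couple of label-preserving transforms; bbox_params keeps boxes in sync with the image.
transform = A.Compose(
    [
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.2),
    ],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
)

image = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)  # toy input
out = transform(image=image, bboxes=[(32, 40, 200, 180)], labels=[1])
aug_image, aug_bboxes = out["image"], out["bboxes"]
```

In MMDetection, such transforms are plugged into the data pipeline through the `Albu` wrapper transform, which is what this config example exercises.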

<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/40661020/143870703-74f3ea3f-ae23-4035-9856-746bc3f88464.png" height="400" />
</div>

<!-- [PAPER_TITLE: Albumentations: fast and flexible image augmentations] -->
<!-- [PAPER_URL: https://arxiv.org/abs/1809.06839] -->

## Citation

<!-- [OTHERS] -->

```
16 changes: 15 additions & 1 deletion configs/atss/README.md
@@ -1,6 +1,20 @@
# Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection

## Introduction
## Abstract

<!-- [ABSTRACT] -->

Object detection has been dominated by anchor-based detectors for several years. Recently, anchor-free detectors have become popular due to the proposal of FPN and Focal Loss. In this paper, we first point out that the essential difference between anchor-based and anchor-free detection is actually how to define positive and negative training samples, which leads to the performance gap between them. If they adopt the same definition of positive and negative samples during training, there is no obvious difference in the final performance, no matter whether regressing from a box or a point. This shows that how to select positive and negative training samples is important for current object detectors. Then, we propose an Adaptive Training Sample Selection (ATSS) to automatically select positive and negative samples according to the statistical characteristics of objects. It significantly improves the performance of anchor-based and anchor-free detectors and bridges the gap between them. Finally, we discuss the necessity of tiling multiple anchors per location on the image to detect objects. Extensive experiments conducted on MS COCO support our aforementioned analysis and conclusions. With the newly introduced ATSS, we improve state-of-the-art detectors by a large margin to 50.7% AP without introducing any overhead.
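
A condensed sketch of the adaptive selection rule, assuming anchor-to-ground-truth IoUs and center distances are already computed; the full method picks its top-k candidates per pyramid level and additionally requires anchor centers to lie inside the ground-truth box:

```python
import torch

def atss_assign(ious, center_dists, k=9):
    """ious, center_dists: (num_anchors, num_gt). Returns a bool (num_anchors, num_gt) positive mask."""
    # 1) candidates: the k anchors whose centers are closest to each gt center
    _, cand_idx = center_dists.topk(k, dim=0, largest=False)
    cand_ious = ious.gather(0, cand_idx)
    # 2) per-gt adaptive threshold from the candidates' IoU statistics
    thr = cand_ious.mean(dim=0) + cand_ious.std(dim=0)
    # 3) candidates whose IoU reaches the threshold become positive samples
    pos = torch.zeros_like(ious)
    pos.scatter_(0, cand_idx, (cand_ious >= thr).float())
    return pos.bool()
```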

<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/40661020/143870776-c81168f5-e8b2-44ee-978b-509e4372c5c9.png"/>
</div>

<!-- [PAPER_TITLE: Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection] -->
<!-- [PAPER_URL: https://arxiv.org/abs/1912.02424] -->

## Citation

<!-- [ALGORITHM] -->

16 changes: 15 additions & 1 deletion configs/autoassign/README.md
@@ -1,6 +1,20 @@
# AutoAssign: Differentiable Label Assignment for Dense Object Detection

## Introduction
## Abstract

<!-- [ABSTRACT] -->

Determining positive/negative samples for object detection is known as label assignment. Here we present an anchor-free detector named AutoAssign. It requires little human knowledge and achieves appearance awareness through a fully differentiable weighting mechanism. During training, to both satisfy the prior distribution of data and adapt to category characteristics, we present Center Weighting to adjust the category-specific prior distributions. To adapt to object appearances, Confidence Weighting is proposed to adjust the specific assign strategy of each instance. The two weighting modules are then combined to generate positive and negative weights to adjust each location's confidence. Extensive experiments on MS COCO show that our method steadily surpasses other best sampling strategies by large margins with various backbones. Moreover, our best model achieves 52.1% AP, outperforming all existing one-stage detectors. Besides, experiments on other datasets, e.g., PASCAL VOC, Objects365, and WiderFace, demonstrate the broad applicability of AutoAssign.
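
A simplified sketch of the center-weighting idea, assuming a fixed Gaussian prior around each ground-truth center; the paper learns category-specific parameters and combines this with confidence weighting:

```python
import torch

def center_prior(points, gt_boxes, sigma=1.0):
    """points: (P, 2) feature locations; gt_boxes: (G, 4) as x1, y1, x2, y2. Returns (P, G) weights."""
    centers = (gt_boxes[:, :2] + gt_boxes[:, 2:]) / 2          # (G, 2) box centers
    wh = (gt_boxes[:, 2:] - gt_boxes[:, :2]).clamp(min=1e-6)   # (G, 2) box sizes for normalization
    offsets = (points[:, None, :] - centers[None, :, :]) / wh  # (P, G, 2) size-normalized offsets
    return torch.exp(-(offsets ** 2).sum(dim=-1) / (2 * sigma ** 2))
```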

<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/40661020/143870875-33567e44-0584-4470-9a90-0df0fb6c1fe2.png"/>
</div>

<!-- [PAPER_TITLE: AutoAssign: Differentiable Label Assignment for Dense Object Detection] -->
<!-- [PAPER_URL: https://arxiv.org/abs/2007.03496] -->

## Citation

<!-- [ALGORITHM] -->

16 changes: 15 additions & 1 deletion configs/carafe/README.md
@@ -1,6 +1,20 @@
# CARAFE: Content-Aware ReAssembly of FEatures

## Introduction
## Abstract

<!-- [ABSTRACT] -->

Feature upsampling is a key operation in a number of modern convolutional network architectures, e.g. feature pyramids. Its design is critical for dense prediction tasks such as object detection and semantic/instance segmentation. In this work, we propose Content-Aware ReAssembly of FEatures (CARAFE), a universal, lightweight and highly effective operator to fulfill this goal. CARAFE has several appealing properties: (1) Large field of view. Unlike previous works (e.g. bilinear interpolation) that only exploit sub-pixel neighborhood, CARAFE can aggregate contextual information within a large receptive field. (2) Content-aware handling. Instead of using a fixed kernel for all samples (e.g. deconvolution), CARAFE enables instance-specific content-aware handling, which generates adaptive kernels on-the-fly. (3) Lightweight and fast to compute. CARAFE introduces little computational overhead and can be readily integrated into modern network architectures. We conduct comprehensive evaluations on standard benchmarks in object detection, instance/semantic segmentation and inpainting. CARAFE shows consistent and substantial gains across all the tasks (1.2%, 1.3%, 1.8%, 1.1 dB respectively) with negligible computational overhead. It has great potential to serve as a strong building block for future research.
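
A minimal sketch of the reassembly step alone, assuming the content-aware kernels have already been predicted and softmax-normalized by the (omitted) kernel prediction module:

```python
import torch
import torch.nn.functional as F

def carafe_reassemble(x, kernels, k=5, scale=2):
    """x: (N, C, H, W); kernels: (N, scale*scale*k*k, H, W), softmax-normalized over the k*k axis."""
    n, c, h, w = x.shape
    # gather each location's k x k neighborhood once, then reassemble it with the predicted kernels
    neigh = F.unfold(x, k, padding=k // 2).view(n, c, k * k, h, w)
    kern = kernels.view(n, scale * scale, k * k, h, w)
    out = torch.einsum("nckhw,nskhw->nschw", neigh, kern)             # (N, scale^2, C, H, W)
    out = out.permute(0, 2, 1, 3, 4).reshape(n, c * scale * scale, h, w)
    return F.pixel_shuffle(out, scale)                                # (N, C, scale*H, scale*W)
```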

<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/40661020/143872016-48225685-0e59-49cf-bd65-a50ee04ca8a2.png"/>
</div>

<!-- [PAPER_TITLE: CARAFE: Content-Aware ReAssembly of FEatures] -->
<!-- [PAPER_URL: https://arxiv.org/abs/1905.02188] -->

## Citation

<!-- [ALGORITHM] -->

16 changes: 15 additions & 1 deletion configs/cascade_rcnn/README.md
@@ -1,6 +1,20 @@
# Cascade R-CNN: High Quality Object Detection and Instance Segmentation

## Introduction
## Abstract

<!-- [ABSTRACT] -->

In object detection, the intersection over union (IoU) threshold is frequently used to define positives/negatives. The threshold used to train a detector defines its quality. While the commonly used threshold of 0.5 leads to noisy (low-quality) detections, detection performance frequently degrades for larger thresholds. This paradox of high-quality detection has two causes: 1) overfitting, due to vanishing positive samples for large thresholds, and 2) inference-time quality mismatch between detector and test hypotheses. A multi-stage object detection architecture, the Cascade R-CNN, composed of a sequence of detectors trained with increasing IoU thresholds, is proposed to address these problems. The detectors are trained sequentially, using the output of a detector as training set for the next. This resampling progressively improves hypotheses quality, guaranteeing a positive training set of equivalent size for all detectors and minimizing overfitting. The same cascade is applied at inference, to eliminate quality mismatches between hypotheses and detectors. An implementation of the Cascade R-CNN without bells or whistles achieves state-of-the-art performance on the COCO dataset, and significantly improves high-quality detection on generic and specific object detection datasets, including VOC, KITTI, CityPerson, and WiderFace. Finally, the Cascade R-CNN is generalized to instance segmentation, with nontrivial improvements over the Mask R-CNN.
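
An abridged sketch of how the increasing IoU thresholds show up in an MMDetection-style training config; only the assigner-related keys are kept, and real configs carry many more fields:

```python
# Three R-CNN stages, each assigned positives at a stricter IoU threshold.
train_cfg = dict(
    rcnn=[
        dict(assigner=dict(type='MaxIoUAssigner', pos_iou_thr=0.5, neg_iou_thr=0.5, min_pos_iou=0.5)),
        dict(assigner=dict(type='MaxIoUAssigner', pos_iou_thr=0.6, neg_iou_thr=0.6, min_pos_iou=0.6)),
        dict(assigner=dict(type='MaxIoUAssigner', pos_iou_thr=0.7, neg_iou_thr=0.7, min_pos_iou=0.7)),
    ],
)
```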

<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/40661020/143872197-d99b90e4-4f05-4329-80a4-327ac862a051.png"/>
</div>

<!-- [PAPER_TITLE: Cascade R-CNN: High Quality Object Detection and Instance Segmentation] -->
<!-- [PAPER_URL: https://arxiv.org/abs/1906.09756] -->

## Citation

<!-- [ALGORITHM] -->

18 changes: 17 additions & 1 deletion configs/cascade_rpn/README.md
@@ -1,4 +1,20 @@
# Cascade RPN
# Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution

## Abstract

<!-- [ABSTRACT] -->

This paper considers an architecture referred to as Cascade Region Proposal Network (Cascade RPN) for improving the region-proposal quality and detection performance by systematically addressing the limitation of the conventional RPN that heuristically defines the anchors and aligns the features to the anchors. First, instead of using multiple anchors with predefined scales and aspect ratios, Cascade RPN relies on a single anchor per location and performs multi-stage refinement. Each stage is progressively more stringent in defining positive samples by starting out with an anchor-free metric followed by anchor-based metrics in the ensuing stages. Second, to attain alignment between the features and the anchors throughout the stages, adaptive convolution is proposed that takes the anchors in addition to the image features as its input and learns the sampled features guided by the anchors. A simple implementation of a two-stage Cascade RPN achieves AR 13.4 points higher than that of the conventional RPN, surpassing any existing region proposal methods. When adopted in Fast R-CNN and Faster R-CNN, Cascade RPN can improve the detection mAP by 3.1 and 3.5 points, respectively.
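
A bare-bones sketch of the multi-stage refinement loop, assuming each stage's regression deltas are already predicted; the adaptive convolution that re-aligns features to the refined anchors and the anchor-free metric of the first stage are elided:

```python
import torch

def delta2bbox(boxes, deltas):
    """Standard (dx, dy, dw, dh) decoding of box refinements; boxes, deltas: (N, 4)."""
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    cx = boxes[:, 0] + 0.5 * w + deltas[:, 0] * w
    cy = boxes[:, 1] + 0.5 * h + deltas[:, 1] * h
    w = w * deltas[:, 2].exp()
    h = h * deltas[:, 3].exp()
    return torch.stack([cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h], dim=1)

def refine_anchors(anchors, per_stage_deltas):
    """anchors: (N, 4), a single anchor per location; per_stage_deltas: list of (N, 4) stage outputs."""
    boxes = anchors
    for deltas in per_stage_deltas:   # each stage regresses w.r.t. the previous stage's boxes
        boxes = delta2bbox(boxes, deltas)
    return boxes
```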

<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/40661020/143872368-1580193a-d19c-4723-a579-c7ed2d5da4d1.png"/>
</div>

<!-- [PAPER_TITLE: Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution] -->
<!-- [PAPER_URL: https://arxiv.org/abs/1909.06720] -->

## Citation

<!-- [ALGORITHM] -->

18 changes: 16 additions & 2 deletions configs/centernet/README.md
@@ -1,6 +1,20 @@
# CenterNet
# Objects as Points

## Introduction
## Abstract

<!-- [ABSTRACT] -->

Detection identifies objects as axis-aligned boxes in an image. Most successful object detectors enumerate a nearly exhaustive list of potential object locations and classify each. This is wasteful, inefficient, and requires additional post-processing. In this paper, we take a different approach. We model an object as a single point --- the center point of its bounding box. Our detector uses keypoint estimation to find center points and regresses to all other object properties, such as size, 3D location, orientation, and even pose. Our center point based approach, CenterNet, is end-to-end differentiable, simpler, faster, and more accurate than corresponding bounding box based detectors. CenterNet achieves the best speed-accuracy trade-off on the MS COCO dataset, with 28.1% AP at 142 FPS, 37.4% AP at 52 FPS, and 45.1% AP with multi-scale testing at 1.4 FPS. We use the same approach to estimate 3D bounding box in the KITTI benchmark and human pose on the COCO keypoint dataset. Our method performs competitively with sophisticated multi-stage methods and runs in real-time.
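
A compact sketch of the center-point decoding step, assuming a sigmoid class heatmap and a size map as inputs; the sub-pixel offset branch and any rescaling to image coordinates are omitted:

```python
import torch
import torch.nn.functional as F

def decode_centers(heatmap, wh, k=100):
    """heatmap: (C, H, W) class scores after sigmoid; wh: (2, H, W) size map. Returns (k, 6)."""
    # keep only local maxima: a location survives if it equals its own 3x3 max-pooled value
    peaks = heatmap * (heatmap == F.max_pool2d(heatmap[None], 3, 1, 1)[0]).float()
    c, h, w = peaks.shape
    scores, idx = peaks.reshape(-1).topk(k)
    cls = idx // (h * w)
    ys = (idx % (h * w)) // w
    xs = idx % w
    ws, hs = wh[0, ys, xs], wh[1, ys, xs]
    boxes = torch.stack([xs - ws / 2, ys - hs / 2, xs + ws / 2, ys + hs / 2], dim=1)
    return torch.cat([boxes, scores[:, None], cls[:, None].float()], dim=1)  # x1, y1, x2, y2, score, class
```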

<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/40661020/143873810-85ffa6e7-915b-46a4-9b8f-709e5d7700bb.png"/>
</div>

<!-- [PAPER_TITLE: Objects as Points] -->
<!-- [PAPER_URL: https://arxiv.org/abs/1904.07850] -->

## Citation

<!-- [ALGORITHM] -->

18 changes: 16 additions & 2 deletions configs/centripetalnet/README.md
@@ -1,6 +1,20 @@
# CentripetalNet
# CentripetalNet: Pursuing High-quality Keypoint Pairs for Object Detection

## Introduction
## Abstract

<!-- [ABSTRACT] -->

Keypoint-based detectors have achieved fairly good performance. However, incorrect keypoint matching is still widespread and greatly affects the performance of the detector. In this paper, we propose CentripetalNet, which uses centripetal shift to pair corner keypoints from the same instance. CentripetalNet predicts the position and the centripetal shift of the corner points and matches corners whose shifted results are aligned. Combining position information, our approach matches corner points more accurately than the conventional embedding approaches do. Corner pooling extracts information inside the bounding boxes onto the border. To make this information more accessible at the corners, we design a cross-star deformable convolution network to conduct feature adaption. Furthermore, we explore instance segmentation on anchor-free detectors by equipping our CentripetalNet with a mask prediction module. On MS-COCO test-dev, our CentripetalNet not only outperforms all existing anchor-free detectors with an AP of 48.0% but also achieves comparable performance to the state-of-the-art instance segmentation approaches with a 40.2% MaskAP.
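
A simplified sketch of the centripetal-shift matching rule: a top-left/bottom-right pair is kept only when the centers the two corners point to land close together relative to the box size (class matching, scoring, and the paper's central-region details are omitted):

```python
import torch

def match_corners(tl_xy, tl_shift, br_xy, br_shift, rel_thr=0.3):
    """tl_xy, br_xy: (N, 2) and (M, 2) corner positions; *_shift: their predicted centripetal shifts."""
    tl_center = tl_xy + tl_shift                                   # where each top-left corner points
    br_center = br_xy + br_shift                                   # where each bottom-right corner points
    wh = br_xy[None, :, :] - tl_xy[:, None, :]                     # (N, M, 2) candidate box sizes
    dist = (tl_center[:, None, :] - br_center[None, :, :]).abs()   # (N, M, 2) center disagreement
    valid = (wh > 0).all(dim=-1) & ((dist / wh.clamp(min=1e-6)) < rel_thr).all(dim=-1)
    return valid                                                   # (N, M) geometrically consistent pairs
```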

<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/40661020/143873955-42804e0e-3638-4c5b-8bf4-ac8133bbcdc8.png"/>
</div>

<!-- [PAPER_TITLE: CentripetalNet: Pursuing High-quality Keypoint Pairs for Object Detection] -->
<!-- [PAPER_URL: https://arxiv.org/abs/2003.09119] -->

## Citation

<!-- [ALGORITHM] -->

19 changes: 18 additions & 1 deletion configs/cityscapes/README.md
@@ -1,4 +1,21 @@
# Cityscapes Dataset
# The Cityscapes Dataset for Semantic Urban Scene Understanding

## Abstract

<!-- [ABSTRACT] -->

Visual understanding of complex urban street scenes is an enabling factor for a wide range of applications. Object detection has benefited enormously from large-scale datasets, especially in the context of deep learning. For semantic urban scene understanding, however, no current dataset adequately captures the complexity of real-world urban scenes.
To address this, we introduce Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling. Cityscapes is comprised of a large, diverse set of stereo video sequences recorded in streets from 50 different cities. 5000 of these images have high quality pixel-level annotations; 20000 additional images have coarse annotations to enable methods that leverage large volumes of weakly-labeled data. Crucially, our effort exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity. Our accompanying empirical study provides an in-depth analysis of the dataset characteristics, as well as a performance evaluation of several state-of-the-art approaches based on our benchmark.

<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/40661020/143874154-db4484a5-9211-41f6-852a-b7f0a8c9ec26.png"/>
</div>

<!-- [PAPER_TITLE: The Cityscapes Dataset for Semantic Urban Scene Understanding] -->
<!-- [PAPER_URL: https://arxiv.org/abs/1604.01685] -->

## Citation

<!-- [DATASET] -->

18 changes: 16 additions & 2 deletions configs/cornernet/README.md
@@ -1,6 +1,20 @@
# CornerNet
# Cornernet: Detecting objects as paired keypoints

## Introduction
## Abstract

<!-- [ABSTRACT] -->

We propose CornerNet, a new approach to object detection where we detect an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, using a single convolution neural network. By detecting objects as paired keypoints, we eliminate the need for designing a set of anchor boxes commonly used in prior single-stage detectors. In addition to our novel formulation, we introduce corner pooling, a new type of pooling layer that helps the network better localize corners. Experiments show that CornerNet achieves a 42.2% AP on MS COCO, outperforming all existing one-stage detectors.
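
A short sketch of top-left corner pooling, the pooling layer mentioned above: each location takes the maximum over everything to its right plus the maximum over everything below it, implemented here with a reversed cumulative max:

```python
import torch

def top_left_corner_pool(x):
    """x: (N, C, H, W). Returns the top-left corner-pooled feature map."""
    right_max = x.flip(-1).cummax(dim=-1).values.flip(-1)    # max over the row segment to the right
    bottom_max = x.flip(-2).cummax(dim=-2).values.flip(-2)   # max over the column segment below
    return right_max + bottom_max
```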

<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/40661020/143876061-4de20768-c812-4b97-b089-944d8db91ca2.png"/>
</div>

<!-- [PAPER_TITLE: Cornernet: Detecting objects as paired keypoints] -->
<!-- [PAPER_URL: https://arxiv.org/abs/1808.01244] -->

## Citation

<!-- [ALGORITHM] -->

16 changes: 15 additions & 1 deletion configs/dcn/README.md
@@ -1,6 +1,20 @@
# Deformable Convolutional Networks

## Introduction
## Abstract

<!-- [ABSTRACT] -->

Convolutional neural networks (CNNs) are inherently limited to model geometric transformations due to the fixed geometric structures in their building modules. In this work, we introduce two new modules to enhance the transformation modeling capacity of CNNs, namely, deformable convolution and deformable RoI pooling. Both are based on the idea of augmenting the spatial sampling locations in the modules with additional offsets and learning the offsets from target tasks, without additional supervision. The new modules can readily replace their plain counterparts in existing CNNs and can be easily trained end-to-end by standard back-propagation, giving rise to deformable convolutional networks. Extensive experiments validate the effectiveness of our approach on sophisticated vision tasks of object detection and semantic segmentation.
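
A minimal sketch of a deformable convolution layer built on `torchvision.ops.deform_conv2d`: a small regular convolution predicts per-location sampling offsets that deform the sampling grid of the main convolution (channel sizes and initialization are illustrative):

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class DeformableConvBlock(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.k = k
        # 2 * k * k offset channels: one (dy, dx) pair per kernel sampling point
        self.offset_conv = nn.Conv2d(in_ch, 2 * k * k, k, padding=k // 2)
        nn.init.zeros_(self.offset_conv.weight)
        nn.init.zeros_(self.offset_conv.bias)   # zero offsets = start from the regular sampling grid
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.01)

    def forward(self, x):
        offsets = self.offset_conv(x)           # learned from the input, without extra supervision
        return deform_conv2d(x, offsets, self.weight, padding=self.k // 2)

out = DeformableConvBlock(16, 32)(torch.randn(1, 16, 32, 32))   # -> (1, 32, 32, 32)
```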

<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/40661020/143876246-c4985e25-e286-4511-9b7c-97af2857461e.png"/>
</div>

<!-- [PAPER_TITLE: Deformable Convolutional Networks] -->
<!-- [PAPER_URL: https://arxiv.org/abs/1703.06211] -->

## Citation

<!-- [ALGORITHM] -->

20 changes: 19 additions & 1 deletion configs/deepfashion/README.md
@@ -1,4 +1,20 @@
# DeepFashion
# DeepFashion: Powering Robust Clothes Recognition and Retrieval With Rich Annotations

## Abstract

<!-- [ABSTRACT] -->

Recent advances in clothes recognition have been driven by the construction of clothes datasets. Existing datasets are limited in the amount of annotations and have difficulty coping with the various challenges in real-world applications. In this work, we introduce DeepFashion, a large-scale clothes dataset with comprehensive annotations. It contains over 800,000 images, which are richly annotated with massive attributes, clothing landmarks, and correspondence of images taken under different scenarios including store, street snapshot, and consumer. Such rich annotations enable the development of powerful algorithms in clothes recognition and facilitate future research. To demonstrate the advantages of DeepFashion, we propose a new deep model, namely FashionNet, which learns clothing features by jointly predicting clothing attributes and landmarks. The estimated landmarks are then employed to pool or gate the learned features. It is optimized in an iterative manner. Extensive experiments demonstrate the effectiveness of FashionNet and the usefulness of DeepFashion.

<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/40661020/143876310-08470a6a-ea3a-4ec1-a6f2-8ec5df36a8a0.png"/>
</div>

<!-- [PAPER_TITLE: DeepFashion: Powering Robust Clothes Recognition and Retrieval With Rich Annotations] -->
<!-- [PAPER_URL: https://openaccess.thecvf.com/content_cvpr_2016/html/Liu_DeepFashion_Powering_Robust_CVPR_2016_paper.html] -->

## Introduction

<!-- [DATASET] -->

@@ -39,6 +55,8 @@ mmdetection
After that you can train the Mask RCNN r50 on the DeepFashion-In-shop dataset by launching training with the `mask_rcnn_r50_fpn_1x.py` config
or by creating your own config file.
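
For example, a hypothetical launch sketch (the config path under `configs/deepfashion/` is an assumption; it simply wraps `python tools/train.py <config>` run from the repository root):

```python
import subprocess

# assumed location of the config mentioned above; adjust to your checkout
config = "configs/deepfashion/mask_rcnn_r50_fpn_1x.py"
subprocess.run(["python", "tools/train.py", config], check=True)
```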

## Citation

```
@inproceedings{liuLQWTcvpr16DeepFashion,
author = {Liu, Ziwei and Luo, Ping and Qiu, Shi and Wang, Xiaogang and Tang, Xiaoou},