[TOC]
- Paper: The Pascal Visual Object Classes Challenge: A Retrospective
- Main page : http://host.robots.ox.ac.uk/pascal/VOC/voc2012/
It contains 20 foreground object classes and one background class.
The segmentation task provides 1,464 (train), 1,449 (val), and 1,456 (test) pixel-level annotated images.
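A minimal sketch of loading this split with torchvision (the root path and download flag below are assumptions; adjust to your setup):

```python
# Minimal sketch: load the PASCAL VOC 2012 segmentation split with torchvision.
# Set download=True on the first run; root="./data" is an assumption.
from torchvision import datasets

voc_train = datasets.VOCSegmentation(
    root="./data", year="2012", image_set="train", download=False)
img, mask = voc_train[0]   # PIL images: RGB input and class-index mask
print(len(voc_train))      # 1464 training images
```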
- Paper: Semantic Contours from Inverse Detectors
- Main page: Semantic Boundaries Dataset and Benchmark
The 11318 images in the SBD are divided into 8498 training images and 2820 test images.
- Download Link: https://www.dropbox.com/s/oeu149j8qtbs1x0/SegmentationClassAug.zip?dl=0
- This download link is provided by: https://github.com/DrSleep/tensorflow-deeplab-resnet
- How to use the 10,582 train-aug images with the DeepLabV3 code?
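As a rough sketch of how the 10,582 train-aug list is commonly assembled, take the union of the VOC 2012 train IDs and the SBD IDs, then remove every ID that appears in the VOC 2012 val list. The file paths below are assumptions about a typical local layout:

```python
# Sketch: build the 10,582-image train-aug ID list.
# All file paths are assumptions; point them at your local VOC/SBD copies.
def read_ids(path):
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

voc_train = read_ids("VOC2012/ImageSets/Segmentation/train.txt")  # 1,464 IDs
voc_val = read_ids("VOC2012/ImageSets/Segmentation/val.txt")      # 1,449 IDs
sbd = read_ids("benchmark_RELEASE/dataset/train.txt") \
      | read_ids("benchmark_RELEASE/dataset/val.txt")

train_aug = sorted((voc_train | sbd) - voc_val)  # should yield 10,582 IDs
with open("train_aug.txt", "w") as f:
    f.write("\n".join(train_aug))
```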
- Paper (CVPR 2018): COCO-Stuff: Thing and Stuff Classes in Context
- Main page: https://github.com/nightrome/cocostuff
COCO-Stuff augments all 164K images of the popular COCO dataset with pixel-level stuff annotations.
- COCO-Stuff dataset: the final version of COCO-Stuff, presented on the main page above. It includes all 164K images from COCO 2017 (train 118K, val 5K, test-dev 20K, test-challenge 20K). It covers 172 classes: 80 thing classes, 91 stuff classes, and 1 class 'unlabeled'. This dataset forms the basis of all upcoming challenges.
- COCO 2017 Stuff Segmentation Challenge: a semantic segmentation challenge on 55K images (train 40K, val 5K, test-dev 5K, test-challenge 5K) of COCO. To focus on stuff, all 80 thing classes were merged into a single class 'other'. The results of the challenge were presented at the Joint COCO and Places Recognition Workshop at ICCV 2017.
- COCO-Stuff 10K dataset: the first COCO-Stuff release, annotated by 10 in-house annotators at the University of Edinburgh. It includes 10K images from the training set of COCO, with a 9K/1K (train/val) split to make results comparable. The dataset includes 80 thing classes, 91 stuff classes, and 1 class 'unlabeled'. It was initially presented with 91 thing classes, later reduced to 80, as 11 classes do not have any segmentation annotations in COCO. This dataset is a subset of all other releases.
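A minimal sketch of reading COCO-Stuff annotations with the standard pycocotools API (the annotation file path below is an assumption based on the COCO-Stuff release):

```python
# Sketch: rasterize COCO-Stuff annotations for one image with pycocotools.
from pycocotools.coco import COCO

coco = COCO("annotations/stuff_train2017.json")  # path is an assumption
img_id = coco.getImgIds()[0]
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))
for ann in anns:
    mask = coco.annToMask(ann)                   # HxW binary mask
    print(ann["category_id"], int(mask.sum()), "pixels")
```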
- Paper: The Role of Context for Object Detection and Semantic Segmentation in the Wild
- Main page: https://cs.stanford.edu/~roozbeh/pascal-context/
The PASCAL-Context dataset augments the PASCAL VOC 2010 dataset with annotations for 400+ additional categories. It contains semantic segmentation annotations for 10,103 images in the training and validation subsets of PASCAL VOC 2010.
The previous annotations covered around 29% of the pixels in the dataset, while PASCAL-Context covers 100% of the pixels. The dataset contains annotations for things (e.g., keyboard, fork), stuff (e.g., sky, water), and hybrids (e.g., road) that have clear boundaries but whose shape is more complex than that of things.
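The PASCAL-Context labels are distributed as MATLAB .mat files; below is a minimal sketch of reading one with scipy. The file name and the 'LabelMap' field are assumptions about the release format, so verify them against your copy:

```python
# Sketch: read one PASCAL-Context annotation (.mat) into a numpy label map.
import scipy.io

mat = scipy.io.loadmat("trainval/2008_000002.mat")  # hypothetical file name
label_map = mat["LabelMap"]                          # HxW array of class IDs
print(label_map.shape, label_map.max())
```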
- Paper: The Cityscapes Dataset for Semantic Urban Scene Understanding
- Main page: https://www.cityscapes-dataset.com
Details:
- 30 classes
- 5000 annotated images with fine annotations
- 20000 annotated images with coarse annotations
Descriptions:
Cityscapes is a real-world, vehicle-egocentric image dataset collected in 50 cities in Germany and neighboring countries.
The 5,000 finely annotated images are split into three disjoint subsets: 2,975 training, 500 validation, and 1,525 test images, with accurate pixel-level annotations for every image; 19 of the 30 classes are used for evaluation.
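A minimal sketch of loading the fine annotations with torchvision (the root path is an assumption):

```python
# Sketch: load Cityscapes fine annotations with torchvision.
# Assumes leftImg8bit/ and gtFine/ were unpacked under ./cityscapes.
from torchvision import datasets

city_train = datasets.Cityscapes(
    root="./cityscapes", split="train", mode="fine", target_type="semantic")
img, sem = city_train[0]   # PIL images: RGB frame and label-ID mask
print(len(city_train))     # 2975 training images
```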
The following description is from the paper Attention-based Multi-Context Guiding for Few-Shot Semantic Segmentation:
This dataset (PASCAL-5i) originates from PASCAL VOC12 (Everingham et al.) with extended annotations from SDS (Hariharan et al.). The set of 20 classes in PASCAL VOC12 is divided into four sub-datasets of five classes each. In each fold, three sub-datasets are used as the training label-set and the remaining one as the test label-set; training images are those containing the training classes, and test images those containing the held-out classes. The standard folds are sketched below.
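A small sketch of the standard PASCAL-5i class folds, assuming the usual alphabetical VOC class ordering (fold i holds out classes 5i+1 through 5i+5):

```python
# Sketch: the four PASCAL-5i folds; fold i is the held-out test label-set
# and the other 15 classes form the training label-set.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]

def pascal5i_split(fold):
    test_labels = VOC_CLASSES[5 * fold : 5 * (fold + 1)]
    train_labels = [c for c in VOC_CLASSES if c not in test_labels]
    return train_labels, test_labels

train_labels, test_labels = pascal5i_split(0)
print(test_labels)   # ['aeroplane', 'bicycle', 'bird', 'boat', 'bottle']
```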
- Paper: The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes
- Main Page: http://synthia-dataset.net/download-2/
SYNTHIA is a large dataset of photo-realistic frames rendered from a virtual city with precise pixel-level annotations.
The SYNTHIA-RAND (CVPR16) subset contains the original 13,407 images used for training and domain adaptation of the models presented in the CVPR'16 paper. These images are generated as random perturbations of the virtual world and therefore have no temporal consistency (this is not a video stream). They have annotations for 11 basic classes and no instance annotations. The classes are: void, sky, building, road, sidewalk, fence, vegetation, pole, car, sign, pedestrian, and cyclist.
- Paper: Playing for Data: Ground Truth from Computer Games
- Main Page: https://download.visinf.tu-darmstadt.de/data/from_games/index.html
- Code: https://bitbucket.org/visinf/projects-2016-playing-for-data
GTA-5 contains 24,966 high-quality labeled frames rendered from the realistic open-world computer game Grand Theft Auto V (GTA-5). Each frame is captured in the fictional city of Los Santos, modeled on Los Angeles in Southern California, with annotations that are compatible with the Cityscapes dataset.
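Since the label palette follows Cityscapes, here is a hedged sketch of mapping an RGB label image to Cityscapes train IDs; only a few of the 19 palette entries are listed, so extend the table before real use:

```python
# Sketch: map GTA-5 / Cityscapes color-coded labels to train IDs.
# Palette entries follow the standard Cityscapes colors; 255 = ignore.
import numpy as np
from PIL import Image

PALETTE = {          # (R, G, B) -> trainId; a few of the 19 classes shown
    (128, 64, 128): 0,    # road
    (244, 35, 232): 1,    # sidewalk
    (70, 70, 70): 2,      # building
    (107, 142, 35): 8,    # vegetation
    (70, 130, 180): 10,   # sky
    (0, 0, 142): 13,      # car
}

def color_to_trainid(path):
    rgb = np.array(Image.open(path).convert("RGB"))
    out = np.full(rgb.shape[:2], 255, dtype=np.uint8)   # 255 = ignore
    for color, tid in PALETTE.items():
        out[(rgb == color).all(axis=-1)] = tid
    return out
```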
- Paper: Scene Parsing through ADE20K Dataset
- Main Page: http://groups.csail.mit.edu/vision/datasets/ADE20K/
- Code: https://github.com/CSAILVision/semantic-segmentation-pytorch
ADE20K is a scene parsing benchmark containing dense labels for 150 stuff/object categories. The dataset includes 20K/2K/3K images for training, validation, and testing, respectively.
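For the SceneParsing release (ADEChallengeData2016), the labels are single-channel PNGs whose pixel values index the 150 categories, with 0 meaning 'unlabeled'. A small sketch, with the directory layout assumed from that release:

```python
# Sketch: count class frequencies in one ADE20K annotation PNG.
# Label values 1..150 index the categories; 0 means unlabeled.
import numpy as np
from PIL import Image

ann = np.array(Image.open(
    "ADEChallengeData2016/annotations/training/ADE_train_00000001.png"))
ids, counts = np.unique(ann, return_counts=True)
for i, c in zip(ids, counts):
    print(f"class {i}: {c} pixels")
```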
- Paper: Panoptic Segmentation
- Main Page: http://cocodataset.org/#panoptic-2018
This dataset contains 118K training images and 5K validation images, with annotations for 80 thing categories and 53 stuff categories.
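The panoptic PNGs encode each segment id across the RGB channels as id = R + 256*G + 256^2*B; a minimal decoding sketch (the file path is just an example):

```python
# Sketch: decode segment IDs from a COCO panoptic annotation PNG.
# Per the panoptic format, id = R + 256 * G + 256**2 * B.
import numpy as np
from PIL import Image

rgb = np.array(Image.open("panoptic_val2017/000000000139.png"), dtype=np.uint32)
seg_ids = rgb[..., 0] + 256 * rgb[..., 1] + 256 ** 2 * rgb[..., 2]
print(np.unique(seg_ids))   # one ID per segment; 0 = unlabeled
```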
Descriptions are gathered from many different papers and dataset pages; thanks to all the original authors!