[TOC]

Summary of the semantic segmentation datasets

1. PASCAL VOC 2012

Paper: The Pascal Visual Object Classes Challenge: A Retrospective
Main page: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/

It contains 20 foreground object classes and one background class.

It provides 1,464 (train), 1,449 (val), and 1,456 (test) images with pixel-level annotations.

2. Semantic Boundaries Dataset (SBD)

Paper: Semantic Contours from Inverse Detectors
Main page: Semantic Boundaries Dataset and Benchmark

The 11318 images in the SBD are divided into 8498 training images and 2820 test images.

3. PASCAL VOC 2012 augmented with SBD

Download link: https://www.dropbox.com/s/oeu149j8qtbs1x0/SegmentationClassAug.zip?dl=0
This download link is provided by https://github.com/DrSleep/tensorflow-deeplab-resnet
See also: "How to use 10,582 train-aug images on DeeplabV3 code?"
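The 10,582-image "train_aug" set is usually built as the union of the VOC 2012 `train` split and the SBD `train` and `val` splits, minus every image that also appears in VOC 2012 `val`, so that evaluation on VOC `val` stays clean. A minimal sketch of that set arithmetic, assuming each split is available as a plain list of image IDs (for example `2007_000032`); the function name `build_train_aug` and the toy IDs are hypothetical:

```python
def build_train_aug(voc_train, voc_val, sbd_train, sbd_val):
    """Return sorted image IDs for a VOC 2012 + SBD "train_aug" split.

    Every image from any training source is kept, except images in the
    VOC 2012 val split, which is reserved for evaluation.
    """
    pool = set(voc_train) | set(sbd_train) | set(sbd_val)
    return sorted(pool - set(voc_val))


if __name__ == "__main__":
    # Hypothetical toy IDs; real lists come from the dataset split files.
    voc_train = ["2007_000032", "2007_000039"]
    voc_val = ["2007_000033"]
    sbd_train = ["2007_000032", "2008_000002"]
    sbd_val = ["2007_000033", "2008_000003"]
    print(build_train_aug(voc_train, voc_val, sbd_train, sbd_val))
    # ['2007_000032', '2007_000039', '2008_000002', '2008_000003']
```

Using sets removes the images that VOC and SBD share, which is why the result is smaller than the sum of the individual split sizes.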

4. The COCO-Stuff dataset

Paper (CVPR 2018): COCO-Stuff: Thing and Stuff Classes in Context
Main page: https://github.com/nightrome/cocostuff

COCO-Stuff augments all 164K images of the popular COCO dataset with pixel-level stuff annotations.

COCO-Stuff dataset: The final version of COCO-Stuff, presented on the main page above. It includes all 164K images from COCO 2017 (train 118K, val 5K, test-dev 20K, test-challenge 20K). It covers 172 classes: 80 thing classes, 91 stuff classes and 1 class 'unlabeled'. The authors state that this dataset will form the basis of all upcoming challenges.
COCO 2017 Stuff Segmentation Challenge: A semantic segmentation challenge on 55K COCO images (train 40K, val 5K, test-dev 5K, test-challenge 5K). To focus on stuff, the organizers merged all 80 thing classes into a single class 'other'. The results of the challenge were presented at the Joint COCO and Places Recognition Workshop at ICCV 2017.
COCO-Stuff 10K dataset: The authors' first dataset, annotated by 10 in-house annotators at the University of Edinburgh. It includes 10K images from the COCO training set, with a 9K/1K (train/val) split provided to make results comparable. The dataset includes 80 thing classes, 91 stuff classes and 1 class 'unlabeled'. It was initially presented with 91 thing classes, but this was changed to 80 because 11 of those classes have no segmentation annotations in COCO. This dataset is a subset of all other releases.
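The stuff-challenge relabeling described above (all 80 thing classes collapsed into one 'other' class) is a per-pixel lookup on the label map. A minimal NumPy sketch, assuming the label map stores one integer class ID per pixel; the function name `merge_things`, the thing IDs and the `other_id` value used below are hypothetical placeholders, so take the real ID ranges from the COCO-Stuff label list:

```python
import numpy as np


def merge_things(label_map, thing_ids, other_id):
    """Map every pixel whose class is in `thing_ids` to the single `other_id`.

    Stuff and 'unlabeled' pixels are left unchanged.
    """
    label_map = np.asarray(label_map)
    merged = label_map.copy()
    merged[np.isin(label_map, list(thing_ids))] = other_id
    return merged


# Hypothetical IDs: things {1, 2, 3}, stuff 92, unlabeled 0, 'other' 183.
print(merge_things([[1, 92], [3, 0]], {1, 2, 3}, other_id=183))
```

Working on a copy keeps the original annotation intact, which is convenient when the same images are also used with the full 172-class label set.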

5. PASCAL-Context Dataset

Paper: "The Role of Context for Object Detection and Semantic Segmentation in the Wild"
Main page: https://cs.stanford.edu/~roozbeh/pascal-context/

The PASCAL-Context dataset augments the PASCAL VOC 2010 dataset with annotations for 400+ additional categories. It contains semantic segmentation annotations for 10,103 images in the training and validation subsets of PASCAL VOC 2010.

The original PASCAL VOC annotations covered around 29% of the pixels in the dataset, whereas PASCAL-Context labels 100% of them. The dataset contains annotations for things (e.g., keyboard, fork), stuff (e.g., sky, water) and hybrids (e.g., road), which have clear boundaries like things but a more complex shape.
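Because many of the 400+ categories are rare, results on PASCAL-Context are commonly reported on a reduced label set (often the 59 most frequent categories plus background). A minimal NumPy sketch of that kind of reduction, assuming per-class pixel counts are already available and that 0 is background; the function names are hypothetical, and in practice the fixed 59-class list used in the literature should be loaded rather than recomputed:

```python
import numpy as np


def top_k_classes(pixel_counts, k, background_id=0):
    """Return the IDs of the k classes with the most labeled pixels, background excluded."""
    counts = {c: n for c, n in pixel_counts.items() if c != background_id}
    return sorted(counts, key=counts.get, reverse=True)[:k]


def remap_to_subset(label_map, kept_ids, background_id=0):
    """Renumber kept classes to 1..K and send every other class to background."""
    label_map = np.asarray(label_map)
    lut = np.full(label_map.max() + 1, background_id, dtype=np.int64)
    for new_id, old_id in enumerate(kept_ids, start=1):
        if old_id < lut.size:
            lut[old_id] = new_id
    return lut[label_map]


kept = top_k_classes({0: 100, 1: 5, 2: 50, 3: 20}, k=2)  # [2, 3]
print(remap_to_subset([[0, 1], [2, 3]], kept))
```

A lookup table indexed by the old class ID remaps a whole image in one vectorized step instead of one comparison per class.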

6. Cityscapes Datasets

Paper: The Cityscapes Dataset for Semantic Urban Scene Understanding
Main page: https://www.cityscapes-dataset.com

Details:

30 classes
5000 annotated images with fine annotations
20000 annotated images with coarse annotations

Descriptions:

Cityscapes is a real-world, vehicle-egocentric image dataset recorded in 50 cities in Germany and neighboring countries.

The 5,000 finely annotated images are split into three disjoint subsets: 2,975 training, 500 validation, and 1,525 test images. Pixel-level annotations are public for the training and validation images, while the test labels are withheld for evaluation on the benchmark server. Of the 30 annotated classes, 19 are used for training and evaluation in the standard benchmark.
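Training code usually predicts only the 19 evaluation classes, so the per-pixel `labelIds` are remapped to contiguous `trainIds` and every other class is ignored. A minimal NumPy sketch, assuming the standard labelId to trainId table from the official cityscapesScripts package (`helpers/labels.py`); the table below is reproduced for illustration and should be checked against that file, and the function name `to_train_ids` is hypothetical:

```python
import numpy as np

IGNORE = 255  # trainId used for classes excluded from evaluation

# labelId -> trainId for the 19 evaluation classes (cityscapesScripts convention).
LABEL_TO_TRAIN = {
    7: 0,    # road
    8: 1,    # sidewalk
    11: 2,   # building
    12: 3,   # wall
    13: 4,   # fence
    17: 5,   # pole
    19: 6,   # traffic light
    20: 7,   # traffic sign
    21: 8,   # vegetation
    22: 9,   # terrain
    23: 10,  # sky
    24: 11,  # person
    25: 12,  # rider
    26: 13,  # car
    27: 14,  # truck
    28: 15,  # bus
    31: 16,  # train
    32: 17,  # motorcycle
    33: 18,  # bicycle
}


def to_train_ids(label_ids):
    """Convert a Cityscapes labelIds array to trainIds with a 256-entry lookup table."""
    lut = np.full(256, IGNORE, dtype=np.uint8)
    for label_id, train_id in LABEL_TO_TRAIN.items():
        lut[label_id] = train_id
    return lut[np.asarray(label_ids, dtype=np.uint8)]


print(to_train_ids([[7, 26], [0, 33]]))  # road, car / unlabeled, bicycle
```

Mapping excluded classes to 255 lets most training frameworks skip them through an "ignore index" in the loss.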

7. PASCAL-$5^i$ (for few-shot learning)

Paper: One-Shot Learning for Semantic Segmentation
Main Page: https://github.com/lzzcd001/OSLSM

The following description is taken from "Attention-based Multi-Context Guiding for Few-Shot Semantic Segmentation".

This dataset originates from PASCAL VOC12 (Everingham et al.) together with the extended annotations from SDS (Hariharan et al.). The 20 classes of PASCAL VOC12 are divided into four sub-datasets, as indicated in Table 2 of that paper. Three sub-datasets form the training label-set $L_{train}$, and the remaining one is used as the test label-set $L_{test}$.

The training set $D_{train}$ is composed of all image-mask pairs from the PASCAL VOC12 and SDS training sets that contain at least one pixel whose class is in the label-set $L_{train}$. The masks in $D_{train}$ are converted into binary masks by setting every pixel whose semantic class is not in $L_{train}$ to the background class $l_{\varnothing}$. The test set $D_{test}$ is built from the PASCAL VOC12 and SDS validation sets and processed in the same way as $D_{train}$, with $L_{test}$ in place of $L_{train}$.
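The split and mask rules above can be sketched as follows. This is a minimal NumPy illustration, assuming VOC class IDs 1 to 20 with 0 as background, that fold $i$ holds the five consecutive classes $5i+1$ to $5i+5$ (the usual PASCAL-$5^i$ assignment), and that the VOC void label 255 is left untouched (an assumption; the quoted description does not mention void pixels). The helper names are hypothetical:

```python
import numpy as np

NUM_CLASSES = 20   # PASCAL VOC foreground classes, IDs 1..20
NUM_FOLDS = 4      # four sub-datasets of 5 classes each
BACKGROUND = 0
IGNORE = 255       # assumption: VOC void/boundary pixels are kept as-is


def fold_label_sets(fold):
    """Return (L_train, L_test) for a fold, assuming fold i holds classes 5i+1..5i+5."""
    per_fold = NUM_CLASSES // NUM_FOLDS
    l_test = set(range(per_fold * fold + 1, per_fold * (fold + 1) + 1))
    l_train = set(range(1, NUM_CLASSES + 1)) - l_test
    return l_train, l_test


def has_label(mask, label_set):
    """True if at least one pixel of `mask` belongs to a class in `label_set`."""
    return bool(np.isin(mask, sorted(label_set)).any())


def binarize(mask, label_set):
    """1 for pixels whose class is in `label_set`, BACKGROUND otherwise; IGNORE kept."""
    mask = np.asarray(mask)
    out = np.where(np.isin(mask, sorted(label_set)), 1, BACKGROUND)
    out[mask == IGNORE] = IGNORE
    return out


l_train, l_test = fold_label_sets(0)
print(sorted(l_test))                                   # [1, 2, 3, 4, 5]
print(binarize([[0, 3, 7], [255, 12, 20]], l_train))
```

Assembling $D_{train}$ then amounts to keeping the pairs for which `has_label(mask, l_train)` is true and storing `binarize(mask, l_train)` as the target.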

8. SYNTHIA

Paper: The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes
Main Page: http://synthia-dataset.net/download-2/

SYNTHIA is a large dataset of photo-realistic frames rendered from a virtual city with precise pixel-level annotations.

The subset described here contains the original 13,407 images that the authors used to train and domain-adapt the models presented in their CVPR’16 paper. The images were generated by randomly perturbing the virtual world, so they have no temporal consistency (they are not a video stream). They are annotated with 11 basic classes plus a void class and have no instance annotations. The classes are: void, sky, building, road, sidewalk, fence, vegetation, pole, car, sign, pedestrian, cyclist.

9. GTA-5

Paper: Playing for Data: Ground Truth from Computer Games
Main Page: https://download.visinf.tu-darmstadt.de/data/from_games/index.html
Code: https://bitbucket.org/visinf/projects-2016-playing-for-data

GTA-5 contains 24,966 high-quality labeled frames from the photorealistic open-world computer game Grand Theft Auto V (GTA-5). The frames are rendered in the fictional city of Los Santos, which is based on Los Angeles in Southern California, and the annotations are compatible with the Cityscapes dataset.

10. ADE20K

Paper: Scene Parsing through ADE20K Dataset
Main Page: http://groups.csail.mit.edu/vision/datasets/ADE20K/
Code: https://github.com/CSAILVision/semantic-segmentation-pytorch

ADE20K is a scene parsing benchmark with dense labels for 150 stuff/object categories. It includes 20K/2K/3K images for training, validation, and testing.
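In the scene parsing release of ADE20K, the annotation PNGs use 0 for "other" (unlabeled) pixels and 1 to 150 for the benchmark classes; verify this against the release you download. A common preprocessing step, exposed for example as the `reduce_zero_label` option in MMSegmentation, shifts the classes to 0 to 149 and turns 0 into an ignore index. A minimal NumPy sketch of that step; the function name and the ignore value 255 are assumptions:

```python
import numpy as np

IGNORE = 255  # assumed ignore index for the loss


def reduce_zero_label(label_map):
    """Shift ADE20K labels 1..150 down to 0..149 and mark 0 ('other') as IGNORE."""
    label = np.asarray(label_map).astype(np.int64)
    out = label - 1
    out[label == 0] = IGNORE
    return out.astype(np.uint8)


print(reduce_zero_label([[0, 1], [150, 75]]))
```

Casting to a signed type before subtracting avoids the uint8 underflow that would otherwise turn 0 into 255 by accident rather than by design.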

11. COCO panoptic segmentation dataset

Main Page: http://cocodataset.org/#panoptic-2018
Paper: Panoptic Segmentation

This dataset contains 118K training images and 5K validation images, annotated with 80 thing categories and 53 stuff categories (133 in total).
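COCO panoptic annotations pair one PNG per image with a JSON file whose `segments_info` entries give each segment's `category_id`. The PNG stores a segment ID in its RGB channels as $R + 256G + 256^2B$, and ID 0 marks unlabeled pixels. A minimal NumPy sketch of the decoding, mirroring the `rgb2id` helper in the cocodataset `panopticapi` repository; it assumes the PNG has already been loaded as an H x W x 3 uint8 array:

```python
import numpy as np


def rgb2id(color):
    """Decode a COCO panoptic PNG (H x W x 3, uint8) into per-pixel segment IDs.

    segment_id = R + 256 * G + 256**2 * B; 0 means unlabeled.
    """
    color = np.asarray(color).astype(np.uint32)
    return color[..., 0] + 256 * color[..., 1] + 256 * 256 * color[..., 2]


pixels = [[[1, 0, 0], [0, 1, 0]], [[0, 0, 1], [10, 2, 3]]]
print(rgb2id(pixels))
```

The decoded IDs are then matched against the `id` field of `segments_info` to recover the thing or stuff category of each segment.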

ATTENTION

The descriptions are gathered from many different papers; thanks to all their authors!
