Add ADE20K dataset #429
@@ -0,0 +1,103 @@
import glob
from multiprocessing import Pool
import os

import numpy as np

from chainer import dataset
from chainer.dataset import download
from chainercv import utils
from chainercv.utils import read_image

root = 'pfnet/chainercv/ade20k'
trainval_url = 'http://data.csail.mit.edu/places/ADEchallenge/'
trainval_url += 'ADEChallengeData2016.zip'
test_url = 'http://data.csail.mit.edu/places/ADEchallenge/release_test.zip'


def get_ade20k():
    p = Pool(2)
    data_root = download.get_dataset_directory(root)
    urls = [trainval_url, test_url]
    ret = [p.apply_async(utils.cached_download, args=(url,)) for url in urls]
Review comment: Right. So, actually, the two downloads try to print their progress reports on the same line, which garbles the progress display. Should I do those downloads sequentially?
Review comment: I prefer sequential downloading. Is the downloading too slow?
    caches = [r.get() for r in ret]
    args = [(cache_fn, data_root, os.path.splitext(url)[1])
            for cache_fn, url in zip(caches, urls)]
    ret = [p.apply_async(utils.extractall, args=arg) for arg in args]
    for r in ret:
        r.get()
    return data_root
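The review thread on this function settles on downloading the archives one at a time, so that only one progress report is printed at once. A minimal sketch of a sequential variant, assuming a hypothetical helper name `get_dataset_sequentially`: the download and extraction functions are passed in as parameters (in the PR they would be `utils.cached_download(url)` and `utils.extractall(path, data_root, ext)`), which keeps the sketch runnable without Chainer or network access.

```python
import os


def get_dataset_sequentially(urls, data_root, download_fn, extract_fn):
    """Download and extract archives one after another (hypothetical helper).

    download_fn(url) is expected to return the path of the cached archive.
    extract_fn(path, destination, ext) is expected to extract that archive.
    Because only one download runs at a time, progress reports never
    share a line.
    """
    for url in urls:
        cache_fn = download_fn(url)
        # The archive type is taken from the URL suffix, e.g. '.zip'.
        ext = os.path.splitext(url)[1]
        extract_fn(cache_fn, data_root, ext)
    return data_root
```

With ChainerCV installed, the call would presumably look like `get_dataset_sequentially([trainval_url, test_url], download.get_dataset_directory(root), utils.cached_download, utils.extractall)`, which also avoids leaving a `Pool` open.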


class ADE20KSemanticSegmentationDataset(dataset.DatasetMixin):

    """Semantic segmentation dataset for `ADE20K`_.

    This is the ADE20K dataset distributed on the MIT Scene Parsing Benchmark
    website. It has 20,210 training images, 2,000 validation images, and
    3,352 test images.

    .. _`ADE20K`: http://sceneparsing.csail.mit.edu/

    Args:
        data_dir (string): Path to the dataset directory. The directory should
            contain the extracted :obj:`ADEChallengeData2016` directory, which
            holds :obj:`annotations` and :obj:`images`, and, for the test
            split, :obj:`release_test`. If :obj:`auto` is given, the dataset
            is automatically downloaded into
            :obj:`$CHAINER_DATASET_ROOT/pfnet/chainercv/ade20k`.
        split ({'train', 'val', 'test'}): Select from dataset splits used in
            Cityscapes dataset.
Review comment: Oops, I forgot to fix the name... Thanks for pointing it out!

    """

    def __init__(self, data_dir='auto', split='train'):
        if data_dir == 'auto':
            data_dir = get_ade20k()

        if split == 'train' or split == 'val':
            img_dir = os.path.join(
                data_dir, 'ADEChallengeData2016', 'images',
                'training' if split == 'train' else 'validation')
            label_dir = os.path.join(
                data_dir, 'ADEChallengeData2016', 'annotations',
                'training' if split == 'train' else 'validation')
        elif split == 'test':
            img_dir = os.path.join(data_dir, 'release_test', 'testing')
        else:
            raise ValueError(
                'Please give \'split\' argument with either \'train\', '
                '\'val\', or \'test\'.')

        self.img_paths = sorted(glob.glob(os.path.join(img_dir, '*.jpg')))
        if split == 'train' or split == 'val':
            self.label_paths = sorted(
                glob.glob(os.path.join(label_dir, '*.png')))

        self.split = split

    def __len__(self):
        return len(self.img_paths)

    def get_example(self, i):
        """Returns the i-th example.

        Args:
            i (int): The index of the example.

        Returns:
            When :obj:`split` is either :obj:`train` or :obj:`val`, it returns
            a tuple consisting of a color image and a label whose shapes are
            (3, H, W) and (H, W), respectively. When :obj:`split` is
            :obj:`test`, it returns only the color image. H and W are the
            height and width of the image. The dtype of the color image is
            :obj:`numpy.float32` and the dtype of the label image is
            :obj:`numpy.int32`.

Review comment: If …
Review comment: So, what should I do? Are you suggesting that I should remove the test split from this class, or that I should treat this class as a dataset that is not a SemanticSegmentationDataset?
Review comment: Or making the test split a separate dataset that is just an ImageDataset could be a possible choice. What do you think?
Review comment: Of those two solutions, I prefer the latter one, separating into …
Review comment: OK, I'll try that option. Thanks for the advice!
Review comment: Sorry.
Review comment: updated

        """
        img = read_image(self.img_paths[i])
        if self.split == 'train' or self.split == 'val':
            label = read_image(
                self.label_paths[i], dtype=np.int32, color=False)[0]
            return img, label
        else:
            return img
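The review thread on `get_example` leans toward moving the test split into a separate, image-only dataset. A minimal sketch of that option follows. The class name `ADE20KTestImageDatasetSketch` is hypothetical; it only mimics the `DatasetMixin` interface (`__len__` and `get_example`) instead of inheriting from Chainer, and `read_fn` stands in for `chainercv.utils.read_image`. The directory layout is taken from the PR's `test` branch (`release_test/testing/*.jpg`).

```python
import glob
import os


class ADE20KTestImageDatasetSketch(object):
    """Image-only dataset for the ADE20K test split (hypothetical name).

    Mirrors the DatasetMixin interface without depending on Chainer.
    read_fn is a stand-in for chainercv.utils.read_image.
    """

    def __init__(self, data_dir, read_fn):
        # Same layout as the 'test' branch of the PR's __init__.
        img_dir = os.path.join(data_dir, 'release_test', 'testing')
        self.img_paths = sorted(glob.glob(os.path.join(img_dir, '*.jpg')))
        self._read = read_fn

    def __len__(self):
        return len(self.img_paths)

    def get_example(self, i):
        # The test split has no annotations, so only the image is returned.
        return self._read(self.img_paths[i])
```

With the test split moved out, `ADE20KSemanticSegmentationDataset` would only need to accept `'train'` and `'val'`, so every example it returns is an (image, label) pair.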
Review comment: According to our naming convention, ade20k_label_colors should be ade20k_semantic_segmentation_label_colors.
Review comment: https://github.com/chainer/chainercv/blob/master/chainercv/datasets/cityscapes/cityscapes_utils.py#L53
Do you think this is also a problem? label_names should be changed as well, because it can differ from the one used by instance segmentation.
Review comment: Yes, that is also a problem. Thank you for pointing it out.
Review comment: To summarize:
ade_label_colors --> ade_semantic_segmentation_label_colors
ade_label_names --> ade_semantic_segmentation_label_names
cityscapes_label_colors --> cityscapes_semantic_segmentation_label_colors
cityscapes_label_names --> cityscapes_semantic_segmentation_label_names
Objects for CamVid need not be changed because this dataset only contains semantic segmentation data.
Review comment: Thank you for summarizing. I agree with you.