Add MMDetection COCO format importer #1213

Merged
merged 7 commits into from
Dec 1, 2023
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## \[Unreleased\]
### New features
- Support MMDetection COCO format
(<https://github.com/openvinotoolkit/datumaro/pull/1213>)

### Enhancements
- Optimize Python import to make CLI entrypoint faster
(<https://github.com/openvinotoolkit/datumaro/pull/1182>)
5 changes: 5 additions & 0 deletions docs/source/docs/data-formats/formats/index.rst
@@ -32,6 +32,7 @@ Supported Data Formats
mapillary_vistas
market1501
mars
mmdet
mnist
mot
mots
@@ -141,6 +142,10 @@ Supported Data Formats
* `Format specification <https://zheng-lab.cecs.anu.edu.au/Project/project_mars.html>`_
* `Dataset example <https://github.com/openvinotoolkit/datumaro/tree/develop/tests/assets/mars_dataset>`_
* `Format documentation <mars.md>`_
* MMDet-COCO (``detection``, ``segmentation``)
* `Format specification <https://mmdetection.readthedocs.io/en/latest/user_guides/dataset_prepare.html>`_
* `Dataset example <https://github.com/openvinotoolkit/datumaro/tree/develop/tests/assets/coco_dataset/mmdet_coco>`_
* `Format documentation <mmdet.md>`_
* MNIST (``classification``)
* `Format specification <http://yann.lecun.com/exdb/mnist/>`_
* `Dataset example <https://github.com/openvinotoolkit/datumaro/tree/develop/tests/assets/mnist_dataset>`_
41 changes: 41 additions & 0 deletions docs/source/docs/data-formats/formats/mmdet.md
@@ -0,0 +1,41 @@
# MMDetection COCO

## Format specification

[MMDetection](https://mmdetection.readthedocs.io/en/latest/) is a training framework for object detection and instance segmentation. It provides a modular architecture that supports a range of models, datasets, and training techniques, and it is widely used in the research community for developing and benchmarking object detection algorithms.
MMDetection describes its COCO format [here](https://mmdetection.readthedocs.io/en/latest/user_guides/dataset_prepare.html).

Most tasks and annotation fields match the [original COCO format](./formats/coco); the main difference is that images are stored in a separate directory for each subset.
This document describes only the directory structure of the MMDetection COCO format, following the [MMDetection dataset guide](https://mmdetection.readthedocs.io/en/latest/user_guides/dataset_prepare.html).
An MMDetection COCO dataset directory should have the following structure:

<!--lint disable fenced-code-flag-->
```
└─ Dataset/
    ├── <subset_name>/
    │   ├── <image_name1.ext>
    │   ├── <image_name2.ext>
    │   └── ...
    ├── <subset_name>/
    │   ├── <image_name1.ext>
    │   ├── <image_name2.ext>
    │   └── ...
    └── annotations/
        ├── instances_<subset_name>.json
        └── ...
```

### Import using CLI

``` bash
datum project create
datum project import --format mmdet_coco <path/to/dataset>
```

### Import using Python API

```python
import datumaro as dm

dataset = dm.Dataset.import_from('<path/to/dataset>', 'mmdet_coco')
```
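
Before importing, it can help to confirm that a directory matches the layout above. The following is a minimal sketch using only the Python standard library. `find_layout_problems` is a hypothetical helper written for this illustration, not part of Datumaro, and it assumes that every `annotations/instances_<subset_name>.json` file should have a matching `<subset_name>/` image directory under the dataset root.

```python
import os.path as osp
from glob import glob


def find_layout_problems(root: str) -> list:
    """Return human-readable problems found in an MMDetection COCO style directory.

    Hypothetical helper for illustration only; it is not part of Datumaro.
    """
    problems = []
    ann_paths = sorted(glob(osp.join(root, "annotations", "instances_*.json")))
    if not ann_paths:
        problems.append("no annotations/instances_*.json files found")

    for ann_path in ann_paths:
        # "instances_train.json" -> "train"; the glob pattern guarantees the prefix.
        subset = osp.splitext(osp.basename(ann_path))[0][len("instances_"):]
        if not osp.isdir(osp.join(root, subset)):
            problems.append(f"missing image directory for subset '{subset}'")
    return problems
```

An empty list means every annotation file has its image directory; otherwise each entry names what is missing.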
22 changes: 22 additions & 0 deletions src/datumaro/plugins/data_formats/coco/base.py
@@ -88,6 +88,25 @@
        return osp.join(rootpath, subset)


class MmdetDirPathExtracter(DirPathExtracter):
    @staticmethod
    def find_rootpath(path: str) -> str:
        """Find root path from annotation json file path."""
        path = osp.abspath(path)
        if osp.dirname(path).endswith(CocoPath.ANNOTATIONS_DIR):
            return path.rsplit(CocoPath.ANNOTATIONS_DIR, maxsplit=1)[0]
        raise DatasetImportError(
            f"Annotation path ({path}) should be under the directory which is named {CocoPath.ANNOTATIONS_DIR}. "
            "If not, Datumaro fails to find the root path for this dataset. "
            "Please follow this instruction, https://github.com/cocodataset/cocoapi/blob/master/README.txt"
        )

    @staticmethod
    def find_images_dir(rootpath: str, subset: str) -> str:
        """Find images directory from the root path."""
        return osp.join(rootpath, subset)
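
The path handling in `MmdetDirPathExtracter` can be tried in isolation. The sketch below restates the same string logic as free functions so it runs without Datumaro. It assumes `CocoPath.ANNOTATIONS_DIR` is `"annotations"` (the constant's value is not shown in this diff) and raises `ValueError` in place of `DatasetImportError`.

```python
import os.path as osp

# Assumption: stands in for CocoPath.ANNOTATIONS_DIR, whose value is not shown in the diff.
ANNOTATIONS_DIR = "annotations"


def find_rootpath(path: str) -> str:
    """Return the part of the path before the last occurrence of ANNOTATIONS_DIR."""
    path = osp.abspath(path)
    if osp.dirname(path).endswith(ANNOTATIONS_DIR):
        # rsplit works on the whole path string, not on path components.
        return path.rsplit(ANNOTATIONS_DIR, maxsplit=1)[0]
    raise ValueError(f"{path} is not under a directory named {ANNOTATIONS_DIR}")


def find_images_dir(rootpath: str, subset: str) -> str:
    """Images of a subset live directly under <root>/<subset>."""
    return osp.join(rootpath, subset)
```

Because `rsplit` operates on the full string, the returned root keeps its trailing separator, and an annotation file whose own name contains the directory name (for example `instances_annotations.json`) would move the split point into the file name. That edge case may be worth a test if such subset names are expected.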


class _CocoBase(SubsetBase):
    """
    Parses COCO annotations written in the following format:
@@ -121,6 +140,9 @@
        elif coco_importer_type == CocoImporterType.roboflow:
            self._rootpath = RoboflowDirPathExtracter.find_rootpath(path)
            self._images_dir = RoboflowDirPathExtracter.find_images_dir(self._rootpath, subset)
        elif coco_importer_type == CocoImporterType.mmdet:
            self._rootpath = MmdetDirPathExtracter.find_rootpath(path)
            self._images_dir = MmdetDirPathExtracter.find_images_dir(self._rootpath, subset)
        else:
            raise DatasetImportError(f"Not supported type: {coco_importer_type}")

1 change: 1 addition & 0 deletions src/datumaro/plugins/data_formats/coco/format.py
@@ -18,6 +18,7 @@ class CocoTask(Enum):
class CocoImporterType(Enum):
    default = auto()
    roboflow = auto()
    mmdet = auto()


class CocoPath:
79 changes: 79 additions & 0 deletions src/datumaro/plugins/data_formats/mmdet.py
@@ -0,0 +1,79 @@
# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT

import os.path as osp
from glob import glob
from typing import Optional

from datumaro.components.dataset_base import DEFAULT_SUBSET_NAME
from datumaro.components.format_detection import FormatDetectionConfidence, FormatDetectionContext
from datumaro.components.importer import ImportContext
from datumaro.plugins.data_formats.coco.base import _CocoBase
from datumaro.plugins.data_formats.coco.format import CocoImporterType, CocoTask
from datumaro.plugins.data_formats.coco.importer import CocoImporter


class MmdetCocoImporter(CocoImporter):
    @classmethod
    def detect(
        cls,
        context: FormatDetectionContext,
    ) -> FormatDetectionConfidence:
        ann_paths = context.require_files("annotations/instances_*.json")

        for ann_path in ann_paths:
            subset_name = cls._get_subset_name(ann_path)

            with context.require_any():
                with context.alternative():
                    image_files = osp.join(subset_name, "*.jpg")
                    context.require_file(f"{image_files}")

        return FormatDetectionConfidence.MEDIUM

    def __call__(self, path, stream: bool = False, **extra_params):
        subset_paths = glob(osp.join(path, "**", "instances_*.json"), recursive=True)

        sources = []
        for subset_path in subset_paths:
            options = dict(extra_params)
            options["subset"] = self._get_subset_name(subset_path)

            if stream:
                options["stream"] = True

            sources.append({"url": subset_path, "format": "mmdet_coco", "options": options})

        return sources

    @classmethod
    def _get_subset_name(cls, subset_path: str):
        parts = osp.splitext(osp.basename(subset_path))[0].split("instances_", maxsplit=1)
        subset_name = parts[1] if len(parts) == 2 else DEFAULT_SUBSET_NAME

        return subset_name
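
The discovery step in `__call__` and `_get_subset_name` can be exercised without Datumaro. The sketch below mirrors that logic as plain functions; `build_sources` and `get_subset_name` are illustrative names, and `DEFAULT_SUBSET_NAME = "default"` is an assumption about the value of Datumaro's constant. Results are sorted here only to make the output order predictable, which the importer in the diff does not do.

```python
import os.path as osp
from glob import glob

# Assumption: mirrors datumaro.components.dataset_base.DEFAULT_SUBSET_NAME.
DEFAULT_SUBSET_NAME = "default"


def get_subset_name(ann_path: str) -> str:
    """Map '.../instances_<subset>.json' to '<subset>', else the default subset."""
    stem = osp.splitext(osp.basename(ann_path))[0]
    parts = stem.split("instances_", maxsplit=1)
    return parts[1] if len(parts) == 2 else DEFAULT_SUBSET_NAME


def build_sources(root: str, stream: bool = False, **extra_params) -> list:
    """Collect one source description per annotation file found under root."""
    pattern = osp.join(root, "**", "instances_*.json")
    sources = []
    for ann_path in sorted(glob(pattern, recursive=True)):
        options = dict(extra_params)
        options["subset"] = get_subset_name(ann_path)
        if stream:
            options["stream"] = True
        sources.append({"url": ann_path, "format": "mmdet_coco", "options": options})
    return sources
```

Each returned dictionary names one annotation file, so a dataset with `instances_train.json` and `instances_val.json` yields two sources with subsets `train` and `val`.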


class MmdetCocoBase(_CocoBase):
    """
    Parses MMDetection COCO annotations written in the following format:
    https://cocodataset.org/#format-data
    """

    def __init__(
        self,
        path,
        *,
        subset: Optional[str] = None,
        stream: bool = False,
        ctx: Optional[ImportContext] = None,
    ):
        super().__init__(
            path,
            task=CocoTask.instances,
            coco_importer_type=CocoImporterType.mmdet,
            subset=subset,
            stream=stream,
            ctx=ctx,
        )
12 changes: 12 additions & 0 deletions src/datumaro/plugins/specs.json
@@ -710,6 +710,18 @@
"plugin_type": "Importer",
"extra_deps": []
},
{
"import_path": "datumaro.plugins.data_formats.mmdet.MmdetCocoBase",
"plugin_name": "mmdet_coco",
"plugin_type": "DatasetBase",
"extra_deps": []
},
{
"import_path": "datumaro.plugins.data_formats.mmdet.MmdetCocoImporter",
"plugin_name": "mmdet_coco",
"plugin_type": "Importer",
"extra_deps": []
},
{
"import_path": "datumaro.plugins.data_formats.mnist.MnistBase",
"plugin_name": "mnist",
@@ -0,0 +1,64 @@
{
"licenses":[
{
"name":"",
"id":0,
"url":""
}
],
"info":{
"contributor":"",
"date_created":"",
"description":"",
"url":"",
"version":"",
"year":""
},
"categories":[
{
"id":1,
"name":"a",
"supercategory":""
},
{
"id":2,
"name":"b",
"supercategory":""
},
{
"id":4,
"name":"c",
"supercategory":""
}
],
"images":[
{
"id":5,
"width":10,
"height":5,
"file_name":"a.jpg",
"license":0,
"flickr_url":"",
"coco_url":"",
"date_captured":0
}
],
"annotations":[
{
"id":1,
"image_id":5,
"category_id":2,
"segmentation":[

],
"area":3.0,
"bbox":[
2.0,
2.0,
3.0,
1.0
],
"iscrowd":0
}
]
}
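
For readers unfamiliar with the fixture above: COCO stores `bbox` as `[x, y, width, height]`, so the annotation's `area` of 3.0 equals its box width times height. A minimal sketch of that check follows; `xywh_to_xyxy` is a hypothetical helper name, and the values are copied from the annotation in this file.

```python
import json

# Values copied from the single annotation in the fixture above.
ANNOTATION = json.loads('{"bbox": [2.0, 2.0, 3.0, 1.0], "area": 3.0}')


def xywh_to_xyxy(bbox):
    """Convert a COCO [x, y, width, height] box to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def bbox_area(bbox):
    """Area of a COCO [x, y, width, height] box."""
    _, _, w, h = bbox
    return w * h
```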
101 changes: 101 additions & 0 deletions tests/assets/coco_dataset/mmdet_coco/annotations/instances_val.json
@@ -0,0 +1,101 @@
{
"licenses":[
{
"name":"",
"id":0,
"url":""
}
],
"info":{
"contributor":"",
"date_created":"",
"description":"",
"url":"",
"version":"",
"year":""
},
"categories":[
{
"id":1,
"name":"a",
"supercategory":""
},
{
"id":2,
"name":"b",
"supercategory":""
},
{
"id":4,
"name":"c",
"supercategory":""
}
],
"images":[
{
"id":40,
"width":5,
"height":10,
"file_name":"b.jpg",
"license":0,
"flickr_url":"",
"coco_url":"",
"date_captured":0
}
],
"annotations":[
{
"id":1,
"image_id":40,
"category_id":1,
"segmentation":[
[
0.0,
0.0,
1.0,
0.0,
1.0,
2.0,
0.0,
2.0
]
],
"area":2.0,
"bbox":[
0.0,
0.0,
1.0,
2.0
],
"iscrowd":0,
"attributes":{
"x":1,
"y":"hello"
}
},
{
"id":2,
"image_id":40,
"category_id":2,
"segmentation":{
"counts":[
0,
20,
30
],
"size":[
10,
5
]
},
"area":20.0,
"bbox":[
0.0,
0.0,
1.0,
9.0
],
"iscrowd":1
}
]
}
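
The second annotation above uses COCO's uncompressed run-length encoding: `counts` alternates runs of 0s and 1s, starting with 0s, over the mask flattened in column-major order, and `size` is `[height, width]`. The sketch below decodes it with plain Python lists to show why `counts` of `[0, 20, 30]` corresponds to an `area` of 20.0. `decode_uncompressed_rle` is an illustrative name, not a Datumaro or pycocotools function.

```python
def decode_uncompressed_rle(counts, size):
    """Decode COCO uncompressed RLE into a list of rows of 0/1 values.

    Illustrative helper only; real pipelines would normally use pycocotools.
    """
    height, width = size
    flat = []
    value = 0  # runs alternate, beginning with a run of zeros
    for run in counts:
        flat.extend([value] * run)
        value = 1 - value
    if len(flat) != height * width:
        raise ValueError("run lengths do not match the mask size")
    # The flat sequence is column-major: pixel (row, col) is at col * height + row.
    return [[flat[col * height + row] for col in range(width)] for row in range(height)]


mask = decode_uncompressed_rle([0, 20, 30], [10, 5])
foreground_pixels = sum(sum(row) for row in mask)
```

With the fixture's values, the 20 foreground pixels fill the first two columns of the 10 by 5 mask.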
Binary file added tests/assets/coco_dataset/mmdet_coco/train/a.jpg
Binary file added tests/assets/coco_dataset/mmdet_coco/val/b.jpg