Unity Dataset Insights is a Python package for downloading, parsing, and analyzing synthetic datasets generated using the Unity Perception package.
Dataset Insights is published to PyPI. You can install it by running pip install datasetinsights in a supported Python environment.
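After installing, it can be handy to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the standard library; the helper name `installed_version` is made up for illustration and is not part of datasetinsights.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report whether datasetinsights is available in the current environment.
version = installed_version("datasetinsights")
if version is None:
    print("datasetinsights is not installed; run: pip install datasetinsights")
else:
    print(f"datasetinsights {version} is installed")
```

Running this inside the same virtual environment or container you plan to work in avoids confusion when several Python installations coexist.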
We provide a sample notebook to help you load synthetic datasets generated using the Perception package and visualize dataset statistics. We plan to support other sample Unity projects in the future.
The Unity Perception package provides datasets under this schema. The datasetinsights package also provides convenient Python modules to parse datasets.
For example, you can load AnnotationDefinitions into a Python dictionary by providing the corresponding annotation definition ID:
from datasetinsights.datasets.unity_perception import AnnotationDefinitions
annotation_def = AnnotationDefinitions(data_root=dest, version="my_schema_version")
definition_dict = annotation_def.get_definition(def_id="my_definition_id")
Similarly, for MetricDefinitions:
from datasetinsights.datasets.unity_perception import MetricDefinitions
metric_def = MetricDefinitions(data_root=dest, version="my_schema_version")
definition_dict = metric_def.get_definition(def_id="my_definition_id")
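Conceptually, fetching a definition by ID amounts to searching a list of definition records for a matching identifier. The sketch below illustrates that idea on in-memory data with plain Python. The record fields (`id`, `name`, `description`), the sample values, and the helper `find_definition` are all assumptions made for illustration; this is not the datasetinsights implementation and does not describe the full schema.

```python
def find_definition(definitions, def_id):
    """Return the first record whose "id" equals def_id, or raise KeyError."""
    for record in definitions:
        if record.get("id") == def_id:
            return record
    raise KeyError(f"no definition with id {def_id!r}")


# Hypothetical records standing in for parsed definition data.
sample_definitions = [
    {"id": "my_definition_id", "name": "bounding box", "description": "hypothetical"},
    {"id": "another_definition_id", "name": "keypoints", "description": "hypothetical"},
]

definition_dict = find_definition(sample_definitions, "my_definition_id")
print(definition_dict["name"])
```

Raising KeyError for an unknown ID, rather than returning None, surfaces a mistyped definition ID immediately instead of failing later when the dictionary is used.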
The Captures table provides the collection of simulation captures and annotations. You can load these records directly as a Pandas DataFrame:
from datasetinsights.datasets.unity_perception import Captures
captures = Captures(data_root=dest, version="my_schema_version")
captures_df = captures.filter(def_id="my_definition_id")
The Metrics table can store simulation metrics for a capture or annotation. You can also load these records as a Pandas DataFrame:
from datasetinsights.datasets.unity_perception import Metrics
metrics = Metrics(data_root=dest, version="my_schema_version")
metrics_df = metrics.filter_metrics(def_id="my_definition_id")
You can download the datasets using the download command:
datasetinsights download --source-uri=<xxx> --output=$HOME/data
The download command supports HTTP(S) and GCS sources.
Alternatively, you can download a dataset directly from the Python interface.
GCSDatasetDownloader can download a dataset from GCS locations.
from datasetinsights.io.downloader import GCSDatasetDownloader
source_uri = "gs://url/to/file.zip"  # or gs://url/to/folder
dest = "~/data"
downloader = GCSDatasetDownloader()
downloader.download(source_uri=source_uri, output=dest)
HTTPDatasetDownloader can download a dataset from any HTTP(S) URL.
from datasetinsights.io.downloader import HTTPDatasetDownloader
source_uri = "http://url.to.file.zip"
dest = "~/data"
downloader = HTTPDatasetDownloader()
downloader.download(source_uri=source_uri, output=dest)
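When source URIs come from configuration or user input, it may help to decide which of the two downloader classes applies before importing anything. The sketch below maps a URI scheme to the class names mentioned above using only the standard library. The helpers `downloader_name_for` and `resolve_output_dir` are hypothetical and not part of datasetinsights. Whether the downloaders expand `~` in the output path themselves is not stated here, so the sketch expands it explicitly.

```python
import os
from urllib.parse import urlparse

# Maps URI schemes to the downloader class names described above.
DOWNLOADER_BY_SCHEME = {
    "gs": "GCSDatasetDownloader",
    "http": "HTTPDatasetDownloader",
    "https": "HTTPDatasetDownloader",
}


def downloader_name_for(source_uri):
    """Return the downloader class name that matches the URI scheme."""
    scheme = urlparse(source_uri).scheme.lower()
    if scheme not in DOWNLOADER_BY_SCHEME:
        raise ValueError(f"unsupported source URI scheme: {scheme!r}")
    return DOWNLOADER_BY_SCHEME[scheme]


def resolve_output_dir(path):
    """Expand a leading "~" and return an absolute path."""
    return os.path.abspath(os.path.expanduser(path))


print(downloader_name_for("gs://url/to/file.zip"))
print(downloader_name_for("http://url.to.file.zip"))
print(resolve_output_dir("~/data"))
```

The returned name can then be used to pick the matching class imported from datasetinsights.io.downloader, and the resolved directory passed as the output argument.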
If you are interested in converting the synthetic dataset to COCO format for
annotations that COCO supports, you can run the convert command:
datasetinsights convert -i <input-directory> -o <output-directory> -f COCO-Instances
or
datasetinsights convert -i <input-directory> -o <output-directory> -f COCO-Keypoints
You will need to provide the 2D bounding box definition ID in the synthetic dataset. We currently only support 2D bounding box and human keypoint annotations for the COCO format.
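If you drive the conversion from a script, one option is to assemble the command line in Python and validate the format name before launching it. The sketch below only builds and prints the command, using the `-i`, `-o`, and `-f` flags and the two format names shown above. The helper `build_convert_command` and the directory names are hypothetical.

```python
import shlex

# The two output formats named in the convert examples above.
SUPPORTED_FORMATS = ("COCO-Instances", "COCO-Keypoints")


def build_convert_command(input_dir, output_dir, fmt):
    """Return the convert command as an argument list, rejecting unknown formats."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format {fmt!r}; expected one of {SUPPORTED_FORMATS}")
    return ["datasetinsights", "convert", "-i", input_dir, "-o", output_dir, "-f", fmt]


# Hypothetical input and output directories.
cmd = build_convert_command("data/synthetic", "data/coco", "COCO-Instances")
print(shlex.join(cmd))
```

To actually run the conversion, the list can be passed to `subprocess.run(cmd, check=True)`; keeping the arguments as a list avoids shell quoting problems with paths that contain spaces.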
You can use the pre-built Docker image unitytechnologies/datasetinsights to interact with datasets.
You can find the API documentation on readthedocs.
Please let us know if you encounter a bug by filing an issue. To learn more about making a contribution to Dataset Insights, please see our Contribution page.
Dataset Insights is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.
If you find this package useful, consider citing it using:
@misc{datasetinsights2020,
title={Unity {D}ataset {I}nsights Package},
author={{Unity Technologies}},
howpublished={\url{https://github.com/Unity-Technologies/datasetinsights}},
year={2020}
}