Skip to content

Download and visualize single or multiple classes from the huge Open Images v4 dataset

License

Notifications You must be signed in to change notification settings

mastercam123/OIDv4_ToolKit

 
 

Repository files navigation

Forked repository and added conversion python script

My added script is: convert_annotations.py

Use toolkit normally to gather images from open images dataset. After gathering images just run from root directory:

python convert_annotations.py

This will generate .txt annotation files in proper format for custom object detection with YOLOv3. The text files are generated in folder with images.

~ OIDv4 ToolKit ~

Do you want to build your personal object detector but you don't have enough images to train your model? Do you want to train your personal image classifier, but you are tired of the deadly slowness of ImageNet? Have you already discovered Open Images Dataset v4 that has 600 classes and more than 1,700,000 images with related bounding boxes ready to use? Do you want to exploit it for your projects but you don't want to download gigabytes and gigabytes of data!?

With this repository we can help you to get the best of this dataset with less effort as possible. In particular, with this practical ToolKit written in Python3 we give you, for both object detection and image classification tasks, the following options:

(2.0) Object Detection

  • download any of the 600 classes of the dataset individually, taking care of creating the related bounding boxes for each downloaded image
  • download multiple classes at the same time creating separated folder and bounding boxes for each of them
  • download multiple classes and creating a common folder for all of them with a unique annotation file of each image
  • download a single class or multiple classes with the desired attributes
  • use the practical visualizer to inspect the donwloaded classes

(3.0) Image Classification

  • download any of the 19,794 classes in a common labeled folder
  • exploit tens of possible commands to select only the desired images (ex. like only test images)

The code is quite documented and designed to be easy to extend and improve. Me and Angelo are pleased if our little bit of code can help you with your project and research. Enjoy ;)

Snippet of the OIDv4 available classes

Open Image Dataset v4

All the information related to this huge dataset can be found here. In these few lines are simply summarized some statistics and important tips.

Object Detection

TrainValidationTest#Classes
Images1,743,04241,620 125,436-
Boxes14,610,229204,621625,282600

Image Classification

TrainValidationTest#Classes
Images9,011,21941,620125,436-
Machine-Generated Labels78,977,695512,0931,545,8357,870
Human-Verified Labels27,894,289551,3901,667,39919,794

As it's possible to observe from the previous table we can have access to images from free different groups: train, validation and test. The ToolKit provides a way to select only a specific group where to search. Regarding object detection, it's important to underline that some annotations has been done as a group. It means that a single bounding box groups more than one istance. As mentioned by the creator of the dataset:

  • IsGroupOf: Indicates that the box spans a group of objects (e.g., a bed of flowers or a crowd of people). We asked annotators to use this tag for cases with more than 5 instances which are heavily occluding each other and are physically touching. That's again an option of the ToolKit that can be used to only grasp the desired images.

Finally, it's interesting to notice that not all annotations has been produced by humans, but the creator also exploited an enhanced version of the method shown here reported 1

1.0 Getting Started