COIN is currently the largest dataset for comprehensive instructional video analysis. It contains 11,827 videos of 180 different tasks (e.g., car polishing, making French fries) across 12 domains (e.g., vehicle, dish). All videos were collected from YouTube and annotated with an efficient toolbox.
Yansong Tang*, Dajun Ding†, Yongming Rao*, Yu Zheng*, Danyang Zhang*, Lili Zhao†, Jiwen Lu*, Jie Zhou*, Yongxiang Lian*, Yao Li†, Jiali Sun†, Chang Liu†, Dongge You†, Zirun Yang†, Jiaojiao Ge†, Jiayun Wang*
- *Tsinghua University
- †Meitu Inc.
Contact: coin.dataset@gmail.com
You may use the code and files for research purposes only, including sharing and modifying the material. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
COIN is organized in a hierarchical structure with three levels: `domain`, `task`, and `step`. The correspondence between the levels can be found in the taxonomy [link]. We provide the taxonomy file of COIN in csv format. Below, we show a small part of the taxonomy stored in `taxonomy.xlsx`:
domain_target_mapping | target_action_mapping |
---|---|
We store the URLs of the videos and their annotations in JSON format, which can be accessed via the link [COIN](Project link page). The JSON file is similar to that of ActivityNet. Below, we show an example entry from the key field "database":

    "LtRSn-ntcLY": {
        "duration": 131.0309,
        "class": "ReplaceCDDriveWithSSD",
        "video_url": "https://www.youtube.com/embed/LtRSn-ntcLY",
        "start": 56.640895694775196,
        "annotation": [
            {
                "id": "212",
                "segment": [
                    60.0,
                    69.0
                ],
                "label": "take out the laptop CD drive"
            },
            {
                "id": "216",
                "segment": [
                    71.0,
                    82.0
                ],
                "label": "insert the hard disk tray into the position of the CD drive"
            }
        ],
        "subset": "training",
        "end": 85.714362947023,
        "recipe_type": 131
    }

From the entry, we can easily retrieve the YouTube ID, duration, ROI, and procedure information of the video. The field "annotation" consists of a list of all annotated procedures within the video. The field "class" and the sub-field "id" correspond to "task" and "step" in the taxonomy, respectively.
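
As a minimal sketch of how these fields could be read in Python, the snippet below parses the example entry and collects the YouTube ID, task, ROI, and annotated steps. The helper name `summarize_entry` and the shape of its return value are illustrative assumptions, not part of the COIN release. The sample is embedded inline so the snippet runs without the annotation file.

```python
import json

# The example entry from above, wrapped in the top-level "database" key.
SAMPLE = json.loads("""
{
  "database": {
    "LtRSn-ntcLY": {
      "duration": 131.0309,
      "class": "ReplaceCDDriveWithSSD",
      "video_url": "https://www.youtube.com/embed/LtRSn-ntcLY",
      "start": 56.640895694775196,
      "annotation": [
        {"id": "212", "segment": [60.0, 69.0],
         "label": "take out the laptop CD drive"},
        {"id": "216", "segment": [71.0, 82.0],
         "label": "insert the hard disk tray into the position of the CD drive"}
      ],
      "subset": "training",
      "end": 85.714362947023,
      "recipe_type": 131
    }
  }
}
""")


def summarize_entry(video_id, entry):
    """Hypothetical helper: flatten one database entry into a plain dict."""
    steps = [
        # (step id, start second, end second, step name)
        (int(ann["id"]), float(ann["segment"][0]), float(ann["segment"][1]), ann["label"])
        for ann in entry["annotation"]
    ]
    return {
        "youtube_id": video_id,              # the key of the entry is the YouTube ID
        "task": entry["class"],              # "class" maps to "task" in the taxonomy
        "duration": entry["duration"],
        "roi": (entry["start"], entry["end"]),
        "steps": steps,
    }


summaries = [summarize_entry(vid, entry) for vid, entry in SAMPLE["database"].items()]
for s in summaries:
    print(s["youtube_id"], s["task"], s["roi"])
    for step_id, t0, t1, label in s["steps"]:
        print(f"  step {step_id}: {t0:.1f}s to {t1:.1f}s, {label}")
```

To run the same loop over the full dataset, `SAMPLE` could be replaced with the result of `json.load` on the downloaded annotation file, assuming it sits in the working directory.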
The annotation information is saved in `COIN.json`.
Field Name | Type | Example | Description |
---|---|---|---|
database | string | - | Key field of the annotation file. |
- | string | LtRSn-ntcLY | YouTube ID of the video. |
duration | float | 131.0309 | Duration of the video in seconds. |
class | string | ReplaceCDDriveWithSSD | Name of the task in the video. |
video_url | string | https://www.youtube.com/embed/LtRSn-ntcLY | URL of the video. |
start | float | 56.640895694775196 | Start time of the ROI of the video. |
end | float | 85.714362947023 | End time of the ROI of the video. |
subset | string | training or validation | Subset of the video. |
recipe_type | int | 131 | ID number of the task. |
annotation | string | - | Annotation information of the video. |
annotation:id | int | 212 | ID number of the procedure. |
annotation:label | string | take out the laptop CD drive | Name of the procedure. |
annotation:segment | list of float (len=2) | [60.0,69.0] | Start and end time of the procedure. |
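
The field descriptions above suggest a few basic consistency checks that can be run on an entry before using it. The following is a sketch only: the function name `validate_entry` and the specific rules (ROI and segments lying inside the video duration, step IDs parseable as integers) are assumptions drawn from the example entry, not checks shipped with COIN. Note that the example JSON stores the step `id` as a string ("212"), so the check accepts either a string or an int.

```python
# The example entry from the annotation file, written as a Python dict.
ENTRY = {
    "duration": 131.0309,
    "class": "ReplaceCDDriveWithSSD",
    "video_url": "https://www.youtube.com/embed/LtRSn-ntcLY",
    "start": 56.640895694775196,
    "annotation": [
        {"id": "212", "segment": [60.0, 69.0],
         "label": "take out the laptop CD drive"},
        {"id": "216", "segment": [71.0, 82.0],
         "label": "insert the hard disk tray into the position of the CD drive"},
    ],
    "subset": "training",
    "end": 85.714362947023,
    "recipe_type": 131,
}


def validate_entry(entry):
    """Hypothetical check: return a list of problems (empty if none were found)."""
    problems = []
    duration, start, end = entry["duration"], entry["start"], entry["end"]

    # Assumed rule: the ROI must be a non-empty interval inside the video.
    if not (0.0 <= start < end <= duration):
        problems.append(f"ROI [{start}, {end}] is not inside [0, {duration}]")

    for ann in entry["annotation"]:
        # Step ids appear as strings in the example, so only require int-parseable.
        try:
            int(ann["id"])
        except (TypeError, ValueError):
            problems.append(f"step id {ann['id']!r} is not an integer")

        # Assumed rule: each segment is [start, end] with start < end, inside the video.
        seg = ann["segment"]
        if len(seg) != 2 or not (0.0 <= seg[0] < seg[1] <= duration):
            problems.append(f"bad segment {seg} for step {ann['id']!r}")
    return problems


# A deliberately broken copy: the segment ends before it starts.
broken = dict(ENTRY, annotation=[
    {"id": "212", "segment": [90.0, 80.0], "label": "take out the laptop CD drive"},
])

print(validate_entry(ENTRY))
print(validate_entry(broken))
```

Returning a list of messages rather than raising on the first problem makes it easier to scan the whole file once and report every suspicious entry together.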