Skip to content

Latest commit

 

History

History
31 lines (18 loc) · 2.48 KB

ProjectHome.md

File metadata and controls

31 lines (18 loc) · 2.48 KB

The Sports-1M Dataset

http://cs.stanford.edu/people/karpathy/deepvideo/sz70sx.jpg

Academic dataset that accompanies the paper "Large-scale Video Classification with Convolutional Neural Networks." (Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, Li Fei-Fei).

This dataset contains 1,133,158 video URLs which have been annotated automatically with 487 labels. The annotation was done via the YouTube Topics API. If you wish to use the YouTube API yourself, we have provided the topic ID for each of the classes here. Example thumbnails for each class can be found here.

Getting the Data

If you'd like to download the necessary files for this benchmark, run the following command:

git clone https://github.com/gtoderici/sports-1m-dataset.git

Once you have downloaded the data, please read the README file in order to see how to use this data.

Text File Format

The 487 labels file is encoded in UTF-8. Therefore, in order to be able to read all the label names correctly you need to make sure that you are using an UTF-8 compatible text editor.

Freebase Machine IDs/YouTube Topic ID#

The sports_mid.txt file lists the machine ID followed by a comma and the class ID (matching the 487 labels).

Notes

There is a discussion group about this dataset. Please post specific questions there, and group members may be able to help.

Related Datasets

Google Research has announced the availability of the YouTube-8M dataset. Sports-1M is included within this new dataset, with frame-level features already extracted. If computational complexity is a concern (i.e., it is not feasible to extract features from Sports-1M), we highly recommend this new dataset, since it already provides the features.