TikTalk is a multi-modal Chinese dialogue dataset introduced in TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World. Our dataset homepage is here. Our paper has been accepted by ACM Multimedia 2023 (Oral).
Examples of dialogues from TikTalk dataset.
The comparison of main multi-modal dialogue datasets and their characteristics are present in the table below.
We provide a way to download our data. You can use src/spider.py and video ids to get the url for each video and use src/download_videos.py to download the videos.
You need to ensure that the data are used only for research purposes and are not redistributed to any third party. If the data are reproduced in electronic or print media, they may only be used in scientific journals with a copyright notice. Please fill in the application form and send it to aimindtiktalk@163.com and we will provide the download links the video ids and corresponding dialogue data.
The license of the collected dataset is here.