Chinese-English Conversation Dataset based on subtitles of TV series

zhangxt/CEDAC

What is CEDAC?

CEDAC (Chinese English DAily Conversation) is a subtitle corpus labeled with speaker IDs and scene IDs, built by the Center for Speech and Language Technologies of Tsinghua University. At present, we have open-sourced part of the data and will continue to release more. The paper describing the dataset in detail was published at CCL-2019; if the dataset is helpful to your research, please cite it. You are welcome to star the repository. Thanks!

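This is a Chinese-English bilingual conversation dataset labeled with speaker IDs and scene IDs, containing nearly one million conversations. The accompanying paper, 《自动构建基于电视剧字幕和剧本的日常会话基础标注库》 ("Automatic Construction of an Annotated Daily Conversation Corpus Based on TV Series Subtitles and Scripts"), was published at CCL-2019; see it for details about the dataset.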
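Speaker ID and scene ID labels make it possible to turn a flat stream of subtitle lines into dialogues: group lines by scene, then merge consecutive lines from the same speaker into turns. The following is a minimal Python sketch of that idea under stated assumptions. The `Utterance` record, its fields (`scene_id`, `speaker_id`, `zh`, `en`), and the helpers `group_by_scene` and `speaker_turns` are hypothetical and do not describe the actual file format or any tooling shipped with CEDAC.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Utterance:
    # Hypothetical record layout; CEDAC's released files may use a different format.
    scene_id: str
    speaker_id: str
    zh: str  # Chinese subtitle line
    en: str  # English subtitle line


def group_by_scene(utterances: List[Utterance]) -> Dict[str, List[Utterance]]:
    """Group utterances into conversations keyed by scene ID, preserving input order."""
    scenes: Dict[str, List[Utterance]] = defaultdict(list)
    for utt in utterances:
        scenes[utt.scene_id].append(utt)
    return dict(scenes)


def speaker_turns(conversation: List[Utterance]) -> List[List[Utterance]]:
    """Merge consecutive lines spoken by the same speaker into a single turn."""
    turns: List[List[Utterance]] = []
    for utt in conversation:
        if turns and turns[-1][-1].speaker_id == utt.speaker_id:
            turns[-1].append(utt)
        else:
            turns.append([utt])
    return turns
```

Because Python dicts preserve insertion order, scenes come out in the order they first appear in the input; the parsing step that builds `Utterance` objects would need to be adapted to whatever format the released data actually uses.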