CLC-QuAD [1] is the first large-scale complex Chinese semantic parsing dataset over Wikidata. It covers multi-hop questions, dual-intent questions, boolean questions, and counting questions, and was proposed to push forward Chinese KGQA research. Its questions were generated by translating all effective English questions in the LC-QuAD 2.0 dataset into Chinese. In total, CLC-QuAD contains more than 28k question-SPARQL query pairs, which makes it comparable to or larger than most commonly used KBQA datasets.
[1] Zou, Jianyun, Min Yang, Lichao Zhang, Yechen Xu, Qifan Pan, Fengqing Jiang, Ran Qin et al. A Chinese Multi-type Complex Questions Answering Dataset over Wikidata. arXiv preprint arXiv:2111.06086 (2021).
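
To illustrate what a question-SPARQL pair could look like, the sketch below builds one made-up record and applies a rough heuristic that separates boolean and counting questions by query form. The field names `question` and `sparql`, the example record, and the helper `guess_question_type` are all assumptions made for illustration; they are not taken from the CLC-QuAD release, whose actual file layout may differ.

```python
import re

# Hypothetical record layout: the real CLC-QuAD field names may differ.
# The question and query are made up for illustration, not dataset entries.
example_pair = {
    "question": "北京是中国的首都吗？",  # "Is Beijing the capital of China?"
    "sparql": "ASK WHERE { wd:Q148 wdt:P36 wd:Q956 }",
}


def guess_question_type(sparql: str) -> str:
    """Roughly classify a query by its SPARQL form (a heuristic, not the dataset's labels)."""
    # Find the first query-form keyword, skipping any PREFIX declarations.
    form = re.search(r"\b(ASK|SELECT)\b", sparql, flags=re.IGNORECASE)
    if form and form.group(1).upper() == "ASK":
        return "boolean"
    # Counting questions are assumed to use the COUNT aggregate.
    if re.search(r"\bCOUNT\s*\(", sparql, flags=re.IGNORECASE):
        return "counting"
    # Multi-hop and dual-intent queries cannot be told apart this simply.
    return "other"


print(guess_question_type(example_pair["sparql"]))
```

Multi-hop and dual-intent questions fall into the "other" bucket here, since telling them apart would require inspecting the triple patterns rather than the query form alone.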