BigSurvey

A large-scale dataset for summarizing numerous academic papers

Paper link

Download

Download from Google Drive link

License

BigSurvey is licensed under ODC-BY.

When using the BigSurvey dataset in a product or service, or including data in a redistribution, please cite the following paper:

@inproceedings{ijcai2022p591,
  title     = {Generating a Structured Summary of Numerous Academic Papers: Dataset and Method},
  author    = {LIU, Shuaiqi and Cao, Jiannong and Yang, Ruosong and Wen, Zhiyuan},
  booktitle = {Proceedings of the Thirty-First International Joint Conference on
               Artificial Intelligence, {IJCAI-22}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},
  editor    = {Luc De Raedt},
  pages     = {4259--4265},
  year      = {2022},
  month     = {7},
  note      = {Main Track},
  doi       = {10.24963/ijcai.2022/591},
  url       = {https://doi.org/10.24963/ijcai.2022/591},
}

FAQ

Q1: Does each line of src.txt and tgt.txt files contain the source documents and the target summary, respectively?

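A1: Yes. Each line of src.txt contains the source documents of one example, and the corresponding line of tgt.txt contains its target summary.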

Q2: What is the "story_separator_special_tag" in the src.txt files?

A2: "story_separator_special_tag" separates the input documents in each example.
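The layout described in A1 and A2 can be read with a few lines of Python. This is a minimal sketch, not the authors' loading code: the helper names `split_documents` and `read_examples` are hypothetical, and it assumes the files are UTF-8 text with one example per line and that whitespace around the separator can be stripped.

```python
from typing import Iterator, List, Tuple

# Separator token named in the FAQ.
SEPARATOR = "story_separator_special_tag"


def split_documents(src_line: str) -> List[str]:
    """Split one src.txt line into its individual input documents."""
    parts = (part.strip() for part in src_line.split(SEPARATOR))
    # Drop empty pieces, e.g. from a trailing separator (an assumption).
    return [part for part in parts if part]


def read_examples(src_path: str, tgt_path: str) -> Iterator[Tuple[List[str], str]]:
    """Yield (documents, summary) pairs from line-aligned src/tgt files."""
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt:
        for src_line, tgt_line in zip(src, tgt):
            yield split_documents(src_line), tgt_line.strip()
```

For example, `next(read_examples("src.txt", "tgt.txt"))` would return the list of input documents and the target summary for the first example, assuming both files are in the working directory.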

Q3: Where do the abstracts of the reference papers come from?

A3: We collect them from Microsoft Academic Service and Semantic Scholar.

Q4: Why truncate the input documents?

A4: We truncated each reference paper's abstract to 200 words because the data sources (Microsoft Academic Service and Semantic Scholar, especially the former) do not reliably separate the abstract from the introduction. In their API responses, some "abs" fields can exceed tens of thousands of words. Without truncation, the longest reference abstracts would take up most of the input, which is unfair to the other reference papers. Most papers' abstracts are within 200-300 words.
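The truncation step can be illustrated as follows. This is a sketch under assumptions rather than the authors' preprocessing code: the function name `truncate_words` is hypothetical, and words are counted by splitting on whitespace, which may differ from the tokenization actually used to build the dataset.

```python
def truncate_words(text: str, max_words: int = 200) -> str:
    """Keep at most the first `max_words` whitespace-separated words."""
    words = text.split()  # whitespace tokenization (an assumption)
    return " ".join(words[:max_words])
```

Applied to every reference abstract before the abstracts are joined into one input line, this caps the share of the input that any single paper can occupy.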
