Skip to content

storidient/KoCoNovel

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

35 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

KoCoNovel

KoCoNovel is a character coreference dataset based on 50 modern and contemporary Korean novels. For more details, refer to our paper, "KoCoNovel: Annotated Dataset of Character Coreference in Korean Novels".

Corpus

The corpus is drawn from the public domain texts on Wikisource. The preprocessing involved correcting typos and incorrect line breaks within the text, and adjusting the spelling to match modern Korean grammar. The list of novels can be found in the file list_of_novels.csv.

Data and Annotation

KoCoNovel contains grammatically corrected versions of novels and annotations for character coreference in four types of options, along with speaker annotations for all direct quotations. The types of options are as follows:

  • [Reader/Omniscient] From the perspective of the omniscient author or the readers
  • [Separate/Overlapped] How multiple entities are treated either as separate entities (e.g., [‘We’], [‘I’], [‘You’]) or as overlapped entities (e.g., [‘We’, ‘I’], [‘We’, ‘You’])

Please cite the following work if you use this dataset:

@misc{kim2024koconovel,
      title={KoCoNovel: Annotated Dataset of Character Coreference in Korean Novels}, 
      author={Kyuhee Kim and Surin Lee and Sangah Lee},
      year={2024},
      eprint={2404.01140},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages