hekastos/OneStopEnglishCorpus

This repository hosts the dataset described in the following paper:

OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification
Sowmya Vajjala and Ivana Lučić
2018
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 297–304. Association for Computational Linguistics.

Please cite the above paper if you use this corpus in your research.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Description of this repo:

  • Texts-SeparatedByReadingLevel/: The corpus itself, in three sub-folders, one per reading level. The three versions of a text share the same base file name, followed by -ele.txt, -int.txt, or -adv.txt depending on the sub-folder.
  • Texts-Together-OneCSVperFile/: One CSV file per text, with three columns, one per reading level. Paragraph breaks are preserved.
  • Sentence-Aligned/: Three text files with pairwise sentence alignments (adv-int, int-ele, adv-ele). Sentences were aligned using cosine similarity.
  • Processed-AllLevels-AllFiles/: Sub-folders with output files from the Stanford Parser, Stanford CoreNLP, and UPenn's Discourse Connectives Tagger.
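
As a rough illustration of the naming scheme above, the following Python sketch groups the three versions of each text by their shared base name. It is not part of the repository: the function name `group_by_article` and the short level keys are hypothetical, and the code relies only on the -ele.txt/-int.txt/-adv.txt suffixes, scanning recursively so that it does not need to assume the sub-folder names.

```python
from collections import defaultdict
from pathlib import Path

# File-name suffixes as listed in the README; the short level keys are our own choice.
LEVEL_SUFFIXES = {"-ele.txt": "ele", "-int.txt": "int", "-adv.txt": "adv"}


def group_by_article(corpus_root):
    """Return {base_name: {level: path}} for every levelled .txt file under corpus_root.

    Hypothetical helper: it scans recursively, so sub-folder names are not assumed.
    Files that carry none of the three suffixes are ignored.
    """
    articles = defaultdict(dict)
    for path in sorted(Path(corpus_root).rglob("*.txt")):
        for suffix, level in LEVEL_SUFFIXES.items():
            if path.name.endswith(suffix):
                base_name = path.name[: -len(suffix)]
                articles[base_name][level] = path
                break
    return dict(articles)
```

Calling `group_by_article("Texts-SeparatedByReadingLevel")` from the repository root should then give one entry per text, each with up to three paths. The file encoding is not stated here, so pass an explicit encoding when reading the files and check a few of them by hand.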

For enquiries, contact sowmya@iastate.edu
