We used the Blogs Authorship Corpus, IMDb62, and Amazon 5-core Reviews datasets to construct our benchmark. How the data was selected, processed, and split into Train, Dev, and Test sets is described in our paper. The data we used is available in this release.
Our benchmark can be reproduced by running the create_data.sh
script, which encapsulates the following steps (see the invocation sketch after the list):
- Downloading each dataset separately and applying a preprocessing step if needed.
- Annotating the data examples with fine-grained linguistic features using the utils/annotate_data.py script.
- Obtaining the RST relations using rstfinder. This is done by 1) writing every data example to a separate file and 2) invoking the rstfinder/parse_data.sh script.
- Splitting and discretizing the data to create the Train, Dev, and Test splits using the utils/split_and_discretize.py script.
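
If you prefer to run the steps individually rather than through create_data.sh, the sketch below shows roughly how the scripts could be chained together. The command-line arguments, flags, and directory layout shown here are illustrative assumptions, not the scripts' exact interfaces; consult each script for its actual arguments.

```bash
#!/usr/bin/env bash
# Sketch of reproducing the benchmark step by step.
# NOTE: the paths and arguments below are assumptions for illustration only;
# check each script for its real command-line interface.
set -euo pipefail

DATA_DIR=data          # assumed location of the downloaded, preprocessed datasets
OUT_DIR=benchmark      # assumed output directory for the final Train/Dev/Test splits

# One-shot alternative: run the full pipeline via the provided wrapper script.
# bash create_data.sh

# 1. Annotate the data examples with fine-grained linguistic features.
python utils/annotate_data.py "$DATA_DIR" "$DATA_DIR/annotated"           # assumed arguments

# 2. Obtain RST relations: each example is written to its own file, then rstfinder is invoked.
bash rstfinder/parse_data.sh "$DATA_DIR/annotated"                        # assumed arguments

# 3. Split and discretize the annotated data into Train, Dev, and Test splits.
python utils/split_and_discretize.py "$DATA_DIR/annotated" "$OUT_DIR"     # assumed arguments
```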