
Unsupervised Neural Single-Document Summarization of Reviews via Learning Latent Discourse Structure and its Ranking

Corresponding paper:

https://arxiv.org/abs/1906.05691
Masaru Isonuma, Junichiro Mori and Ichiro Sakata (The University of Tokyo)
Accepted at ACL 2019 as a long paper

Requirements

Python 3.6
Tensorflow 1.8.0

Usage

Preprocessing

The pre-processed Amazon Sports & Outdoors review data can be downloaded at:
Put it at /path/to/data

To use another dataset, run:

python create_dataframe.py --input_path /path/to/gzip/file --output_path /path/to/dataframe
python preprocess_data.py --input_path /path/to/dataframe --output_path /path/to/data --word_vec_path /path/to/vec

Other datasets can be downloaded at http://jmcauley.ucsd.edu/data/amazon/
Put the raw gzip file at /path/to/gzip/file
Also, download the pretrained FastText word vector file crawl-300d-2M.vec from the URL below and put it at /path/to/vec
https://fasttext.cc/docs/en/english-vectors.html
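
For reference, the raw review files from the link above are gzipped JSON with one review per line. Below is a minimal loading sketch, independent of create_dataframe.py; the file name is only an example, and some releases of the dataset use a looser JSON format that may need extra handling:

import gzip
import json
import pandas as pd

def parse_reviews(path):
    # Each line of the gzipped file is assumed to be one JSON review record.
    with gzip.open(path, 'rt', encoding='utf-8') as f:
        for line in f:
            yield json.loads(line)

# Example file name from the Amazon review collection; adjust to your download.
df = pd.DataFrame(parse_reviews('reviews_Sports_and_Outdoors_5.json.gz'))
print(df[['reviewText', 'summary', 'overall']].head())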

Training

To train your model, run:

python cli.py --mode train --data /path/to/data --modeldir /dir/of/model

Parameters and logs are saved in /dir/of/model
The hyperparameters can be changed as described in cli.py

Evaluation

To write out and evaluate the summaries generated by your model, run:

python cli.py --mode eval --data /path/to/data --modeldir /dir/of/model --refdir /dir/of/refs --outdir /dir/of/outputs

Reference and output summaries are saved in /dir/of/refs and /dir/of/outputs, respectively
The same hyperparameters as used for training must be set
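
If you want to score the written-out summaries yourself, the sketch below uses the external rouge-score package (not part of this repository); it assumes, only for illustration, that /dir/of/refs and /dir/of/outputs contain plain-text files with matching names:

import os
from rouge_score import rouge_scorer

ref_dir, out_dir = '/dir/of/refs', '/dir/of/outputs'
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)

scores = []
for name in sorted(os.listdir(out_dir)):
    with open(os.path.join(ref_dir, name)) as f:
        reference = f.read()
    with open(os.path.join(out_dir, name)) as f:
        output = f.read()
    # score(target, prediction) returns precision/recall/F1 per ROUGE variant.
    scores.append(scorer.score(reference, output)['rougeL'].fmeasure)

print('mean ROUGE-L F1: %.4f' % (sum(scores) / len(scores)))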
