Unsupervised Neural Single-Document Summarization of Reviews via Learning Latent Discourse Structure and its Ranking
Corresponding paper:
https://arxiv.org/abs/1906.05691
Masaru Isonuma, Junichiro Mori and Ichiro Sakata (The University of Tokyo)
Accepted at ACL 2019 as a long paper
Python 3.6
TensorFlow 1.8.0
The pre-processed Amazon Sports & Outdoors review data can be downloaded at:
Put it at /path/to/data
To use another dataset, run:
python create_dataframe.py --input_path /path/to/gzip/file --output_path /path/to/dataframe
python preprocess_data.py --input_path /path/to/dataframe --output_path /path/to/data --word_vec_path /path/to/vec
Other datasets can be downloaded at http://jmcauley.ucsd.edu/data/amazon/
Put the raw gzip file at /path/to/gzip/file
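Before running create_dataframe.py, it can help to check what the raw file contains. The sketch below is not part of this repository. It assumes each line of the gzip file holds one JSON object, and that the objects carry fields such as `reviewText` and `summary`; both are assumptions about the downloaded file, and `iter_reviews` is a hypothetical helper name.

```python
import gzip
import json


def iter_reviews(gzip_path, limit=None):
    """Yield one dict per non-empty line of a gzip-compressed JSON-lines file.

    Assumption: every line is a standalone JSON object. If the file uses
    another serialization, json.loads will raise an error on the first line.
    """
    count = 0
    with gzip.open(gzip_path, "rt", encoding="utf-8") as f:
        for line in f:
            if limit is not None and count >= limit:
                break
            line = line.strip()
            if not line:
                continue  # skip blank lines without counting them
            yield json.loads(line)
            count += 1


# Example usage (field names are assumed, so .get() is used defensively):
# for review in iter_reviews("/path/to/gzip/file", limit=3):
#     print(review.get("summary"), "|", review.get("reviewText"))
```

Reading lazily with a generator avoids loading the whole file into memory, which matters for the larger review categories.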
Also, download the pretrained fastText word vector file crawl-300d-2M.vec
from the URL below and put it at /path/to/vec
https://fasttext.cc/docs/en/english-vectors.html
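To confirm that the downloaded vector file is intact, the following sketch reads the first few entries. It assumes the standard fastText text format: a header line with the vocabulary size and dimension, then one line per word followed by its space-separated values. `load_vec` is a hypothetical helper and is not how preprocess_data.py necessarily reads the file.

```python
import io


def load_vec(vec_path, max_words=None):
    """Read a fastText .vec text file into a {word: list of floats} dict.

    Returns (vectors, dim), where dim is the dimension declared in the header.
    max_words caps how many entries are read, which keeps a quick check fast.
    """
    vectors = {}
    with io.open(vec_path, "r", encoding="utf-8", newline="\n", errors="ignore") as f:
        # Header line: "<number of words> <dimension>"
        _n_words, dim = map(int, f.readline().split())
        for i, line in enumerate(f):
            if max_words is not None and i >= max_words:
                break
            tokens = line.rstrip().split(" ")
            vectors[tokens[0]] = [float(x) for x in tokens[1:]]
    return vectors, dim


# Example usage:
# vectors, dim = load_vec("/path/to/vec", max_words=1000)
# print(dim, len(vectors))
```

For crawl-300d-2M.vec the header should report a dimension of 300, as the file name suggests.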
To train your model, run:
python cli.py --mode train --data /path/to/data --modeldir /dir/of/model
Parameters and logs are saved in /dir/of/model
The hyperparameters can be changed as described in cli.py
To write out and evaluate the summaries generated by your model, run:
python cli.py --mode eval --data /path/to/data --modeldir /dir/of/model --refdir /dir/of/refs --outdir /dir/of/outputs
The reference and output summaries are saved in /dir/of/refs
and /dir/of/outputs, respectively
Use the same hyperparameters for evaluation as were used for training
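One way to keep training and evaluation in sync is to build both commands from a single list of shared arguments. The sketch below is an illustration only: `build_commands` and `run_all` are hypothetical helpers, and it uses just the flags shown above. Any hyperparameter flags passed through `shared_args` must be ones that cli.py actually defines.

```python
import subprocess


def build_commands(data, modeldir, refdir, outdir, shared_args=()):
    """Return (train_cmd, eval_cmd) that share the same extra arguments."""
    shared = list(shared_args)
    train_cmd = [
        "python", "cli.py", "--mode", "train",
        "--data", data, "--modeldir", modeldir,
    ] + shared
    eval_cmd = [
        "python", "cli.py", "--mode", "eval",
        "--data", data, "--modeldir", modeldir,
        "--refdir", refdir, "--outdir", outdir,
    ] + shared
    return train_cmd, eval_cmd


def run_all(data, modeldir, refdir, outdir, shared_args=()):
    """Run training, then evaluation, stopping if either command fails."""
    for cmd in build_commands(data, modeldir, refdir, outdir, shared_args):
        subprocess.run(cmd, check=True)


# Example usage:
# run_all("/path/to/data", "/dir/of/model", "/dir/of/refs", "/dir/of/outputs")
```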