The original dataset is hosted on Networks-Learning/spaced-selection.
This repository contains the code to extract reviews from the dataset, group them by user, and save them to parquet files.
The parquet files can be used for srs-benchmark.
- Download the dataset from here.
- Unzip the file and move the `stats-20191220-20200731` folder to the root of this repository.
- Run `python group_user_reviews.py` to group reviews by user and save them to CSV files.
- Run `python build_parquet.py` to build parquet files.
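
For readers who want a feel for what the grouping step does, here is a minimal sketch using only the Python standard library. It is not the code in `group_user_reviews.py`: the function name `group_reviews_by_user`, the `user_id` column, and the one-file-per-user layout are all assumptions made for illustration, and the real dataset may use different column names and file layout.

```python
import csv
from collections import defaultdict
from pathlib import Path


def group_reviews_by_user(input_csv, output_dir, user_column="user_id"):
    """Split one review log into per-user CSV files.

    Hypothetical helper: `user_column` is an assumed column name,
    not one taken from the actual dataset.
    Returns a dict mapping each user to the number of rows written.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Collect rows per user, preserving the original row order.
    rows_by_user = defaultdict(list)
    with open(input_csv, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        for row in reader:
            rows_by_user[row[user_column]].append(row)

    # Write one CSV per user, named after the user id
    # (assumes ids are safe to use as file names).
    for user, rows in rows_by_user.items():
        out_path = output_dir / f"{user}.csv"
        with open(out_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)

    return {user: len(rows) for user, rows in rows_by_user.items()}
```

This sketch holds every row in memory before writing, which is fine for a small sample but may not suit the full dataset; a streaming variant that appends to per-user files would trade memory for more file operations.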