A system for determining semantic textual similarity by combining shallow features with features extracted from natural deduction proofs of bidirectional entailment relations between sentence pairs
- To run this system, first check out the dedicated branch:
git checkout emnlp2017_sts
- Ensure that you have downloaded the C&C parser and the EasyCCG parser, and that you have written their installation locations to the file en/parser_location.txt, for example:
cat en/parser_location.txt
candc:/home/usr/software/candc/candc-1.00
easyccg:/home/usr/software/easyccg
- Download the required Python modules and the SICK dataset by running the following commands:
./en/download_dependencies.sh
pip install -r requirements.txt
- Also, download the pretrained vector space models from here. After that, unzip the models.zip file and put the resulting models directory into the en directory.
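Before running the experiments, it can help to confirm that en/parser_location.txt points at directories that actually exist. The following is a minimal sketch, not part of the repository: the helper names read_parser_locations and missing_parsers are hypothetical, and the file is assumed to contain one name:path entry per line, as in the example above.

```python
from pathlib import Path


def read_parser_locations(text):
    """Parse 'name:path' lines into a dict, skipping blank lines.

    Hypothetical helper: assumes one 'name:path' entry per line.
    """
    locations = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so a path containing ':' survives.
        name, _, path = line.partition(":")
        locations[name.strip()] = path.strip()
    return locations


def missing_parsers(locations):
    """Return the sorted names whose configured directory does not exist."""
    return sorted(
        name for name, path in locations.items() if not Path(path).is_dir()
    )


# Same content as the example parser_location.txt shown above.
example = """candc:/home/usr/software/candc/candc-1.00
easyccg:/home/usr/software/easyccg
"""
locations = read_parser_locations(example)
print(locations)
print("missing:", missing_parsers(locations))
```

To check a real setup, pass the file's contents instead of the example string, for instance Path("en/parser_location.txt").read_text().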
To evaluate the end-to-end performance of the system with a given list of semantic templates on the test split of SICK, run:
./en/emnlp2017exp.sh 3 en/semantic_templates_en_event_sts.yaml
To evaluate the system on the MSR-video dataset instead, run:
./en/emnlp2017exp_msr.sh 3 en/semantic_templates_en_event_sts.yaml
The system produces the following output files:
features_np.pickle (features extracted by ccg2lambda)
randomforestregressor.pkl (trained model)
results/evaluation.txt (correlation evaluation)
results/error_result.txt (erroneous predictions, diff > 0.75)
results/all_result.txt (all predictions)
results/result.png (regression line)
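To make the evaluation outputs more concrete, here is a rough sketch of how a correlation score and the list of large errors could be computed from gold and predicted similarity scores. It rests on assumptions: the README does not say which correlation measure is reported or how the result files are laid out, so Pearson correlation is used for illustration, diff is taken to be the absolute difference between gold and predicted score, and the scores below are made-up values rather than system results. The function names pearson and large_errors are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


def large_errors(gold, predicted, threshold=0.75):
    """Return (index, gold, predicted) where |gold - predicted| > threshold.

    Assumption: 'diff' means the absolute difference of the two scores.
    """
    return [
        (i, g, p)
        for i, (g, p) in enumerate(zip(gold, predicted))
        if abs(g - p) > threshold
    ]


# Made-up scores for illustration only; not output of the system.
gold = [1.0, 2.5, 4.0, 5.0]
predicted = [1.2, 3.5, 3.9, 4.6]

print("pearson:", round(pearson(gold, predicted), 3))
print("large errors:", large_errors(gold, predicted))
```

With these illustrative values, only the second pair differs by more than 0.75, so it is the only entry that large_errors returns.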