Reproduce experiments

This repository contains the code of the paper:
Span-based discontinuous constituency parsing: a family of exact chart-based algorithms with time complexities from O(n^6) down to O(n^3)
Caio Corro
EMNLP 2020

The cpp directory contains the parser itself.
You must compile and install it before running the python program:
python3 setup.py install --user

There is one script for training and one for prediction.

The pydestruct directory contains a lot of code unrelated with this parser, it is a collection of python file I use.
Not all code in this directory was written by myself, I tried to put link to other repos whenever I stolled it. The file proper.prm is the file distributed with discodop.

Reproduce experiments

You need to install to following software/libraries:

Python 3
Discodop (the discodop command line tool must be in the path - it is used for evaluation)
Pytorch
the Huggingface transformers library (I use version 3.0.1, I am not sure it will work with newer versions)

Download glove embeddings (dim=300) and put them in an embeddings directory with name glove_english.txt and glove_german.txt. Next, put the proprocessed data in the data directory, as follows:

$ tree data  
data  
├── dptb  
│   ├── dev.export  
│   ├── test.export  
│   └── train.export  
├── negra  
│   ├── dev.export  
│   ├── test.export  
│   ├── train.export  
└── tiger_spmrl 
    ├── dev.export  
    ├── test.export  
    └── train.export

The experiments directory contains scripts called cmd that you can execute to train and evaluate models in different settings. If you use slurm you can execute display_results.sh that will submit all cmd scripts via sbatch. After training is done, you can use display_results.sh to have a summary of results on test data. I use nvidia V100 GPUs with 32GO of ram. You will need a lot of CPU ram too (90 Go I think).

WARNING

The inside-outside implementation does not work.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
cpp		cpp
data		data
disc_span_parser		disc_span_parser
experiments		experiments
pydestruct		pydestruct
.gitignore		.gitignore
.train_disc_biaffine.py.swp		.train_disc_biaffine.py.swp
LICENSE		LICENSE
README.md		README.md
config.py		config.py
pred_disc_biaffine.py		pred_disc_biaffine.py
proper.prm		proper.prm
setup.py		setup.py
train_disc_biaffine.py		train_disc_biaffine.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Reproduce experiments

WARNING

About

Releases

Packages

Languages

License

FilippoC/disc-span-parser-release

Folders and files

Latest commit

History

Repository files navigation

Reproduce experiments

WARNING

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages