This repository contains the necessary data and code to run our NER experiments.
Before you start, do the following:
- Get the following data files:
  - MasakhaNER: `data/masakhaner/*/{train,dev,test}.txt`
  - Finnish: `data/turku-fin-ner/{train,dev,test}.txt`
  - Hindi: `data/hiner/collapsed/{train,dev,test}.json`
- Put the ParaNames TSV files in a folder called `paranames` in the root of the repo
  - Tip: symlinks will work, too
- Run `bash setup.sh`, which sets up a Conda environment and attempts to install DyNet
  - NOTE: Installing DyNet may require manual intervention
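
Before running anything, it can help to confirm that the files from the checklist above are where the scripts expect them. The following is only a sketch: `check_data_layout` is a hypothetical helper that is not part of this repository, and it assumes the paths listed above are relative to the repo root and that each MasakhaNER language has its own subfolder.

```shell
#!/usr/bin/env bash
# Sketch only: check_data_layout is a hypothetical helper, not part of this repo.
# It prints one "missing: ..." line per absent path and returns 1 if any is absent.

check_data_layout() {
    local root="${1:-.}"   # repository root (defaults to the current directory)
    local missing=0
    local split dir f

    # Finnish and Hindi data live at fixed paths.
    for split in train dev test; do
        for f in "data/turku-fin-ner/${split}.txt" "data/hiner/collapsed/${split}.json"; do
            if [ ! -e "${root}/${f}" ]; then
                echo "missing: ${f}"
                missing=1
            fi
        done
    done

    # MasakhaNER: assumed to have one subfolder per language.
    local found_lang=0
    for dir in "${root}"/data/masakhaner/*/; do
        [ -d "${dir}" ] || continue   # unmatched glob stays literal; skip it
        found_lang=1
        for split in train dev test; do
            if [ ! -e "${dir}${split}.txt" ]; then
                echo "missing: ${dir#"${root}/"}${split}.txt"
                missing=1
            fi
        done
    done
    if [ "${found_lang}" -eq 0 ]; then
        echo "missing: data/masakhaner/*/"
        missing=1
    fi

    # ParaNames folder; a symlink counts as long as its target exists.
    if [ ! -e "${root}/paranames" ]; then
        echo "missing: paranames"
        missing=1
    fi

    return "${missing}"
}
```

Calling `check_data_layout .` from the repo root prints nothing and returns 0 when everything is in place. The check only looks at file presence, not at file contents.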

The main workhorse is `full_experiment.sh`, which you run with

    bash ./full_experiment.sh "${config_file_path}" "${language}" "${should_confirm}"

where
- `config_file_path` is a path to the configuration file for the experiment
- `language` is the relevant language code for the experimental data
- `should_confirm` is a boolean (`yes`/`no`) for interactively confirming commands
  - if `yes`, an interactive `fzf` menu will be used to select tasks to run
- if
- African languages: MasakhaNER
- Finnish: Turku NLP corpus
- Hindi: HiNER
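
The three arguments to `full_experiment.sh` described above can be sanity-checked before launching a run. This is a minimal sketch under assumptions: `validate_experiment_args` is a hypothetical helper that is not part of this repository, and it only checks what the description above states (a config file path, a language code, and a `yes`/`no` flag). It does not know which language codes the experiment script actually accepts.

```shell
#!/usr/bin/env bash
# Sketch only: validate_experiment_args is a hypothetical helper, not part of this repo.
# Returns 0 if the arguments look usable, 1 on an invalid value, 2 on a wrong argument count.

validate_experiment_args() {
    if [ "$#" -ne 3 ]; then
        echo "usage: validate_experiment_args CONFIG_FILE_PATH LANGUAGE SHOULD_CONFIRM" >&2
        return 2
    fi
    local config_file_path="$1"
    local language="$2"
    local should_confirm="$3"

    # The config file must exist as a regular file.
    if [ ! -f "${config_file_path}" ]; then
        echo "config file not found: ${config_file_path}" >&2
        return 1
    fi

    # The set of valid language codes is not known here, so only reject empty input.
    if [ -z "${language}" ]; then
        echo "language code must not be empty" >&2
        return 1
    fi

    # should_confirm is documented as a yes/no boolean.
    case "${should_confirm}" in
        yes|no) ;;
        *)
            echo "should_confirm must be 'yes' or 'no', got: ${should_confirm}" >&2
            return 1
            ;;
    esac
    return 0
}

# Possible usage (commented out so that sourcing this sketch has no side effects):
# validate_experiment_args "${config_file_path}" "${language}" "${should_confirm}" \
#     && bash ./full_experiment.sh "${config_file_path}" "${language}" "${should_confirm}"
```

Keeping the check separate from the launch means a mistyped flag fails fast instead of partway through an experiment.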