We provide a framework to evaluate cross-lingual sentence embeddings. Important features are:
- Clean software architecture: components are (mostly) decoupled
- Random subsample validation to mitigate effects from randomness (e.g. random initialization)
- Easily extensible (new models, new datasets, new tasks, ...)
- Automated execution of different configurations with generation of Excel reports
We will provide additional sample configurations once we have released the data for the cross-lingual tasks.
Requirements:
- Python 2.7
- tensorflow (CPU is fine; we used TF 1.2)

Run `pip install -r requirements.txt` to install all required Python packages.
You can either run a single experiment or perform many subsample runs over multiple configurations and different tasks. The latter also includes optimization of the learning rate, an important parameter that can noticeably influence the results.
For each experiment you need a configuration file that defines important parameters and specifies which modules should be loaded. An example is given in configs/individual/AC_fr.yaml for the execution of a single cross-lingual (en->fr) experiment on the AC dataset. See this configuration file for further documentation.
You can run this experiment with `python run_single.py configs/individual/AC_fr.yaml`.
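To give a rough idea of the shape of such a file, here is a hypothetical sketch; apart from model-module (discussed in the extension section below), all keys are illustrative placeholders, so consult configs/individual/AC_fr.yaml for the actual schema:

```yaml
# Hypothetical sketch of a single-experiment configuration.
# Only model-module is a key confirmed elsewhere in this README;
# the other keys are illustrative placeholders, not the real schema.
model-module: experiment.sentence_classification.model.linear
dataset: AC            # hypothetical key: which dataset to evaluate on
languages: en-fr       # hypothetical key: source and target language
learning-rate: 0.01    # hypothetical key: tuned automatically in multi-run mode
```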
In our paper we used random subsample validation over many repeated runs and optimized the learning rate. We did this for a number of different sentence embeddings on a number of different datasets, which amounts to a large number of experiments. Of course, we automated most of it.
The script run_multiple_optim_lr.py automatically runs many different configurations on different tasks and, at the same time, optimizes the learning rate for each individual configuration. After each task finishes, an Excel report summarizing the model performances is written.
For this mode, we need a separate configuration file that specifies all tasks and runs. See configs/multiple.yaml for an example. This file specifies two tasks with different runs and many repetitions for random subsample validation. Importantly, this configuration relies on templates that specify general task configurations. An example is the file configs/subsample_templates/AC.yaml. This file specifies general parameters but uses template strings for parameters that are specific to a run, e.g. the type of word embeddings that should be concatenated (the model).
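To illustrate the idea only (both the placeholder syntax and all keys except model-module are hypothetical), a template might look roughly like this; see configs/subsample_templates/AC.yaml for the real format:

```yaml
# Hypothetical template sketch; the $model placeholder syntax and all
# keys except model-module are illustrative, not the actual format.
dataset: AC            # general task-level parameter (hypothetical key)
repetitions: 20        # number of subsample runs (hypothetical key)
model-module: $model   # filled in per run, e.g. with the word embeddings
                       # that should be concatenated
```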
You can run this experiment with `python run_multiple_optim_lr.py configs/multiple.yaml`. The Excel report will be written to the current folder.
Our software can be extended with new models, new datasets, new training procedures, or new evaluation scripts: just implement the interfaces specified in experiment/__init__.py in a separate file and create a variable named "component" in the same file that points to the component class. You can then use this component in your configuration files (e.g. as model-module).
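A minimal sketch of such a module, assuming a hypothetical interface name (the actual base classes are defined in experiment/__init__.py):

```python
# my_model.py -- a hypothetical new model component.
# The base class (Model) and its method (build) are assumptions for
# illustration; the real interfaces live in experiment/__init__.py.
from experiment import Model  # hypothetical interface name


class MyModel(Model):
    """A new model that configuration files can refer to."""

    def build(self, inputs):  # hypothetical interface method
        raise NotImplementedError


# The framework looks up this module-level variable to find the class.
component = MyModel
```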
To create a new module you can also just copy an existing module and replace its implementation, e.g. copy experiment.sentence_classification.model.linear and implement an MLP instead.
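As a rough illustration of that replacement (the function signature is hypothetical and not the framework's actual model interface; only the TensorFlow 1.x calls are real), the linear transformation could be swapped for an MLP along these lines:

```python
import tensorflow as tf


def mlp(inputs, hidden_units, num_classes):
    """Replace a single linear layer with a small MLP.

    `inputs` is a batch of sentence embeddings; the signature is
    illustrative, not the framework's actual model interface.
    """
    # One hidden layer with a ReLU nonlinearity...
    hidden = tf.layers.dense(inputs, hidden_units, activation=tf.nn.relu)
    # ...followed by a linear output layer that produces unnormalized
    # class scores (logits), just like the linear model it replaces.
    return tf.layers.dense(hidden, num_classes)
```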