Skip to content

Latest commit

 

History

History
160 lines (128 loc) · 5.88 KB

README.md

File metadata and controls

160 lines (128 loc) · 5.88 KB

morphoeval - Evaluation for morphological analysis and segmentation

Introduction

This package provides re-implementations for the BPR, CoMMA, and EMMA-2 evaluation methods for unsupervised morphological analysis and segmentation introduced by Virpioja et al. (2011).

The BPR (boundary precision and recall) method calculates a macro-average of the segmentation boundary matches over the words and is thus suitable for evaluating unsupervised or supervised morphological segmentation.

The CoMMA and EMMA-2 methods are designed for the task of unsupervised morphological analysis that was the goal in the Morpho Challenge competitions organized between 2005 and 2010 (see Kurimo et al., 2010). The challenge in the evaluation of unsupervised morphological analysis is that the predicted morphemes labels and the labels in gold standard analysis are not directly comparable, as an unsupervised algorithm does not see the gold standard labels, and in contrast to unsupervised morphological segmentation, the labels are not simply subsequences of words.

Both methods start with a bipartite morpheme-word graph that collects the occurrences of the morphemes in the word forms within the test set. The CoMMA methods first use the morpheme-word graphs to create word graphs, with edges as co-occurring morphemes, for the predicted and gold standard analyses, and calculates the precision and recall of the edges. The EMMA (Spiegler and Monson, 2010) and EMMA-2 methods use the morpheme-word graph to make one-to-one or one-to-many assignments between the predicted and gold standard morphemes, and calculates the precision and recall based on the mapped morphemes.

The choice of method

The choice of the evaluation method should depend on the task at hand (segmentation or analysis) and whether the method produces (and the gold standard includes) multiple alternative analyses per word. Here are our recommendations; see Virpioja et al. (2011) for further discussion.

Task Single analysis per word Multiple analyses per word
Morphological segmentation BPR BPR-S
Morphological analysis EMMA-2, CoMMA-B0 EMMA-2, CoMMA-S0

Instructions

Installation

Install the latest release from PyPI:

  • pip install morphoeval

Install from source:

  • pip install .

Usage

Installing the package provides a single command, morphoeval:

$ morphoeval --help
usage: morphoeval [-h] [--metric {comma-b0,comma-b1,comma-s0,comma-s1,emma-2,bpr,bpr-s}] [--verbose]
                     goldfile predfile [output]

Metrics for morphological analysis and segmentation

positional arguments:
  goldfile              gold standard analysis file
  predfile              predicted analysis file
  output                output file

optional arguments:
  -h, --help            show this help message and exit
  --metric {comma-b0,comma-b1,comma-s0,comma-s1,emma-2,bpr,bpr-s}, -m {comma-b0,comma-b1,comma-s0,comma-s1,emma-2,bpr,bpr-s}
                        metric (default comma-b0)
  --beta FLOAT          beta for using F_beta score
  --verbose, -v         increase verbosity

The parameters are simple enough: Use --metric to select the evaluation method that you want to use, and provide the gold standard and predicted analysis files.

The input files should be in the format used in Morpho Challenges: The word and its analyses are separated by a tabular character, any alternative analyses by a comma and a space, and the labels of the analyses by single space. For example:

brush	brush_N
brushes	brush_N +3SG, brush_N +PL

The output is written in YAML format:

files: {predictions: pred.txt, reference: gold.txt}
metric: emma-2
scores: {f-score: 0.9251, precision: 0.8939, recall: 0.9585}

Note: For large (>10k words) input files, running the evaluation may take a considerable amount of memory.

Original scripts

The original scripts are available at http://morpho.aalto.fi/events/morphochallenge/, but do not work with modern Python versions. The current implementation has the following limitations compared to the previous scripts:

  • The original EMMA algorithm with one-to-one mapping between morphemes is not supported.
  • Weighting of each input word is not supported.

References

References as BibTeX:

% Kurimo et al. (2010)
@inproceedings{kurimo-et-al-2010-morpho,
    address = {Uppsala, Sweden},
    author = {Mikko Kurimo and Sami Virpioja and Ville Turunen and Krista Lagus},
    booktitle = {Proceedings of the 11th Meeting of the ACL Special Interest Group on Computational Morphology and Phonology},
    month = {July},
    pages = {87--95},
    publisher = {Association for Computational Linguistics},
    title = {Morpho Challenge 2005-2010: Evaluations and Results},
    url = {https://aclanthology.org/W10-2211},
    year = {2010},
}

% Spiegler and Monson (2010)
@inproceedings{spiegler-monson-2010-emma,
    address = {Beijing, China},
    author = {Sebastian Spiegler and Christian Monson},
    booktitle = {Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)},
    month = {August},
    pages = {1029--1037},
    publisher = {Coling 2010 Organizing Committee},
    title = {{EMMA}: A novel Evaluation Metric for Morphological Analysis},
    url = {https://aclanthology.org/C10-1116},
    year = {2010},
}

% Virpioja et al. (2011)
@article{virpioja-et-al-2011-empirical,
    author = {Sami Virpioja and Ville T. Turunen and Sebastian Spiegler and Oskar Kohonen and Mikko Kurimo},
    journal = {Traitement Automatique des Langues},
    number = {2},
    pages = {45--90},
    publisher = {ATALA},
    title = {Empirical Comparison of Evaluation Methods for Unsupervised Learning of Morphology},
    url = {https://www.atala.org/sites/default/files/TAL_52_2_2.pdf},
    volume = {52},
    year = {2011},
}