- TermEval 2020: a platform for researchers to work on ATE.
- ATE: the automated process of identifying terminology from a corpus of specialised texts.
- Terms: lexical items that represent concepts of a domain.
-
Descriptions:
- Domains: Corruption, dressage, wind energy (train), heart failure (test).
- Languages: English, French and Dutch.
- ~50k tokens/language/domain manually annotated
- Unstructured lists of all unique annotated terms.
-
Labels: term or not (binary task)
- Named Entities (optional)
- True terms
| | Specific Terms | Common Terms | Out-Of-Domain Terms |
|---|---|---|---|
| Domain-specific | x | x | o |
| Lexical-specific | x | o | x |
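The three term categories can be read as combinations of two yes/no criteria. The sketch below is illustrative only: it assumes Specific Terms meet both criteria, Common Terms only the domain criterion, and Out-Of-Domain Terms only the lexical criterion, and that a candidate meeting neither is not a term. The function name `term_label` is hypothetical.

```python
from typing import Optional


def term_label(domain_specific: bool, lexical_specific: bool) -> Optional[str]:
    """Map the two criteria to a term category; None means 'not a term' (assumption)."""
    if domain_specific and lexical_specific:
        return "Specific Term"
    if domain_specific:
        return "Common Term"
    if lexical_specific:
        return "Out-Of-Domain Term"
    return None
```

For the binary task, any non-`None` label would count as "term".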
-
2 datasets: with and without Named Entities
-
- Precision: how many of the extracted terms are correct.
- Recall: how many of the terms in the text have been correctly extracted.
- F1-score: harmonic mean of precision and recall, reported against two gold standards (terms only, and terms plus Named Entities).
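A minimal sketch of the three metrics, assuming the system output and the gold standard are both lists of unique terms compared by exact string match. The function name `evaluate` and the toy terms are hypothetical.

```python
def evaluate(extracted, gold):
    """Return (precision, recall, f1) for extracted terms against a gold term list."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: 2 of 3 extracted terms are correct, 2 of 4 gold terms are found.
gold_terms = ["heart failure", "ejection fraction", "patient", "diuretic"]
system_terms = ["heart failure", "patient", "study"]
p, r, f = evaluate(system_terms, gold_terms)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.5 0.571
```

Running the same function once per gold standard (with and without Named Entities) yields the two F1-scores.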
-
NYU: Termolator on English version.
- Select candidate terms based on chunking and abbreviations.
- Calculate distribution metrics, well-formedness, relevance score.
-
RACAI: Combines several statistical approaches and votes on the results (English only).
- TextRank, TFIDF, clustering, termhood features.
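The notes only say that the individual methods are combined by voting. The sketch below shows one possible voting scheme, not RACAI's actual implementation: each method contributes a set of candidate terms, and a candidate is kept if at least `min_votes` methods proposed it. The function name, the threshold, and the toy candidate sets are assumptions.

```python
from collections import Counter


def vote(candidate_sets, min_votes):
    """Keep candidates proposed by at least `min_votes` of the methods."""
    votes = Counter(term for candidates in candidate_sets for term in set(candidates))
    return {term for term, count in votes.items() if count >= min_votes}


# Toy candidate sets standing in for the output of three different methods.
method_a = {"heart failure", "ejection fraction", "study"}
method_b = {"heart failure", "ejection fraction"}
method_c = {"heart failure", "patient"}
print(sorted(vote([method_a, method_b, method_c], min_votes=2)))
# ['ejection fraction', 'heart failure']
```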
-
e-Terminology:
- TSR (Token Slot Recognition) technique in TBXTools.
- Dutch: statistical version
- English, French: linguistic version
- Filter out stopwords and candidate terms with frequency f <= 2.
- Terminological reference: IATE database for 12-Law.
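The stopword and frequency filters can be illustrated in plain Python. This is a sketch under stated assumptions, not the TBXTools API: candidates are word n-grams, an n-gram is dropped if it starts or ends with a stopword, and candidates with frequency <= 2 are discarded. The stopword list, the function name, and the parameters are hypothetical.

```python
from collections import Counter

# Toy stopword list (assumption); a real system would use a full per-language list.
STOPWORDS = {"the", "of", "a", "in", "and", "is"}


def candidate_terms(tokens, max_n=2, min_freq=3):
    """Count n-gram candidates, skipping stopword-bounded ones and rare ones."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    # Keep only candidates with frequency >= min_freq, i.e. drop f <= 2.
    return {term: freq for term, freq in counts.items() if freq >= min_freq}


text = "heart failure is common the heart failure of the heart failure patient"
print(candidate_terms(text.split()))
# {'heart': 3, 'failure': 3, 'heart failure': 3}
```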
-
NLPLab_UQAM: Bidirectional LSTM with GloVe embeddings, for all three languages.
-
TALN-LS2N: English and French only (described in the next paper).
-
- TALN-LS2N’s system outperforms all others in the English and French tracks.
- NLPLab_UQAM’s system outperforms e-Terminology for the Dutch track.
- Unpredictability of DL models (BERT)
- Large gap between precision and recall for English model, much smaller for French model.
- ACTER v1.3
- Data description in README.html.
-
- Training phase: Corruption, dressage, wind energy.
- Test phase: Heart failure.
- Languages: English, French and Dutch.
- Approaches: feature-based and context-based.
-
- BERT outperforms classical methods
- New, simple and strong baseline for terminology extraction