Reproducing Evaluation Results
- Introduction
- Preparation of folder structure
- Preparation of corpora
- Create UIMA workflows
- Run UIMA workflows
- Compare results
On this page, you will find a description of how to reproduce HeidelTime's evaluation results reported in our papers listed on the project home page.
To reproduce these evaluation results, download the UIMA HeidelTime kit archives and all other components needed to run HeidelTime in a UIMA pipeline (see the documentation of HeidelTime in the UIMA HeidelTime kit). Then proceed as follows.
To have the evaluation run in an organized fashion, we'll first create a directory structure and then fill it with all of the relevant data.
- Create the following directory structure (the top folder will be referred to as EVALPATH):
  * EVALPATH/
  * EVALPATH/corpora/
  * EVALPATH/evaluation_results/
  * EVALPATH/uima_output/
  * EVALPATH/uima_workflows/
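The folder layout above can be created in one step. The following is a minimal sketch; the default location `heideltime-eval` under the current directory is only an assumption, so set `EVALPATH` to wherever the evaluation should live.

```shell
#!/bin/sh
# Top-level evaluation folder. The default below is an assumption for
# illustration; export EVALPATH beforehand to use another location.
EVALPATH="${EVALPATH:-$PWD/heideltime-eval}"

# Create the four subfolders named above (mkdir -p also creates EVALPATH).
for sub in corpora evaluation_results uima_output uima_workflows; do
    mkdir -p "$EVALPATH/$sub"
done

# List the folder for a quick visual check.
ls -1 "$EVALPATH"
```

The `scripts/` folder is not created here because it appears when the scripts archive is extracted into EVALPATH.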
- Download and extract temporally annotated corpora to EVALPATH/corpora/:
  * ACE Tern 2004 training data is released by the Linguistic Data Consortium (catalogue number LDC2005T07)
  * ACE Tern 2005 training data is released by the Linguistic Data Consortium (catalogue number LDC2006T06)
  * The Arabic corpora are an annotated subset of the ACE Tern 2005 training data. They are distributed by us through our institute's research page
  * TimeBank 1.2 is released by the Linguistic Data Consortium (catalogue number LDC2006T08)
  * TempEval-2 training and evaluation sets are released here. Chinese annotations for our improved and clean versions of TE2 are distributed in our scripts package.
  * WikiWars is published here
  * WikiWarsDE (see our download page)
  * WikiWarsVN (see our download page)
  * Time4SMS (see our download page)
  * Time4SCI (see our download page)
  * I-CAB (see http://ontotext.fbk.eu/i-cab/download-icab.html)
  * TempEval-3 training (TB/AQ, trainT3) and test (platinum, spanish) corpora (see http://www.cs.york.ac.uk/semeval-2013/task1/). ATTENTION: Extract each of the TempEval-3 corpora so that the evaluation files are in EVALPATH/corpora/TempEval-3/<te3-platinum,TBAQ-cleaned,TE3-test-key,trainT3>/.
  * ACE Tern 2004 evaluation data is released by the Linguistic Data Consortium (catalogue number LDC2010T18)
  * French TimeBank 1.1 (see https://gforge.inria.fr/projects/fr-timebank/)
  * AncientTimes (see our download page)
  * WikiWarsHR is publicly available from here
  * EVALITA 2014 Test corpus is available upon request from the EVENTI organizers.
  * TimeBankPT is publicly available from here
- Download our preparation and evaluation scripts archive scripts.tar.gz/.zip (see our download page) and extract it to EVALPATH, so that the scripts reside in EVALPATH/scripts/.
Prepare the corpora so that they can be read by the UIMA collection readers and evaluated using the official evaluation scripts:
Go to EVALPATH/scripts/
and run the prepare_corpus.sh
script for the corpus that shall be used for evaluation.
Usage:
bash prepare_corpus.sh <CORPUS_NAME> <EVALPATH>
Valid choices for CORPUS_NAME:
- Tern 2004 training data: tern2004training
- Tern 2005 training data: ace2005training
- TimeBank-1.2 corpus: timebank12
- TimeBankPT corpus: timebankpt
- TempEval-2 corpora: tempeval2eval, tempeval2train, tempeval2eval-es, tempeval2train-es, tempeval2eval-it, tempeval2train-it, tempeval2eval-cn, tempeval2train-cn
- WikiWars: wikiwars
- TempEval-3: te3platinum
- WikiWarsDE: wikiwarsde
- WikiWarsVN: wikiwarsvn-te3
- WikiWarsHR: wikiwarshr
- Time4SMS: time4sms
- Time4SCI: time4sci
- I-CAB: icab-training, icab-test
- French TimeBank 1.1 corpus: timebank-fr
- Arabic corpora: ace2005trainingArabic, arabic_test-50-star
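When several corpora are needed, the preparation call can be wrapped in a loop. This is a sketch only: the helper `prepare_all`, the default `EVALPATH`, and the three corpora chosen are assumptions for illustration. If the scripts archive has not been extracted yet, the helper just prints the commands it would run.

```shell
#!/bin/sh
# Default location is an assumption; export EVALPATH to override it.
EVALPATH="${EVALPATH:-$PWD/heideltime-eval}"

# prepare_all EVALPATH CORPUS_NAME...  (hypothetical helper, not part of the kit)
prepare_all() {
    evalpath=$1
    shift
    for corpus in "$@"; do
        if [ -f "$evalpath/scripts/prepare_corpus.sh" ]; then
            # Run from the scripts folder, as the instructions above require.
            (cd "$evalpath/scripts" && bash prepare_corpus.sh "$corpus" "$evalpath") \
                || echo "FAILED: $corpus" >&2
        else
            # Dry run: only show the command.
            echo "bash prepare_corpus.sh $corpus $evalpath"
        fi
    done
}

# Example selection; any CORPUS_NAME from the list above can be used.
prepare_all "$EVALPATH" timebank12 wikiwars te3platinum
```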
Create UIMA workflows and save them in the EVALPATH/uima_workflows/
folder.
To create a UIMA workflow, use UIMA's Collection Processing Engine Configurator (cpeGui.sh); a saved workflow can also be run from the command line with runCPE.sh. Then load the descriptor files of the corresponding components and set the parameters as described below. The components are all part of the UIMA HeidelTime kit. For the TreeTaggerWrapper annotator to run, you will need to set up TreeTagger on your system. This is detailed in the project's readme.txt file.
- Tern 2004 training data
save workflow as: EVALPATH/uima_workflows/tern2004training_workflow.xml
* Collection Reader: ACE Tern Reader
* Input Directory: EVALPATH/corpora/tern-1.0/data/english/docs_separated/in/
* Annotate Creation Time: true
* Analysis Engine: TreeTaggerWrapper
* Language: english
* Annotate_tokens: true
* Annotate_partofspeech: true
* Annotate_sentences: true
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: english
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: news
* Convert Durations: true
* CAS Consumer: ACE Tern Writer
* Output Directory: EVALPATH/uima_output/tern2004training/keyinline
- ACE 2005 training data
save workflow as: EVALPATH/uima_workflows/ace2005training_workflow.xml
* Collection Reader: ACE Tern Reader
* Input Directory: EVALPATH/corpora/ace_2005_td_v7/data/English/docs_separated/in/
* Annotate Creation Time: true
* Analysis Engine: TreeTaggerWrapper
* Language: english
* Annotate_tokens: true
* Annotate_partofspeech: true
* Annotate_sentences: true
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: english
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: news
* Convert Durations: true
* CAS Consumer: ACE Tern Writer
* Output Directory: EVALPATH/uima_output/ace2005training/keyinline
- AncientTimes Arabic
save workflow as: EVALPATH/uima_workflows/ancienttimes-ar_workflow.xml
* Collection Reader: TempEval-3 Reader
* Input Directory: EVALPATH/corpora/AncientTimesCorpus/arabic/untagged
* Annotate Creation Time: true
* Analysis Engine: Stanford POS Tagger Wrapper
* Model_path: /path/to/stanfordtagger/models/arabic.tagger
* Config_path: empty
* Annotate_tokens: true
* Annotate_sentences: true
* Annotate_partofspeech: true
* Analysis Engine: HeidelTime
* Language: arabic
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: narratives
* Convert Durations: true
* CAS Consumer: TempEval-3 Writer
* Output Directory: EVALPATH/uima_output/ancienttimes-ar
* Convert Timex 3 To 2: false
- AncientTimes German
save workflow as: EVALPATH/uima_workflows/ancienttimes-de_workflow.xml
* Collection Reader: TempEval-3 Reader
* Input Directory: EVALPATH/corpora/AncientTimesCorpus/german/untagged
* Annotate Creation Time: true
* Analysis Engine: TreeTaggerWrapper
* Language: german
* Annotate_tokens: true
* Annotate_partofspeech: true
* Annotate_sentences: true
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: german
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: narratives
* Convert Durations: true
* CAS Consumer: TempEval-3 Writer
* Output Directory: EVALPATH/uima_output/ancienttimes-de
- AncientTimes English
save workflow as: EVALPATH/uima_workflows/ancienttimes-en_workflow.xml
* Collection Reader: TempEval-3 Reader
* Input Directory: EVALPATH/corpora/AncientTimesCorpus/english/untagged
* Annotate Creation Time: true
* Analysis Engine: TreeTaggerWrapper
* Language: english
* Annotate_tokens: true
* Annotate_partofspeech: true
* Annotate_sentences: true
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: english
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: narratives
* Convert Durations: true
* CAS Consumer: TempEval-3 Writer
* Output Directory: EVALPATH/uima_output/ancienttimes-en
- AncientTimes Spanish
save workflow as: EVALPATH/uima_workflows/ancienttimes-es_workflow.xml
* Collection Reader: TempEval-3 Reader
* Input Directory: EVALPATH/corpora/AncientTimesCorpus/spanish/untagged
* Annotate Creation Time: true
* Analysis Engine: TreeTaggerWrapper
* Language: spanish
* Annotate_tokens: true
* Annotate_partofspeech: true
* Annotate_sentences: true
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: spanish
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: narratives
* Convert Durations: true
* CAS Consumer: TempEval-3 Writer
* Output Directory: EVALPATH/uima_output/ancienttimes-es
- AncientTimes French
save workflow as: EVALPATH/uima_workflows/ancienttimes-fr_workflow.xml
* Collection Reader: TempEval-3 Reader
* Input Directory: EVALPATH/corpora/AncientTimesCorpus/french/untagged
* Annotate Creation Time: true
* Analysis Engine: TreeTaggerWrapper
* Language: french
* Annotate_tokens: true
* Annotate_partofspeech: true
* Annotate_sentences: true
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: french
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: narratives
* Convert Durations: true
* CAS Consumer: TempEval-3 Writer
* Output Directory: EVALPATH/uima_output/ancienttimes-fr
- AncientTimes Italian
save workflow as: EVALPATH/uima_workflows/ancienttimes-it_workflow.xml
* Collection Reader: TempEval-3 Reader
* Input Directory: EVALPATH/corpora/AncientTimesCorpus/italian/untagged
* Annotate Creation Time: true
* Analysis Engine: TreeTaggerWrapper
* Language: italian
* Annotate_tokens: true
* Annotate_partofspeech: true
* Annotate_sentences: true
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: italian
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: narratives
* Convert Durations: true
* CAS Consumer: TempEval-3 Writer
* Output Directory: EVALPATH/uima_output/ancienttimes-it
- AncientTimes Dutch
save workflow as: EVALPATH/uima_workflows/ancienttimes-nl_workflow.xml
* Collection Reader: TempEval-3 Reader
* Input Directory: EVALPATH/corpora/AncientTimesCorpus/dutch/untagged
* Annotate Creation Time: true
* Analysis Engine: TreeTaggerWrapper
* Language: dutch
* Annotate_tokens: true
* Annotate_partofspeech: true
* Annotate_sentences: true
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: dutch
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: narratives
* Convert Durations: true
* CAS Consumer: TempEval-3 Writer
* Output Directory: EVALPATH/uima_output/ancienttimes-nl
- AncientTimes Vietnamese
save workflow as: EVALPATH/uima_workflows/ancienttimes-vn_workflow.xml
* Collection Reader: TempEval-3 Reader
* Input Directory: EVALPATH/corpora/AncientTimesCorpus/vietnamese/untagged
* Analysis Engine: JVnTextProWrapper
* Annotate_tokens: true
* Annotate_sentences: true
* Annotate_partofspeech: true
* Sent_model_path: JVNTEXTPRO_HOME/models/jvnsensegmenter
* Word_model_path: JVNTEXTPRO_HOME/models/jvnsegmenter
* Pos_model_path: JVNTEXTPRO_HOME/models/jvnpostag/maxent
* Analysis Engine: HeidelTime
* Language: vietnamese
* Date: true
* Duration: true
* Time: true
* Set: true
* Type: narratives
* Convert Durations: true
* CAS Consumer: TempEval-3 Writer
* Output Directory: EVALPATH/uima_output/ancienttimes-vn
- Arabic Training 203
save workflow as: EVALPATH/uima_workflows/arabic_training-203_workflow.xml
* Collection Reader: ACE Tern Reader
* Input Directory: EVALPATH/corpora/ace_2005_td_v7/data/Arabic/docs_separated/training-203/in
* Annotate Creation Time: true
* Analysis Engine: Stanford POS Tagger Wrapper
* Model_path: /path/to/stanfordtagger/models/arabic.tagger
* Config_path: empty
* Annotate_tokens: true
* Annotate_sentences: true
* Annotate_partofspeech: true
* Analysis Engine: HeidelTime
* Language: arabic
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: news
* Convert Durations: true
* CAS Consumer: ACE Tern Writer
* Output Directory: EVALPATH/uima_output/arabic_training-203/keyinline
* Convert Timex 3 To 2: false
- Arabic Test 150
save workflow as: EVALPATH/uima_workflows/arabic_test-150_workflow.xml
* Collection Reader: ACE Tern Reader
* Input Directory: EVALPATH/corpora/ace_2005_td_v7/data/Arabic/docs_separated/test-150/in
* Annotate Creation Time: true
* Analysis Engine: Stanford POS Tagger Wrapper
* Model_path: /path/to/stanfordtagger/models/arabic.tagger
* Config_path: empty
* Annotate_tokens: true
* Annotate_sentences: true
* Annotate_partofspeech: true
* Analysis Engine: HeidelTime
* Language: arabic
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: news
* Convert Durations: true
* CAS Consumer: ACE Tern Writer
* Output Directory: EVALPATH/uima_output/arabic_test-150/keyinline
* Convert Timex 3 To 2: false
- Arabic Test 50
save workflow as: EVALPATH/uima_workflows/arabic_test-50_workflow.xml
* Collection Reader: ACE Tern Reader
* Input Directory: EVALPATH/corpora/ace_2005_td_v7/data/Arabic/docs_separated/test-50/in
* Annotate Creation Time: true
* Analysis Engine: Stanford POS Tagger Wrapper
* Model_path: /path/to/stanfordtagger/models/arabic.tagger
* Config_path: empty
* Annotate_tokens: true
* Annotate_sentences: true
* Annotate_partofspeech: true
* Analysis Engine: HeidelTime
* Language: arabic
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: news
* Convert Durations: true
* CAS Consumer: ACE Tern Writer
* Output Directory: EVALPATH/uima_output/arabic_test-50/keyinline
* Convert Timex 3 To 2: false
- Arabic Test 50 Star
save workflow as: EVALPATH/uima_workflows/arabic_test-50-star_workflow.xml
* Collection Reader: ACE Tern Reader
* Input Directory: EVALPATH/corpora/ace_2005_td_v7/data/Arabic/test-50-star/docs_separated/in
* Annotate Creation Time: true
* Analysis Engine: Stanford POS Tagger Wrapper
* Model_path: /path/to/stanfordtagger/models/arabic.tagger
* Config_path: empty
* Annotate_tokens: true
* Annotate_sentences: true
* Annotate_partofspeech: true
* Analysis Engine: HeidelTime
* Language: arabic
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: news
* Convert Durations: true
* CAS Consumer: ACE Tern Writer
* Output Directory: EVALPATH/uima_output/arabic_test-50-star/keyinline
* Convert Timex 3 To 2: false
- Arabic Test 50 Star for TE3-tools evaluation
save workflow as: EVALPATH/uima_workflows/arabic_test-50-star-te3_workflow.xml
* Collection Reader: TempEval-3 Reader
* Input Directory: EVALPATH/corpora/arabic_test-50-star-te3/
* Annotate Creation Time: true
* Analysis Engine: Stanford POS Tagger Wrapper
* Model_path: /path/to/stanfordtagger/models/arabic.tagger
* Config_path: empty
* Annotate_tokens: true
* Annotate_sentences: true
* Annotate_partofspeech: true
* Analysis Engine: HeidelTime
* Language: arabic
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: news
* Convert Durations: true
* CAS Consumer: TempEval-3 Writer
* Output Directory: EVALPATH/uima_output/arabic_test-50-star-te3/
* Convert Timex 3 To 2: false
- TimeBank-1.2
save workflow as: EVALPATH/uima_workflows/timebank12_workflow.xml
* Collection Reader: ACE Tern Reader
* Input Directory: EVALPATH/corpora/timebank_1_2/data/timex2version/in
* Annotate Creation Time: true
* Analysis Engine: TreeTaggerWrapper
* Language: english
* Annotate_tokens: true
* Annotate_partofspeech: true
* Annotate_sentences: true
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: english
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: news
* Convert Durations: true
* CAS Consumer: ACE Tern Writer
* Output Directory: EVALPATH/uima_output/timebank12/keyinline
- TempEval-2
save workflow as: EVALPATH/uima_workflows/tempeval2eval_workflow.xml
* Collection Reader: TempEval-2 Reader
* Input Directory: EVALPATH/corpora/tempeval2-data/test/english/entities
* Charset: empty
* Use Spaces As Separators: true
* Analysis Engine: TreeTaggerWrapper
* Language: english
* Annotate_tokens: false
* Annotate_partofspeech: true
* Annotate_sentences: false
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: english
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: news
* Convert Durations: true
* CAS Consumer: TempEval-2 Writer
* Output Directory: EVALPATH/uima_output/tempeval2eval/temp2files
- TempEval-2 Spanish
save workflow as: EVALPATH/uima_workflows/tempeval2eval-es_workflow.xml
* Collection Reader: TempEval-2 Reader
* Input Directory: EVALPATH/corpora/tempeval2-data/test/spanish/entities
* Charset: empty
* Use Spaces As Separators: true
* Analysis Engine: TreeTaggerWrapper
* Language: spanish
* Annotate_tokens: false
* Annotate_partofspeech: true
* Annotate_sentences: false
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: spanish
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: news
* Convert Durations: true
* CAS Consumer: TempEval-2 Writer
* Output Directory: EVALPATH/uima_output/tempeval2eval-es/temp2files
- TempEval-2 Italian
save workflow as: EVALPATH/uima_workflows/tempeval2eval-it_workflow.xml
* Collection Reader: TempEval-2 Reader
* Input Directory: EVALPATH/corpora/tempeval2-data/test/italian/entities
* Charset: empty
* Use Spaces As Separators: true
* Analysis Engine: TreeTaggerWrapper
* Language: italian
* Annotate_tokens: false
* Annotate_partofspeech: true
* Annotate_sentences: false
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: italian
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: news
* Convert Durations: true
* CAS Consumer: TempEval-2 Writer
* Output Directory: EVALPATH/uima_output/tempeval2eval-it/temp2files
- TempEval-2 Italian Training for TE3-tools evaluation
save workflow as: EVALPATH/uima_workflows/te2italian-train_workflow.xml
* Collection Reader: TempEval-3 Reader
* Input Directory: EVALPATH/corpora/tempeval2-data/training/italian/te3style/input
* Analysis Engine: TreeTaggerWrapper
* Language: italian
* Annotate_tokens: true
* Annotate_partofspeech: true
* Annotate_sentences: true
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: italian
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: news
* Convert Durations: true
* CAS Consumer: TempEval-3 Writer
* Output Directory: EVALPATH/uima_output/te2italian-train
- TempEval-2 Italian Test for TE3-tools evaluation
save workflow as: EVALPATH/uima_workflows/te2italian-eval_workflow.xml
* Collection Reader: TempEval-3 Reader
* Input Directory: EVALPATH/corpora/tempeval2-data/test/italian/te3style/input
* Analysis Engine: TreeTaggerWrapper
* Language: italian
* Annotate_tokens: true
* Annotate_partofspeech: true
* Annotate_sentences: true
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: italian
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: news
* Convert Durations: true
* CAS Consumer: TempEval-3 Writer
* Output Directory: EVALPATH/uima_output/te2italian-eval
- TempEval-2 Chinese Training
save workflow as: EVALPATH/uima_workflows/tempeval2train-cn_workflow.xml
* Collection Reader: TempEval-2 Reader
* Input Directory: EVALPATH/corpora/tempeval2-data/training/chinese/data/
* Charset: GBK
* Use Spaces As Separators: false
* Analysis Engine: TreeTaggerWrapper
* Language: chinese
* Annotate_tokens: false
* Annotate_partofspeech: true
* Annotate_sentences: false
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: chinese
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: news
* Convert Durations: true
* CAS Consumer: TempEval-2 Writer
* Output Directory: EVALPATH/uima_output/tempeval2train-cn/temp2files
- TempEval-2 Chinese Test
save workflow as: EVALPATH/uima_workflows/tempeval2eval-cn_workflow.xml
* Collection Reader: TempEval-2 Reader
* Input Directory: EVALPATH/corpora/tempeval2-data/test/chinese/entities/
* Charset: GBK
* Use Spaces As Separators: false
* Analysis Engine: TreeTaggerWrapper
* Language: chinese
* Annotate_tokens: false
* Annotate_partofspeech: true
* Annotate_sentences: false
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: chinese
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: news
* Convert Durations: true
* CAS Consumer: TempEval-2 Writer
* Output Directory: EVALPATH/uima_output/tempeval2eval-cn/temp2files
- TempEval-2 Chinese Training CLEAN
save workflow as: EVALPATH/uima_workflows/tempeval2train-cn-clean_workflow.xml
* Collection Reader: TempEval-2 Reader
* Input Directory: EVALPATH/corpora/tempeval2-data/training/chinese/data-clean/
* Charset: GBK
* Use Spaces As Separators: false
* Analysis Engine: TreeTaggerWrapper
* Language: chinese
* Annotate_tokens: false
* Annotate_partofspeech: true
* Annotate_sentences: false
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: chinese
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: news
* Convert Durations: true
* CAS Consumer: TempEval-2 Writer
* Output Directory: EVALPATH/uima_output/tempeval2train-cn-clean/temp2files
- TempEval-2 Chinese Test CLEAN
save workflow as: EVALPATH/uima_workflows/tempeval2eval-cn-clean_workflow.xml
* Collection Reader: TempEval-2 Reader
* Input Directory: EVALPATH/corpora/tempeval2-data/test/chinese/entities-clean/
* Charset: GBK
* Use Spaces As Separators: false
* Analysis Engine: TreeTaggerWrapper
* Language: chinese
* Annotate_tokens: false
* Annotate_partofspeech: true
* Annotate_sentences: false
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: chinese
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: news
* Convert Durations: true
* CAS Consumer: TempEval-2 Writer
* Output Directory: EVALPATH/uima_output/tempeval2eval-cn-clean/temp2files
- TempEval-2 Chinese Training IMPROVED
save workflow as: EVALPATH/uima_workflows/tempeval2train-cn-improved_workflow.xml
* Collection Reader: TempEval-2 Reader
* Input Directory: EVALPATH/corpora/tempeval2-data/training/chinese/data-improved/
* Charset: GBK
* Use Spaces As Separators: false
* Analysis Engine: TreeTaggerWrapper
* Language: chinese
* Annotate_tokens: false
* Annotate_partofspeech: true
* Annotate_sentences: false
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: chinese
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: news
* Convert Durations: true
* CAS Consumer: TempEval-2 Writer
* Output Directory: EVALPATH/uima_output/tempeval2train-cn-improved/temp2files
- TempEval-2 Chinese Test IMPROVED
save workflow as: EVALPATH/uima_workflows/tempeval2eval-cn-improved_workflow.xml
* Collection Reader: TempEval-2 Reader
* Input Directory: EVALPATH/corpora/tempeval2-data/test/chinese/entities-improved/
* Charset: GBK
* Use Spaces As Separators: false
* Analysis Engine: TreeTaggerWrapper
* Language: chinese
* Annotate_tokens: false
* Annotate_partofspeech: true
* Annotate_sentences: false
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: chinese
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: news
* Convert Durations: true
* CAS Consumer: TempEval-2 Writer
* Output Directory: EVALPATH/uima_output/tempeval2eval-cn-improved/temp2files
- WikiWars
save workflow as: EVALPATH/uima_workflows/wikiwars_workflow.xml
* Collection Reader: ACE Tern Reader
* Input Directory: EVALPATH/corpora/WikiWars_20101004/in
* Annotate Creation Time: true
* Analysis Engine: TreeTaggerWrapper
* Language: english
* Annotate_tokens: true
* Annotate_partofspeech: true
* Annotate_sentences: true
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: english
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: narratives
* Convert Durations: true
* CAS Consumer: ACE Tern Writer
* Output Directory: EVALPATH/uima_output/wikiwars/keyinline
- WikiWarsDE
save workflow as: EVALPATH/uima_workflows/wikiwarsde_workflow.xml
* Collection Reader: ACE Tern Reader
* Input Directory: EVALPATH/corpora/WikiWarsDE_20110412/in
* Annotate Creation Time: true
* Analysis Engine: TreeTaggerWrapper
* Language: german
* Annotate_tokens: true
* Annotate_partofspeech: true
* Annotate_sentences: true
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: german
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: narratives
* Convert Durations: true
* CAS Consumer: ACE Tern Writer
* Output Directory: EVALPATH/uima_output/wikiwarsde/keyinline
- WikiWarsVN (TempEval-3 style)
save workflow as: EVALPATH/uima_workflows/wikiwarsvn-te3_workflow.xml
* Collection Reader: TempEval-3 Reader
* Input Directory: EVALPATH/corpora/WikiWarsVN-TE3/untagged
* Analysis Engine: JVnTextProWrapper
* Annotate_tokens: true
* Annotate_sentences: true
* Annotate_partofspeech: true
* Sent_model_path: JVNTEXTPRO_HOME/models/jvnsensegmenter
* Word_model_path: JVNTEXTPRO_HOME/models/jvnsegmenter
* Pos_model_path: JVNTEXTPRO_HOME/models/jvnpostag/maxent
* Analysis Engine: HeidelTime
* Language: vietnamese
* Date: true
* Duration: true
* Time: true
* Set: true
* Type: narratives
* Convert Durations: true
* CAS Consumer: TempEval-3 Writer
* Output Directory: EVALPATH/uima_output/wikiwarsvn-te3
- WikiWarsHR
save workflow as: EVALPATH/uima_workflows/wikiwarshr_workflow.xml
* Collection Reader: TempEval-3 Reader
* Input Directory: EVALPATH/corpora/TakeLab-WikiWarsHr/timeml
* Analysis Engine: HunPosTaggerWrapper
* Model_path: model.hunpos.mte5.defnpout
* Language: croatian
* Annotate_tokens: true
* Annotate_sentences: true
* Annotate_pos: true
* Analysis Engine: HeidelTime
* Language: croatian
* Date: true
* Duration: true
* Time: true
* Set: true
* Type: narratives
* Convert Durations: true
* CAS Consumer: TempEval-3 Writer
* Output Directory: EVALPATH/uima_output/wikiwarshr
- Time4SMS
save workflow as: EVALPATH/uima_workflows/time4sms_workflow.xml
* Collection Reader: ACE Tern Reader
* Input Directory: EVALPATH/corpora/Time4SMS/in
* Annotate Creation Time: true
* Analysis Engine: TreeTaggerWrapper
* Language: english
* Annotate_tokens: true
* Annotate_partofspeech: true
* Annotate_sentences: true
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: englishcoll
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: colloquial
* Convert Durations: true
* CAS Consumer: ACE Tern Writer
* Output Directory: EVALPATH/uima_output/time4sms/keyinline
- Time4SCI
save workflow as: EVALPATH/uima_workflows/time4sci_workflow.xml
* Collection Reader: ACE Tern Reader
* Input Directory: EVALPATH/corpora/Time4SCI/in
* Annotate Creation Time: true
* Analysis Engine: TreeTaggerWrapper
* Language: english
* Annotate_tokens: true
* Annotate_partofspeech: true
* Annotate_sentences: true
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: englishsci
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: scientific
* Convert Durations: true
* CAS Consumer: ACE Tern Writer
* Output Directory: EVALPATH/uima_output/time4sci/keyinline
- I-CAB TIMEX 07
save workflow as: EVALPATH/uima_workflows/icab-test_workflow.xml
* Collection Reader: ACE Tern Reader
* Input Directory: EVALPATH/corpora/I-CAB_All/TIMEX-07/I-CAB-evalita07-TIMEX-test/docs_separated/in/
* Annotate Creation Time: true
* Analysis Engine: TreeTaggerWrapper
* Language: italian
* Annotate_tokens: true
* Annotate_partofspeech: true
* Annotate_sentences: true
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: italian
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: news
* Convert Durations: true
* CAS Consumer: ACE Tern Writer
* Output Directory: EVALPATH/uima_output/icab-test/keyinline
- TempEval-3 Platinum
save workflow as: EVALPATH/uima_workflows/te3platinum_workflow.xml
* Collection Reader: TempEval-3 Reader
* Input Directory: EVALPATH/corpora/TempEval-3/te3-platinum/
* Analysis Engine: TreeTaggerWrapper
* Language: english
* Annotate_tokens: true
* Annotate_partofspeech: true
* Annotate_sentences: true
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: english
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: news
* Convert Durations: true
* CAS Consumer: TempEval-3 Writer
* Output Directory: EVALPATH/uima_output/te3platinum
- TempEval-3 Spanish
save workflow as: EVALPATH/uima_workflows/te3spanish_workflow.xml
* Collection Reader: TempEval-3 Reader
* Input Directory: EVALPATH/corpora/TempEval-3/testES-TaskABC
* Analysis Engine: TreeTaggerWrapper
* Language: spanish
* Annotate_tokens: true
* Annotate_partofspeech: true
* Annotate_sentences: true
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: spanish
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: news
* Convert Durations: true
* CAS Consumer: TempEval-3 Writer
* Output Directory: EVALPATH/uima_output/te3spanish
- French TimeBank 1.1
save workflow as: EVALPATH/uima_workflows/timebank-fr_workflow.xml
* Collection Reader: TempEval-3 Reader
* Input Directory: EVALPATH/corpora/FR-TimeBank1.1/Data
* Analysis Engine: TreeTaggerWrapper
* Language: french
* Annotate_tokens: true
* Annotate_partofspeech: true
* Annotate_sentences: true
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: french
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: news
* Convert Durations: true
* CAS Consumer: TempEval-3 Writer
* Output Directory: EVALPATH/uima_output/timebank-fr
- TimeBankPT 1.0
save workflow as: EVALPATH/uima_workflows/timebankpt-test_workflow.xml
* Collection Reader: TempEval-3 Reader
* Input Directory: EVALPATH/corpora/TimeBankPT/test-te3style/
* Analysis Engine: TreeTaggerWrapper
* Language: portuguese
* Annotate_tokens: true
* Annotate_partofspeech: true
* Annotate_sentences: true
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: portuguese
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: news
* Convert Durations: true
* CAS Consumer: TempEval-3 Writer
* Output Directory: EVALPATH/uima_output/timebankpt-test
- EVALITA 2014 Test
save workflow as: EVALPATH/uima_workflows/evalita-taskA_workflow.xml
* Collection Reader: Eventi 2014 Reader
* Input Directory: EVALPATH/corpora/Gold_Standard-EVENTI-2014/Gold_Full_Main/Annotated_Data_Main
* Analysis Engine: TreeTaggerWrapper
* Language: italian
* Annotate_tokens: false
* Annotate_partofspeech: true
* Annotate_sentences: false
* Chinese Tokenizer Path: empty
* Analysis Engine: HeidelTime
* Language: italian
* Date: true
* Time: true
* Duration: true
* Set: true
* Type: news
* Convert Durations: false
* Analysis Engine: IntervalTagger
* Language: italian
* Annotate_intervals: true
* Annotate_interval_candidates: false
* CAS Consumer: Eventi 2014 Writer
* Output Directory: EVALPATH/uima_output/evalita-taskA
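Once workflows are saved under the names given above, they can be run in a batch. The sketch below assumes that `UIMA_HOME` points to an Apache UIMA installation whose `bin/runCPE.sh` accepts the workflow (CPE descriptor) file as its argument, and that the class path, TreeTagger, and other taggers are already configured as described in the kit's documentation. The helper name `run_workflows` and the default `EVALPATH` are hypothetical. Without a usable `UIMA_HOME`, it only prints what it would run.

```shell
#!/bin/sh
# Default location is an assumption; export EVALPATH to override it.
EVALPATH="${EVALPATH:-$PWD/heideltime-eval}"

# run_workflows WORKFLOW_DIR  (hypothetical helper, not part of the kit)
run_workflows() {
    workflow_dir=$1
    for wf in "$workflow_dir"/*_workflow.xml; do
        [ -e "$wf" ] || continue              # no workflow saved yet
        if [ -x "${UIMA_HOME:-}/bin/runCPE.sh" ]; then
            "$UIMA_HOME/bin/runCPE.sh" "$wf" || echo "FAILED: $wf" >&2
        else
            echo "would run: runCPE.sh $wf"   # dry run when UIMA is not found
        fi
    done
}

run_workflows "$EVALPATH/uima_workflows"
```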
Go to EVALPATH/scripts/
and run the corresponding script for the corpus to be evaluated. The result file will be written to EVALPATH/evaluation_results/CORPUS/evaluation_results.txt.
- ACE Tern 2004 training data
bash evaluate_corpus_ternstyle.sh tern2004training EVALPATH
- ACE 2005 training data
bash evaluate_corpus_ternstyle.sh ace2005training EVALPATH
- Arabic corpora:
bash evaluate_corpus_ternstyle.sh arabic_training-203 EVALPATH
bash evaluate_corpus_ternstyle.sh arabic_test-150 EVALPATH
bash evaluate_corpus_ternstyle.sh arabic_test-50 EVALPATH
bash evaluate_corpus_ternstyle.sh arabic_test-50-star EVALPATH
bash evaluate_corpus_tempeval3style.sh arabic_test-50-star-te3 EVALPATH
- TimeBank-1.2
bash evaluate_corpus_ternstyle.sh timebank12 EVALPATH
- TempEval-2
bash evaluate_corpus_tempeval2style.sh tempeval2eval EVALPATH
- TempEval-2 Spanish
bash evaluate_corpus_tempeval2style.sh tempeval2eval-es EVALPATH
- TempEval-2 Italian
bash evaluate_corpus_tempeval2style.sh tempeval2eval-it EVALPATH
- TempEval-2 Italian Training with TempEval-3 evaluation scripts
bash evaluate_corpus_tempeval3style.sh te2italian-train EVALPATH
- TempEval-2 Italian Eval with TempEval-3 evaluation scripts
bash evaluate_corpus_tempeval3style.sh te2italian-eval EVALPATH
- TempEval-2 Chinese
bash evaluate_corpus_tempeval2style.sh tempeval2train-cn EVALPATH
bash evaluate_corpus_tempeval2style.sh tempeval2eval-cn EVALPATH
bash evaluate_corpus_tempeval2style.sh tempeval2train-cn-clean EVALPATH
bash evaluate_corpus_tempeval2style.sh tempeval2eval-cn-clean EVALPATH
bash evaluate_corpus_tempeval2style.sh tempeval2train-cn-improved EVALPATH
bash evaluate_corpus_tempeval2style.sh tempeval2eval-cn-improved EVALPATH
- WikiWars
bash evaluate_corpus_ternstyle.sh wikiwars EVALPATH
- WikiWarsDE
bash evaluate_corpus_ternstyle.sh wikiwarsde EVALPATH
- WikiWarsVN
bash evaluate_corpus_tempeval3style.sh wikiwarsvn-te3 EVALPATH
- WikiWarsHR
bash evaluate_corpus_tempeval3style.sh wikiwarshr EVALPATH
- Time4SMS
bash evaluate_corpus_ternstyle.sh time4sms EVALPATH
- Time4SCI
bash evaluate_corpus_ternstyle.sh time4sci EVALPATH
- I-CAB
bash evaluate_corpus_ternstyle.sh icab-test EVALPATH
- TempEval-3 Platinum
bash evaluate_corpus_tempeval3style.sh te3platinum EVALPATH
- TempEval-3 Spanish
bash evaluate_corpus_tempeval3style.sh te3spanish EVALPATH
- French TimeBank 1.1
bash evaluate_corpus_tempeval3style.sh timebank-fr EVALPATH
- TimeBankPT 1.0
bash evaluate_corpus_tempeval3style.sh timebankpt-test EVALPATH
- EVALITA 2014 Test
bash evaluate_corpus_evalitastyle.sh evalita-taskA EVALPATH
- AncientTimes corpora
bash evaluate_corpus_tempeval3style.sh ancienttimes-vn EVALPATH
bash evaluate_corpus_tempeval3style.sh ancienttimes-nl EVALPATH
bash evaluate_corpus_tempeval3style.sh ancienttimes-fr EVALPATH
bash evaluate_corpus_tempeval3style.sh ancienttimes-it EVALPATH
bash evaluate_corpus_tempeval3style.sh ancienttimes-en EVALPATH
bash evaluate_corpus_tempeval3style.sh ancienttimes-de EVALPATH
bash evaluate_corpus_tempeval3style.sh ancienttimes-es EVALPATH
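Because the evaluation script depends on the corpus, a small lookup helps avoid picking the wrong one. The sketch below encodes the corpus-to-script pairs listed above. The function name `eval_script_for`, the default `EVALPATH`, and the example selection are assumptions. It only prints the commands, which are meant to be run from EVALPATH/scripts/.

```shell
#!/bin/sh
# Default location is an assumption; export EVALPATH to override it.
EVALPATH="${EVALPATH:-$PWD/heideltime-eval}"

# eval_script_for CORPUS: print the evaluation script used for CORPUS in the
# list above (hypothetical helper, not part of the scripts archive).
eval_script_for() {
    case "$1" in
        tempeval2*)
            echo evaluate_corpus_tempeval2style.sh ;;
        evalita-*)
            echo evaluate_corpus_evalitastyle.sh ;;
        # The *-te3 pattern must come before arabic_test-* below.
        *-te3|te2italian-*|te3platinum|te3spanish|wikiwarshr|timebank-fr|timebankpt-test|ancienttimes-*)
            echo evaluate_corpus_tempeval3style.sh ;;
        tern2004training|ace2005training|arabic_training-*|arabic_test-*|timebank12|wikiwars|wikiwarsde|time4sms|time4sci|icab-test)
            echo evaluate_corpus_ternstyle.sh ;;
        *)
            echo "unknown corpus: $1" >&2
            return 1 ;;
    esac
}

# Print the evaluation command for an example selection of corpora.
for corpus in timebank12 tempeval2eval te3platinum; do
    script=$(eval_script_for "$corpus") || continue
    echo "bash $script $corpus $EVALPATH"
done
```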
The evaluation results produced should be similar to the ones described in our papers "Multilingual Cross-domain Temporal Tagging", "Temporal Tagging on Different Domains: Challenges, Strategies, and Gold Standards", "Chinese Temporal Tagging with HeidelTime", "Time for More Languages: Temporal Tagging of Arabic, Italian, Spanish, and Vietnamese", and "Extending HeidelTime for Temporal Expressions Referring to Historic Dates". Alternatively, you can always find the current evaluation results on this wiki page.
Please note that deviations from those numbers can arise from different tokenizers, part-of-speech taggers, or sentence splitters, and from experimental development versions of HeidelTime.
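Before comparing numbers, it can help to see which corpora already have a result file. The following sketch relies only on the result path stated above (EVALPATH/evaluation_results/CORPUS/evaluation_results.txt). The helper name `list_results` and the default `EVALPATH` are assumptions.

```shell
#!/bin/sh
# Default location is an assumption; export EVALPATH to override it.
EVALPATH="${EVALPATH:-$PWD/heideltime-eval}"

# list_results EVALPATH: print "CORPUS<TAB>result file" for every corpus
# that has an evaluation_results.txt (hypothetical helper).
list_results() {
    for f in "$1"/evaluation_results/*/evaluation_results.txt; do
        [ -f "$f" ] || continue               # nothing evaluated yet
        corpus=$(basename "$(dirname "$f")")
        printf '%s\t%s\n' "$corpus" "$f"
    done
}

list_results "$EVALPATH"
```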