ucfnlp/meeting-domain-terminology

How Domain Terminology Affects Meeting Summarization Performance

We provide annotated domain terminology for the COLING 2020 paper "How Domain Terminology Affects Meeting Summarization Performance". If you find the information useful, please cite the following:

@inproceedings{koay-etal-2020-domain,
    title = {How Domain Terminology Affects Meeting Summarization Performance},
    author = {Jia Jin Koay and Alexander Roustai and Xiaojin Dai and Dillon Burns and Alec Kerrigan and Fei Liu},
    booktitle = {Proceedings of the 28th International Conference on Computational Linguistics},
    year = {2020}}

Domain Terminology

We solicit annotations from undergraduate students majoring in computer science and designate words and expressions that are beyond the scope of their knowledge as domain terminology. The annotators annotated all 75 ICSI meetings for jargon terms, and we provide the unique jargon terms for each file. Below is a snippet of a human transcript with domain terminology shown in bold. The original meeting transcripts can be downloaded here.

| Start | End | Spoken Utterance |
|---|---|---|
| 247.255 | 252.672 | with Andreas' help um Andreas put together a sort of no frills recognizer which is uh |
| 252.672 | 258.837 | gender-dependent but like no adaptation, no cross-word models, no trigrams - a bigram recognizer |
| 258.837 | 262.221 | and that's trained on Switchboard which is telephone conversations. |
| 263.983 | 267.154 | and thanks to Don's help wh- who - Don took |
| 267.154 | 270.431 | the first meeting that Jane had transcribed |
| 270.431 | 277.520 | and um you know separated - used the individual channels we segmented it in- into the segments that Jane had used |
| 277.520 | 279.952 | and uh Don sampled that so - |
| 281.374 | 289.611 | um and then we ran up to I guess the first twenty minutes, up to synch time of one two zero zero so is that - that's twenty minutes or so? |
| 289.611 | 296.601 | Um yeah because I guess there's some, and Don can talk to Jane about this, there's some bug in the actual synch time file that |
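Each row of the snippet above pairs a start time and an end time with the spoken utterance. As a minimal sketch, assuming a row is available as whitespace-separated plain text and that the times are in seconds (neither is stated here), the rows could be parsed as follows. The names `Utterance` and `parse_row` are hypothetical and not part of this repository.

```python
from typing import NamedTuple


class Utterance(NamedTuple):
    start: float  # assumed to be seconds from the start of the recording
    end: float
    text: str


def parse_row(line: str) -> Utterance:
    """Split a 'start end utterance' row into typed fields."""
    # maxsplit=2 keeps the utterance intact, including its internal spaces.
    start, end, text = line.split(maxsplit=2)
    return Utterance(float(start), float(end), text)


# Two rows copied from the snippet above.
rows = [
    "258.837 262.221 and that's trained on Switchboard which is telephone conversations.",
    "263.983 267.154 and thanks to Don's help wh- who - Don took",
]
utterances = [parse_row(r) for r in rows]

# Silence between the end of one utterance and the start of the next.
gap = utterances[1].start - utterances[0].end
```

Most consecutive rows share a boundary (one row's end equals the next row's start), so a nonzero `gap` marks a pause in the recording.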

Errata

The ROUGE scores reported in this paper were obtained by running pyrouge against a single human reference summary using ROUGE options -c 95 -n 2 -a -s -m -2 4 -u, where -s indicates that stopwords are removed from both system and reference summaries. We later found that the scores for the ASR summaries had instead been obtained by running pyrouge against three human reference summaries. Below we report the corrected ASR results, computed against the single human reference summary (with stopwords removed).
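To make concrete what precision, recall, and F-score mean for ROUGE-N against a single reference with stopwords removed, here is a simplified sketch. It is not pyrouge or ROUGE-1.5.5: it omits stemming (-m), skip-bigrams (-2 4 -u), and confidence intervals (-c 95), and the stopword list and example sentences are made up for illustration.

```python
from collections import Counter

# Tiny illustrative stopword list; ROUGE ships its own, much longer list.
STOPWORDS = {"the", "a", "an", "is", "of", "and"}


def ngrams(tokens, n):
    """Count the n-grams in a token list (empty if the list is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(system, reference, n=1, remove_stopwords=True):
    """Return (precision, recall, F1) of n-gram overlap with one reference."""
    def prep(text):
        tokens = text.lower().split()
        if remove_stopwords:
            tokens = [t for t in tokens if t not in STOPWORDS]
        return tokens

    sys_counts = ngrams(prep(system), n)
    ref_counts = ngrams(prep(reference), n)
    # Clipped overlap: each n-gram counts at most as often as in the reference.
    overlap = sum((sys_counts & ref_counts).values())
    precision = overlap / max(sum(sys_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up sentences, not taken from the meeting summaries.
p1, r1, f1 = rouge_n("the recognizer is trained on switchboard",
                     "a bigram recognizer trained on switchboard data", n=1)
p2, r2, f2 = rouge_n("the recognizer is trained on switchboard",
                     "a bigram recognizer trained on switchboard data", n=2)
```

Because recall is normalized by the reference n-gram count, the choice and number of reference summaries directly affects the scores.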

Original ASR results in COLING paper (3 human references, stopwords removed):

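| | | R-1 P | R-1 R | R-1 F | R-2 P | R-2 R | R-2 F | R-SU4 P | R-SU4 R | R-SU4 F |
|---|---|---|---|---|---|---|---|---|---|---|
| ASR | Ours (w/o Jargon) | 41.7 | 55.1 | 46.8 | 15.1 | 20.5 | 17.2 | 18.7 | 25.3 | 21.3 |
| ASR | Ours (w/ Jargon) | 39.7 | 57.5 | 46.6 | 15.1 | 21.9 | 17.7 | 18.2 | 26.5 | 21.4 |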

Updated ASR results (single human reference, stopwords removed):

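| | | R-1 P | R-1 R | R-1 F | R-2 P | R-2 R | R-2 F | R-SU4 P | R-SU4 R | R-SU4 F |
|---|---|---|---|---|---|---|---|---|---|---|
| ASR | Ours (w/o Jargon) | 51.0 | 54.0 | 50.4 | 18.9 | 20.2 | 18.9 | 23.4 | 25.0 | 23.3 |
| ASR | Ours (w/ Jargon) | 49.7 | 56.8 | 50.8 | 19.9 | 22.5 | 20.2 | 23.2 | 26.8 | 23.8 |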
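As a quick way to read the updated table, the sketch below computes the per-metric difference between the two ASR systems (w/ Jargon minus w/o Jargon). The numbers are copied from the updated results above; the variable names are illustrative only.

```python
METRICS = ["R-1 P", "R-1 R", "R-1 F",
           "R-2 P", "R-2 R", "R-2 F",
           "R-SU4 P", "R-SU4 R", "R-SU4 F"]

# Updated ASR results (single human reference, stopwords removed).
without_jargon = [51.0, 54.0, 50.4, 18.9, 20.2, 18.9, 23.4, 25.0, 23.3]
with_jargon = [49.7, 56.8, 50.8, 19.9, 22.5, 20.2, 23.2, 26.8, 23.8]

# Positive values mean the jargon-aware system scores higher.
delta = {
    metric: round(w - wo, 1)
    for metric, w, wo in zip(METRICS, with_jargon, without_jargon)
}

for metric, diff in delta.items():
    print(f"{metric:8s} {diff:+.1f}")
```

In the updated results the system with jargon has higher recall and F-score on all three metrics, while its precision is lower on R-1 and R-SU4.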
