huanyannizu/C-Value-Term-Extraction

C-Value-Term-Extraction

This repository contains implementations of the term extraction algorithm (without filters) from

Automatic Recognition of Multi-Word Terms: the C-value/NC-value Method. Katerina Frantzi, Sophia Ananiadou, Hideki Mima.

The sample test file 'Turku.txt' was part-of-speech tagged with Stanford CoreNLP, which produced 'Turku-tagged.txt'.

To extract terms from a tagged file, run the following command in the terminal:

python3 Main.py path_to_/Turku-tagged.txt ligui_filter max_len freq_threshold C_Value_threshld

Parameters in the above command:

  • ligui_filter: the linguistic filter; one of Noun, AdjNoun, or AdjPrepNoun
  • max_len: the expected maximum length of a term
  • freq_threshold: the frequency threshold
  • C_Value_threshld: the C-value threshold
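As a rough illustration of what a linguistic filter does, the sketch below selects candidate word sequences whose part-of-speech tags match a pattern. It is not taken from Main.py. The patterns follow the three filters described in the C-value paper, and their mapping onto the names Noun, AdjNoun and AdjPrepNoun is an assumption, as are the helper names `tag_code` and `candidates`, the Penn Treebank tag names, and the input format (a list of (word, tag) pairs).

```python
import re

# Hypothetical filter table: regexes over one-letter tag codes
# (N = noun, A = adjective, P = preposition, O = anything else).
# Patterns follow the filters in the C-value paper; the repository's
# own definitions may differ.
FILTERS = {
    "Noun": re.compile(r"N+N"),
    "AdjNoun": re.compile(r"[AN]+N"),
    "AdjPrepNoun": re.compile(r"(?:[AN]+|[AN]*(?:NP)?[AN]*)N"),
}


def tag_code(pos):
    """Collapse a Penn Treebank tag into a one-letter code."""
    if pos.startswith("NN"):
        return "N"
    if pos.startswith("JJ"):
        return "A"
    if pos == "IN":
        return "P"
    return "O"


def candidates(tagged, filter_name, max_len):
    """Return every n-gram of 2..max_len words whose tags fully match the filter."""
    pattern = FILTERS[filter_name]
    codes = "".join(tag_code(pos) for _, pos in tagged)
    found = []
    for start in range(len(tagged)):
        last_end = min(start + max_len, len(tagged))
        for end in range(start + 2, last_end + 1):
            if pattern.fullmatch(codes[start:end]):
                found.append(" ".join(word for word, _ in tagged[start:end]))
    return found


sentence = [("adenoid", "JJ"), ("cystic", "JJ"), ("basal", "JJ"),
            ("cell", "NN"), ("carcinoma", "NN")]
noun_terms = candidates(sentence, "Noun", 5)
adj_noun_terms = candidates(sentence, "AdjNoun", 5)
```

With the Noun pattern only the noun-noun sequence is kept, while the AdjNoun pattern also admits sequences that begin with adjectives, so a broader filter yields more candidates from the same sentence.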

The program prints the 10 terms with the highest C-values.
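For readers unfamiliar with the measure, here is a minimal sketch of how C-values can be computed and ranked, written from the formula in the paper rather than from this repository's code. For a candidate a with frequency f(a), the score is log2|a| · f(a) when a is not nested in any longer candidate, and log2|a| · (f(a) − mean of f(b) over the longer candidates b that contain a) otherwise. The function names, the tuple-of-words representation, and the use of `>=` for both thresholds are assumptions, and the paper's step-by-step list processing is simplified here.

```python
import math


def is_nested(short, long):
    """True if word tuple `short` occurs contiguously inside the longer tuple `long`."""
    n = len(short)
    if n >= len(long):
        return False
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def c_values(freq):
    """Map each candidate (a tuple of words) to its C-value.

    `freq` maps candidate tuples to their corpus frequency.
    """
    scores = {}
    for a, f_a in freq.items():
        # Frequencies of the longer candidates that contain `a`.
        containing = [f_b for b, f_b in freq.items() if is_nested(a, b)]
        if containing:
            adjusted = f_a - sum(containing) / len(containing)
        else:
            adjusted = f_a
        scores[a] = math.log2(len(a)) * adjusted
    return scores


def top_terms(freq, freq_threshold, c_value_threshold, k=10):
    """Apply both thresholds (assumed inclusive) and return the k best-scoring terms."""
    kept = {a: f for a, f in freq.items() if f >= freq_threshold}
    scored = [(a, s) for a, s in c_values(kept).items() if s >= c_value_threshold]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Made-up frequencies, for illustration only.
example_freq = {
    ("basal", "cell", "carcinoma"): 5,
    ("cell", "carcinoma"): 8,
    ("soft", "contact", "lens"): 3,
}
ranking = top_terms(example_freq, freq_threshold=1, c_value_threshold=0)
```

In this toy input, "cell carcinoma" occurs 8 times, but 5 of those are attributed to the longer candidate "basal cell carcinoma", so it is scored on an adjusted frequency of 3 and ranks below both three-word terms.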

Example run using the Noun filter

[Screenshot: program output with the Noun filter]

Example run using the AdjNoun filter

[Screenshot: program output with the AdjNoun filter]

Example run using the AdjPrepNoun filter

[Screenshot: program output with the AdjPrepNoun filter]
