This repository contains implementations of the term extraction algorithm (without filters) from "Automatic Recognition of Multi-Word Terms: the C-value/NC-value Method" by Katerina Frantzi, Sophia Ananiadou, and Hideki Mima.
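The core scoring step can be sketched in Python. It follows the C-value formula as stated in the paper. Let a be a candidate string of |a| words with corpus frequency f(a), let T_a be the set of longer candidates that contain a, and let P(T_a) be the size of that set. If a is not nested, C-value(a) = log2|a| * f(a). Otherwise, C-value(a) = log2|a| * (f(a) - (1/P(T_a)) * sum of f(b) over b in T_a). The functions `is_nested` and `c_value` are hypothetical helpers for illustration and are not taken from Main.py. Because log2 1 = 0, a one-word candidate always scores 0 under this formula.

```python
import math
from typing import Dict, Tuple

Term = Tuple[str, ...]  # a candidate term as a tuple of words


def is_nested(short: Term, long: Term) -> bool:
    """True if `short` occurs as a contiguous word sequence inside a longer `long`."""
    n, m = len(short), len(long)
    if n >= m:
        return False
    return any(long[i:i + n] == short for i in range(m - n + 1))


def c_value(freqs: Dict[Term, int]) -> Dict[Term, float]:
    """Compute the C-value of every candidate from its frequency (hypothetical helper)."""
    scores: Dict[Term, float] = {}
    for a, f_a in freqs.items():
        # T_a: longer candidates that contain `a` as a nested sub-string.
        containers = [b for b in freqs if is_nested(a, b)]
        weight = math.log2(len(a))
        if not containers:
            # `a` is not nested: score is log2|a| * f(a).
            scores[a] = weight * f_a
        else:
            # Subtract the average frequency of the longer terms that contain `a`.
            nested_freq = sum(freqs[b] for b in containers) / len(containers)
            scores[a] = weight * (f_a - nested_freq)
    return scores
```

For example, with `{("real", "time", "clock"): 4, ("real", "time"): 10}`, the shorter term is nested once in the longer one. Its score is therefore log2 2 * (10 - 4) = 6.0.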
The sample test file 'Turku.txt' was part-of-speech tagged with Stanford CoreNLP, which produced 'Turku-tagged.txt'.
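Before scoring, the tagged text has to be split into (word, tag) pairs. The sketch below assumes each token is written as `word_TAG`, which is the default separator of the Stanford POS tagger. The actual layout of 'Turku-tagged.txt' may differ, and `read_tagged` is a hypothetical helper, not the reader used by Main.py.

```python
from typing import List, Tuple


def read_tagged(text: str, sep: str = "_") -> List[Tuple[str, str]]:
    """Split whitespace-separated 'word_TAG' tokens into (word, tag) pairs (assumed format)."""
    pairs: List[Tuple[str, str]] = []
    for token in text.split():
        # rpartition splits on the LAST separator, so words containing '_' survive intact.
        word, _, tag = token.rpartition(sep)
        if word:  # skip tokens that carry no separator
            pairs.append((word, tag))
    return pairs
```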
Run the program as follows:
python3 Main.py path_to_/Turku-tagged.txt ligui_filter max_len freq_threshold C_Value_threshld
- ligui_filter: the linguistic filter; one of Noun, AdjNoun, or AdjPrepNoun
- max_len: the maximum length of a candidate term
- freq_threshold: the frequency threshold that a candidate term must reach to be kept
- C_Value_threshld: the C-value threshold
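The parameters ligui_filter, max_len and freq_threshold control which candidate strings are collected before scoring. The sketch below assumes that Noun, AdjNoun and AdjPrepNoun correspond to the paper's three filters: Noun+Noun, (Adj|Noun)+Noun, and ((Adj|Noun)+|((Adj|Noun)*(NounPrep)?)(Adj|Noun)*)Noun. It further assumes Penn Treebank tags, where NN* is a noun, JJ* an adjective and IN a preposition. `FILTERS`, `code` and `candidates` are hypothetical names, and whether Main.py applies the frequency threshold with `>=` or `>` is also an assumption.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical regexes over one-letter tag codes (N = noun, A = adjective, P = preposition).
FILTERS = {
    "Noun": re.compile(r"N+N"),
    "AdjNoun": re.compile(r"[AN]+N"),
    "AdjPrepNoun": re.compile(r"(?:[AN]+|[AN]*(?:NP)?[AN]*)N"),
}


def code(tag: str) -> str:
    """Map a Penn Treebank tag to a one-letter code; 'x' means 'not allowed in a term'."""
    if tag.startswith("NN"):
        return "N"
    if tag.startswith("JJ"):
        return "A"
    if tag == "IN":
        return "P"
    return "x"


def candidates(tagged: List[Tuple[str, str]], ling_filter: str,
               max_len: int, freq_threshold: int) -> Dict[Tuple[str, ...], int]:
    """Count word sequences of 2..max_len tokens whose tag codes match the filter."""
    pattern = FILTERS[ling_filter]
    codes = "".join(code(t) for _, t in tagged)
    words = [w.lower() for w, _ in tagged]  # lower-casing is a design choice of this sketch
    counts: Counter = Counter()
    for i in range(len(tagged)):
        # j is the exclusive end of the span, so the span length j - i runs from 2 to max_len.
        for j in range(i + 2, min(i + max_len, len(tagged)) + 1):
            if pattern.fullmatch(codes[i:j]):
                counts[tuple(words[i:j])] += 1
    # Keep only candidates that reach the frequency threshold.
    return {t: f for t, f in counts.items() if f >= freq_threshold}
```

This sketch ignores sentence boundaries, so a real implementation would probably restrict spans to a single sentence.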
The program prints the 10 terms with the highest C-values.
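The final selection step can be sketched as follows: drop terms below the C-value threshold, then keep the 10 highest-scoring terms. `top_terms` is a hypothetical helper, and treating the threshold as inclusive (`>=`) is an assumption about Main.py.

```python
import heapq
from typing import Dict, List, Tuple


def top_terms(scores: Dict[Tuple[str, ...], float], threshold: float,
              k: int = 10) -> List[Tuple[Tuple[str, ...], float]]:
    """Return up to k (term, C-value) pairs at or above the threshold, highest first."""
    kept = ((t, s) for t, s in scores.items() if s >= threshold)
    # nlargest avoids sorting the whole candidate list when only k items are needed.
    return heapq.nlargest(k, kept, key=lambda item: item[1])
```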