Discovering Topic Representative Terms for Short Text Clustering (TRTD)

Source code for the TRTD method from the paper "Discovering Topic Representative Terms for Short Text Clustering".

How to run

python kwg_discovery_clustering.py -d dataset/Tweet_merged_50 --gamma 30 --theta 0.8 --delta 0.1
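To try several parameter settings, the command above can be assembled programmatically. The following is a minimal sketch: the helper name `build_trtd_command` is hypothetical and not part of this repository, and it only reuses the flags shown in the command above (`-d`, `--gamma`, `--theta`, `--delta`). It assumes the script is started from the repository root.

```python
import sys


def build_trtd_command(dataset, gamma, theta, delta, python=sys.executable):
    """Return the argument list for one run of kwg_discovery_clustering.py.

    Hypothetical helper: it only mirrors the flags shown in "How to run".
    """
    return [
        python,
        "kwg_discovery_clustering.py",
        "-d", dataset,
        "--gamma", str(gamma),
        "--theta", str(theta),
        "--delta", str(delta),
    ]


if __name__ == "__main__":
    # Print the command for the setting used in this README.
    cmd = build_trtd_command("dataset/Tweet_merged_50", 30, 0.8, 0.1)
    print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to start the run.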

Requirements

Python 3.x

Short text dataset

The Tweet dataset contains 167,136 tweets in 164 clusters, and each tweet contains 7.54 words on average.
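The average length can be checked for any collection of short texts. The sketch below assumes the documents are already loaded as a list of strings and that words are separated by whitespace; the function name is illustrative, and the on-disk format of the files under `dataset/` is not described here, so loading is left out.

```python
def average_words_per_document(documents):
    """Mean number of whitespace-separated tokens per document."""
    lengths = [len(doc.split()) for doc in documents]
    # Guard against an empty collection to avoid division by zero.
    return sum(lengths) / len(lengths) if lengths else 0.0


if __name__ == "__main__":
    sample = ["new iphone app", "swine flu cases rise"]
    print(average_words_per_document(sample))
```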

Results on Tweet dataset

2021-04-03 22:52:01,697 - kwg_discovery_clustering.py - INFO - open dataset: dataset/Tweet_merged_50
2021-04-03 22:52:06,564 - kwg_discovery_clustering.py - INFO - parameters: gamma:30, delta:0.1,theta:0.8
2021-04-03 22:52:06,564 - kwg_discovery_clustering.py - INFO - contruct word graph
2021-04-03 22:52:07,471 - kwg_discovery_clustering.py - INFO - node length:3946, edge length:12980
2021-04-03 22:52:07,476 - kwg_discovery_clustering.py - INFO - ['iphone', 'new', 'app']
2021-04-03 22:52:07,476 - kwg_discovery_clustering.py - INFO - ['flu', 'swine', 'cases']
...
...
... 

2021-04-03 22:52:32,572 - kwg_discovery_clustering.py - INFO - ------------------------clustering result-----------------------------
2021-04-03 22:52:32,572 - kwg_discovery_clustering.py - INFO - original dataset length:167136,pred dataset length:167136
2021-04-03 22:52:32,577 - kwg_discovery_clustering.py - INFO - number of clusters in dataset: 164
2021-04-03 22:52:32,578 - kwg_discovery_clustering.py - INFO - number of clusters estimated: 200
2021-04-03 22:52:32,746 - kwg_discovery_clustering.py - INFO - Homogeneity: 0.846
2021-04-03 22:52:32,874 - kwg_discovery_clustering.py - INFO - Completeness: 0.775
2021-04-03 22:52:32,992 - kwg_discovery_clustering.py - INFO - V-measure: 0.809
2021-04-03 22:52:33,102 - kwg_discovery_clustering.py - INFO - Adjusted Rand Index: 0.842
2021-04-03 22:52:33,925 - kwg_discovery_clustering.py - INFO - Adjusted Mutual Information: 0.771
2021-04-03 22:52:34,074 - kwg_discovery_clustering.py - INFO - Normalized Mutual Information: 0.810
2021-04-03 22:52:34,145 - kwg_discovery_clustering.py - INFO - Purity Score: 0.932
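For readers who want to see how some of these scores are defined, here is a minimal pure-Python sketch of homogeneity, completeness, V-measure and purity, computed from ground-truth and predicted labels. It is not the evaluation code of this repository, and the function names are illustrative. scikit-learn's `sklearn.metrics` module provides ready-made functions for the entropy-based scores, the Adjusted Rand Index and the mutual-information scores.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy H(A) of a label sequence (natural log)."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(labels_a, labels_b):
    """Conditional entropy H(A | B) of two aligned label sequences."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    b_counts = Counter(labels_b)
    return -sum((c / n) * log(c / b_counts[b]) for (_, b), c in joint.items())


def homogeneity_completeness_v(labels_true, labels_pred):
    """Return (homogeneity, completeness, V-measure)."""
    h_true = entropy(labels_true)
    h_pred = entropy(labels_pred)
    # Homogeneity: each cluster should contain members of a single class.
    hom = 1.0 if h_true == 0 else 1.0 - conditional_entropy(labels_true, labels_pred) / h_true
    # Completeness: each class should be assigned to a single cluster.
    com = 1.0 if h_pred == 0 else 1.0 - conditional_entropy(labels_pred, labels_true) / h_pred
    # V-measure: harmonic mean of the two.
    v = 0.0 if hom + com == 0 else 2 * hom * com / (hom + com)
    return hom, com, v


def purity(labels_true, labels_pred):
    """Fraction of documents that belong to the majority class of their cluster."""
    joint = Counter(zip(labels_pred, labels_true))
    best = {}
    for (cluster, _), count in joint.items():
        best[cluster] = max(best.get(cluster, 0), count)
    return sum(best.values()) / len(labels_true)


if __name__ == "__main__":
    y_true = [0, 0, 1, 1]
    y_pred = [0, 0, 1, 2]
    print(homogeneity_completeness_v(y_true, y_pred), purity(y_true, y_pred))
```

As a consistency check on the log above, the V-measure is the harmonic mean of homogeneity and completeness: 2 * 0.846 * 0.775 / (0.846 + 0.775) ≈ 0.809.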

Please cite

@article{yang2019discovering,
 title={Discovering topic representative terms for short text clustering},
 author={Yang, Shuiqiao and Huang, Guangyan and Cai, Borui},
 journal={IEEE Access},
 volume={7},
 pages={92037--92047},
 year={2019},
 publisher={IEEE}
}
