This is a Python toolkit for mining Chinese text (currently) and other kinds of data (planned).
- extract keywords with TF-IDF and TextRank (based on https://github.com/fxsjy/jieba)
- extract key phrases based on some code from https://github.com/letiantian/TextRank4ZH/tree/master/textrank4zh
- generate word cloud based on WordCloud (https://github.com/amueller/word_cloud)
- generate word frequencies and co-occurrence network (from https://github.com/ipython/talks/blob/master/parallel/text_analysis.py)
- create a word2vec model with gensim
- generate dendrogram of keywords based on word vectors
- cluster keywords with k-means
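
As a rough illustration of the word-frequency and co-occurrence step listed above, here is a minimal standard-library sketch. It is not this toolkit's API: the function names `word_frequencies` and `cooccurrence_edges` and the sample `docs` are hypothetical, and the input is assumed to be already segmented into words (for Chinese text, a segmenter such as `jieba.lcut` could produce these token lists). Co-occurrence here means "appears in the same document", which is only one of several possible definitions.

```python
from collections import Counter
from itertools import combinations


def word_frequencies(docs):
    """Count how often each token appears across all documents."""
    counts = Counter()
    for tokens in docs:
        counts.update(tokens)
    return counts


def cooccurrence_edges(docs):
    """Count, for each unordered word pair, the documents containing both."""
    edges = Counter()
    for tokens in docs:
        # Deduplicate and sort so each pair is counted once per document
        # and is always stored in the same order.
        for pair in combinations(sorted(set(tokens)), 2):
            edges[pair] += 1
    return edges


# Hypothetical pre-tokenized documents (assumption: segmentation already done).
docs = [
    ["文本", "挖掘", "工具"],
    ["文本", "分析", "工具"],
    ["文本", "挖掘"],
]

freqs = word_frequencies(docs)
edges = cooccurrence_edges(docs)

print(freqs.most_common(3))
print(edges.most_common(3))
```

The pair counts can be used as edge weights of a co-occurrence network, with the words as nodes and the word frequencies as node sizes.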

## License

All materials in this repository are licensed CC-BY, and I encourage reuse!