An experimental web page for observing how Chinese natural language processing works. It walks through each step of the process: decomposition (word segmentation), transformation, deletion, and model building. It is written in JavaScript, so it runs in any web browser with nothing else to install. Its output can be combined with other tools to draw word clouds, run statistical text analysis, build LDA topic models, and train machine-learning classifiers.
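To make the "model building" step concrete, here is a minimal sketch of one common kind of model that word clouds, LDA, and classifiers consume: a bag-of-words document-term count matrix. It assumes the text has already been segmented into token arrays (Jieba-JS's own API is not shown or assumed here). `buildDocumentTermMatrix` is a hypothetical helper for illustration, not a function exported by this project.

```javascript
// Hypothetical helper (not part of Jieba-JS): build a bag-of-words
// document-term count matrix from documents that are ALREADY segmented.
function buildDocumentTermMatrix(tokenizedDocs) {
  // Vocabulary: every distinct token, in order of first appearance.
  const vocabulary = [];
  const index = new Map();
  for (const tokens of tokenizedDocs) {
    for (const token of tokens) {
      if (!index.has(token)) {
        index.set(token, vocabulary.length);
        vocabulary.push(token);
      }
    }
  }
  // One row per document, one column per vocabulary term, holding raw counts.
  const matrix = tokenizedDocs.map((tokens) => {
    const row = new Array(vocabulary.length).fill(0);
    for (const token of tokens) {
      row[index.get(token)] += 1;
    }
    return row;
  });
  return { vocabulary, matrix };
}

// Usage: two toy documents, segmented by hand for illustration.
const docs = [
  ["自然", "語言", "處理"],
  ["語言", "模型", "語言"],
];
const { vocabulary, matrix } = buildDocumentTermMatrix(docs);
// vocabulary: ["自然", "語言", "處理", "模型"]
// matrix:     [[1, 1, 1, 0], [0, 2, 0, 1]]
console.log(vocabulary, matrix);
```

Raw counts keep the sketch simple; TF-IDF weighting or normalization could be applied to the same matrix if a downstream tool expects it.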
- JavaScript
- Vue: MVVM framework
- Jieba-JS: word segmentation algorithm
- Semantic UI: CSS framework
https://pulipulichen.github.io/jieba-js/
- Online Chinese Analyzer: Jieba-JS
- Discover the Topics of a Text Collection: Text Mining Based on Weka's Clustering
Yung-Ting Chen, & Herman Schaaf. (2024). pulipulichen/jieba-js: 20240518.161411 (20240518.161411) [Computer software]. Zenodo. https://doi.org/10.5281/ZENODO.11213326
- Data file: https://docs.google.com/spreadsheets/d/1mhUzD6xEpQG3wvfuF0ofoQhTajCPlpdJGMDNuCVzH2Y/edit?usp=sharing
- Spreadsheet to ARFF online converter: http://pulipulichen.github.io/jieba-js/weka/spreadsheet2arff/index.html
- ARFF result to CSV online converter: http://pulipulichen.github.io/jieba-js/weka/arff2csv/index.html
- http://pulipulichen.github.io/jieba-js/weka/simple-kmeans
- http://pulipulichen.github.io/jieba-js/weka/simple-kmeans/cascadeKMeans1.0.4.zip
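For readers curious about what a spreadsheet-to-ARFF conversion involves, the following is a minimal sketch of writing text documents in Weka's ARFF format. It is not the code behind the online converter above: it assumes each row has one free-text field and one nominal class label, and `rowsToArff` and `quoteArff` are hypothetical names. The quoting covers backslashes, single quotes, and line breaks; Weka's own quoting escapes a few more characters.

```javascript
// Hypothetical helper: quote a value for ARFF by wrapping it in single quotes,
// escaping backslashes and single quotes, and flattening line breaks to spaces
// so that each instance stays on one line.
function quoteArff(value) {
  const flat = String(value).replace(/\r?\n/g, " ");
  return "'" + flat.replace(/\\/g, "\\\\").replace(/'/g, "\\'") + "'";
}

// Hypothetical helper: rows are { text, label } objects (an assumption, not the
// converter's real input format). Produces a string attribute plus a nominal class.
function rowsToArff(relationName, rows) {
  // Collect the distinct class labels for the nominal attribute declaration.
  const labels = [...new Set(rows.map((row) => row.label))];
  const lines = [
    `@relation ${quoteArff(relationName)}`,
    "",
    "@attribute text string",
    `@attribute class {${labels.map(quoteArff).join(",")}}`,
    "",
    "@data",
  ];
  for (const row of rows) {
    lines.push(`${quoteArff(row.text)},${quoteArff(row.label)}`);
  }
  return lines.join("\n") + "\n";
}

// Usage: the text is assumed to be pre-segmented, with words separated by spaces.
const arff = rowsToArff("news", [
  { text: "自然 語言 處理", label: "tech" },
  { text: "it's raining", label: "weather" },
]);
console.log(arff);
```

A string attribute keeps each document intact, so a filter such as Weka's StringToWordVector can do the word-vector conversion inside Weka before clustering.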
- Chinese stopwords: https://github.com/stopwords-iso/stopwords-zh
- English stopwords: https://github.com/stopwords-iso/stopwords-en
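The "deletion" step typically means removing stopwords like those in the lists above. Here is a minimal sketch, assuming a list has already been loaded as plain text with one word per line (this sketch does not rely on the repositories' exact file names or formats). `buildStopwordSet` and `removeStopwords` are hypothetical helpers, not part of this project.

```javascript
// Hypothetical helper: turn a one-word-per-line stopword list into a Set.
// Entries are trimmed and lower-cased; blank lines are skipped.
function buildStopwordSet(listText) {
  return new Set(
    listText
      .split(/\r?\n/)
      .map((word) => word.trim().toLowerCase())
      .filter((word) => word.length > 0)
  );
}

// Hypothetical helper: keep tokens that are not stopwords. Comparison uses the
// trimmed, lower-cased token (lower-casing leaves Chinese characters unchanged),
// whitespace-only tokens are dropped, and surviving tokens keep their original form.
function removeStopwords(tokens, stopwords) {
  return tokens.filter((token) => {
    const normalized = token.trim().toLowerCase();
    return normalized.length > 0 && !stopwords.has(normalized);
  });
}

// Tiny inline list for illustration only; the real lists are far longer.
const stopwords = buildStopwordSet("的\n了\nthe\nof\n");
const tokens = ["The", "分析", "的", "結果", "of", "LDA", " "];
console.log(removeStopwords(tokens, stopwords)); // [ '分析', '結果', 'LDA' ]
```

Using a `Set` keeps each lookup constant-time, which matters when the same list is applied to every token of a large corpus.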