Uyghur text resources crawled from websites. Each root folder is named after the domain of the website it was crawled from and contains three subfolders and one text file, described below:
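
A minimal Python sketch for checking that a local copy matches this layout. It assumes the subfolders are named exactly `data`, `content`, and `dic` and the text file is `unique.txt`, as in the headings below. The function name `scan_corpus` and the corpus root path are hypothetical.

```python
from pathlib import Path

# Expected entries inside each domain folder (names taken from this README).
SUBFOLDERS = ("data", "content", "dic")
UNIQUE_FILE = "unique.txt"


def scan_corpus(root):
    """Map each domain folder under `root` to the expected entries it lacks.

    Hypothetical helper: an empty list means the folder matches the layout.
    """
    report = {}
    for domain_dir in sorted(Path(root).iterdir()):
        if not domain_dir.is_dir():
            continue  # skip stray files at the top level
        missing = [name for name in SUBFOLDERS if not (domain_dir / name).is_dir()]
        if not (domain_dir / UNIQUE_FILE).is_file():
            missing.append(UNIQUE_FILE)
        report[domain_dir.name] = missing
    return report
```

Calling `scan_corpus("path/to/corpus")` returns a dictionary keyed by domain name; any domain with a non-empty list is missing part of the expected structure.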
### data folder
Original text content crawled from each web page. (Warning: this is raw, unprocessed text from the website.)
### content folder
Original Uyghur text extracted from each web page, stored line by line with words separated by spaces.
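
Because words in the content files are separated by spaces, they can be tokenized with a plain whitespace split. The sketch below assumes UTF-8 encoding (the `utf-8-sig` codec also accepts files that start with a byte-order mark); the helper names `read_content_tokens` and `word_frequencies` are hypothetical.

```python
from collections import Counter
from pathlib import Path


def read_content_tokens(path):
    """Return one list of words per non-empty line of a content file.

    Assumes UTF-8 text (optional BOM) with whitespace-separated words.
    """
    lines = Path(path).read_text(encoding="utf-8-sig").splitlines()
    return [line.split() for line in lines if line.strip()]


def word_frequencies(paths):
    """Count how often each word occurs across several content files."""
    counts = Counter()
    for path in paths:
        for words in read_content_tokens(path):
            counts.update(words)
    return counts
```

For example, `word_frequencies(Path("example.com/content").iterdir())` would count words across one domain's content files, provided that folder holds only text files.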
### dic folder
Word lists produced by applying word tokenization to the original web page text.
### unique.txt file
List of unique words collected from the entire website.
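
To see how the per-page `dic` word lists relate to `unique.txt` for one domain, a sketch like the following compares the two vocabularies. It assumes UTF-8 files whose words are separated by whitespace (one word per line and space-separated both work) and treats every file directly inside `dic` as a word list. The helper names are hypothetical, and this README does not state that `unique.txt` is derived from `dic`, so any differences are worth inspecting rather than treating as errors.

```python
from pathlib import Path


def read_words(path):
    """Read a whitespace-separated word list (assumed UTF-8, optional BOM)."""
    return Path(path).read_text(encoding="utf-8-sig").split()


def compare_vocabulary(domain_dir):
    """Compare the union of all dic word lists with unique.txt for one domain.

    Returns a pair: (words only in dic, words only in unique.txt).
    """
    domain_dir = Path(domain_dir)
    dic_words = set()
    for dic_file in (domain_dir / "dic").iterdir():
        if dic_file.is_file():  # ignore any nested folders
            dic_words.update(read_words(dic_file))
    unique_words = set(read_words(domain_dir / "unique.txt"))
    return dic_words - unique_words, unique_words - dic_words
```

Two empty sets mean that, for that domain, `unique.txt` contains exactly the words that appear in the `dic` lists.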