UyghurTextResource

Uyghur text resources crawled from websites. Each root folder is named after the domain of the website it was crawled from. Each root folder contains three subfolders and one text file, described below:

### data folder

Original text content crawled from each web page (warning: this is raw, unprocessed text from the website).

### content folder

The original Uyghur text extracted from each web page, stored as lines of words separated by spaces.
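The content files are described only as space-separated lines, so a word-frequency pass is a whitespace split per line. The sketch below rests on two assumptions that the README does not state: the files are UTF-8 encoded, and splitting on any whitespace is acceptable. The function names `tokenize_line` and `count_words` are hypothetical.

```python
from collections import Counter
from pathlib import Path


def tokenize_line(line: str) -> list[str]:
    """Split one line of Uyghur text into words on whitespace."""
    # str.split() with no argument splits on any run of whitespace
    # and drops empty strings, so repeated spaces and the trailing
    # newline do not produce empty tokens.
    return line.split()


def count_words(content_file: Path) -> Counter:
    """Count word frequencies in one file from the content folder.

    Assumes UTF-8 encoding (an assumption, not stated in the README).
    """
    counts: Counter = Counter()
    with content_file.open(encoding="utf-8") as f:
        for line in f:
            counts.update(tokenize_line(line))
    return counts
```

For example, `tokenize_line("ئۇيغۇر تىلى  ۋە")` returns the three words as a list, and the double space is ignored. If the files turn out to contain other separators, such as zero-width characters, the split would need adjusting.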

### dic folder

Word lists for each original web page, produced by word tokenization.

### unique.txt file

The list of unique words collected across the entire website.
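The layout above can be checked and loaded with a short script. This sketch assumes the structure `<domain>/data/`, `<domain>/content/`, `<domain>/dic/` and `<domain>/unique.txt`, with the names taken from this README. It also assumes `unique.txt` is UTF-8 with one word per line, which the README does not state. The helper names (`iter_site_folders`, `check_site_folder`, `load_unique_words`) are hypothetical.

```python
from pathlib import Path

# Entry names from this README; the per-domain layout is an assumption.
SUBFOLDERS = ("data", "content", "dic")
UNIQUE_FILE = "unique.txt"


def iter_site_folders(root: Path):
    """Yield each root-level folder (one per crawled domain), skipping hidden ones like .git."""
    for entry in sorted(root.iterdir()):
        if entry.is_dir() and not entry.name.startswith("."):
            yield entry


def check_site_folder(site_dir: Path) -> list[str]:
    """Return the expected entries that are missing from one domain folder."""
    missing = [name for name in SUBFOLDERS if not (site_dir / name).is_dir()]
    if not (site_dir / UNIQUE_FILE).is_file():
        missing.append(UNIQUE_FILE)
    return missing


def load_unique_words(site_dir: Path) -> set[str]:
    """Load unique.txt as a set of words.

    Assumes UTF-8 encoding and one word per line; blank lines are skipped.
    """
    text = (site_dir / UNIQUE_FILE).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}
```

A typical loop calls `iter_site_folders` on the repository root. It then calls `check_site_folder` on each domain folder and calls `load_unique_words` only when nothing is reported missing.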
