azmat21/UyghurTextResource

UyghurTextResource

Uyghur text resources crawled from websites. Each root folder is named after the domain of the crawled website and contains three subfolders and one text file, described below:

### data folder:

The original text content crawled from each web page (warning: this is raw, unprocessed text from the website).

### content folder:

The original Uyghur text from each web page (a single line of text, split by spaces).

### dic folder:

The word list for each original web page, produced by word tokenization.

### unique.txt file:

The list of unique words crawled from the entire website.
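
As an illustration of the layout described above, here is a minimal Python sketch for reading one site folder. It is not part of the repository. It assumes the files are UTF-8 encoded, that `unique.txt` holds one word per line, and that the files under `dic` contain whitespace-separated words; none of this is confirmed by the README. The function names `load_unique_words` and `rebuild_unique_words` are hypothetical.

```python
from pathlib import Path


def load_unique_words(site_dir):
    """Read unique.txt from one site folder into a set.

    Assumption: UTF-8 encoding, one word per line.
    """
    path = Path(site_dir) / "unique.txt"
    with path.open(encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def rebuild_unique_words(site_dir):
    """Collect the distinct words found in every file under dic/.

    Assumption: each file holds whitespace-separated words.
    """
    words = set()
    for file in sorted((Path(site_dir) / "dic").rglob("*")):
        if file.is_file():
            words.update(file.read_text(encoding="utf-8").split())
    return words
```

If the assumptions hold, comparing `rebuild_unique_words(site)` against `load_unique_words(site)` for a given domain folder is one way to check that a download is complete.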
