Website classification using text processing techniques on DMOZ dataset

crncck/WebsiteClassification

WebsiteClassification

To build a website classification model, we used the DMOZ dataset, a large, community-maintained open directory that categorizes web content. DMOZ closed in 2017 because AOL no longer wished to support the project. We found a split version of the dataset in a GitHub repository.

Dataset


Ayşe Ceren Çiçek

  • Text Preprocessing with spaCy
  • Logistic Regression Model
  • Decision Tree Model
  • Multinomial Naive Bayes Model
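
As a rough illustration of how text preprocessing feeds a Multinomial Naive Bayes model, here is a minimal standard-library sketch. It is not the repository's code: the regex tokenizer and tiny stop-word set stand in for what spaCy would provide, the `MultinomialNB` class is a hypothetical from-scratch version with Laplace smoothing, and the toy texts and category names are invented for the example rather than taken from DMOZ.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (assumption); a real pipeline would use
# the stop words shipped with spaCy or NLTK.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "on", "with"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


class MultinomialNB:
    """Hypothetical from-scratch Multinomial Naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing strength

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen in training carry no information, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data, not taken from DMOZ.
train_texts = [
    "Latest football scores and league tables",
    "Basketball news, scores and player stats",
    "Online store for laptops and phones",
    "Buy shoes and clothing with free shipping",
]
train_labels = ["Sports", "Sports", "Shopping", "Shopping"]

model = MultinomialNB().fit(train_texts, train_labels)
prediction = model.predict("Football league news")
print(prediction)
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing term keeps a word that is absent from one class from zeroing out that class entirely.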

Gizem Kurnaz

  • Text Preprocessing with NLTK
  • Cross-Validation
  • Multinomial Naive Bayes Classifier Model
  • Logistic Regression Model
  • Decision Tree Classifier Model
  • Random Forest Classifier Model
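
Cross-validation can be applied to any of the models listed above in the same way. The sketch below shows the general k-fold idea using only the standard library. The function names (`k_fold_indices`, `cross_validate`), the majority-class baseline, and the toy labels are all assumptions made for illustration; the repository's own cross-validation setup (fold count, scoring metric, library used) is not documented here.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k roughly equal folds."""
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(samples, labels, train_fn, predict_fn, k=5, seed=0):
    """Return one accuracy score per fold.

    train_fn(train_samples, train_labels) -> model
    predict_fn(model, sample) -> predicted label
    """
    scores = []
    for held_out in k_fold_indices(len(samples), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(samples)) if i not in held]
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, samples[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores


# Hypothetical baseline: always predict the most frequent training label.
def train_majority(train_samples, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, sample):
    return model


# Hypothetical toy data, not taken from DMOZ.
samples = [f"site {i}" for i in range(10)]
labels = ["Arts"] * 7 + ["Sports"] * 3

fold_scores = cross_validate(samples, labels, train_majority, predict_majority, k=5)
mean_accuracy = sum(fold_scores) / len(fold_scores)
print(fold_scores, mean_accuracy)
```

Passing the training and prediction steps in as functions keeps the evaluation loop independent of the model, so the same loop could compare a Naive Bayes, logistic regression, decision tree, or random forest classifier on identical folds.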
