WebsiteClassification

To build a website classification model, we used the DMOZ dataset, a large, communally maintained open directory that categorizes web content. DMOZ closed in 2017 because AOL no longer wished to support the project. We found a split version of the dataset in a GitHub repository.

Dataset


Ayşe Ceren Çiçek

  • Text Preprocessing with spaCy
  • Logistic Regression Model
  • Decision Tree Model
  • Multinomial Naive Bayes Model

Gizem Kurnaz

  • Text Preprocessing with NLTK
  • Cross-Validation
  • Multinomial Naive Bayes Classifier Model
  • Logistic Regression Model
  • Decision Tree Classifier Model
  • Random Forest Classifier Model
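
As a rough illustration of one of the model types listed above, the following is a minimal standard-library sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is not the code from this repository: the `tokenize` helper, the `MultinomialNB` class, the toy training texts, and the category labels are all hypothetical stand-ins, and whitespace tokenization replaces the spaCy/NLTK preprocessing mentioned above.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a simple stand-in for real preprocessing)."""
    return text.lower().split()


class MultinomialNB:
    """Hypothetical minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / self.total_docs)
            total = sum(self.word_counts[label].values())
            for t in tokens:
                count = self.word_counts[label][t]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy examples invented for illustration; not taken from the DMOZ data.
train_texts = [
    "football scores and league tables",
    "basketball playoffs schedule and scores",
    "python programming tutorials and code",
    "linux kernel source code downloads",
]
train_labels = ["Sports", "Sports", "Computers", "Computers"]

model = MultinomialNB().fit(train_texts, train_labels)
prediction = model.predict("latest football league scores")
print(prediction)
```

In practice, a library implementation combined with cross-validation, as listed above, would likely be preferable to a hand-written class; the sketch only shows the counting and smoothing idea behind the model.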