To build a website classification model, we used the DMOZ dataset, a large, communally maintained open directory that categorizes web content. DMOZ closed in 2017 because AOL no longer wished to support the project. We found a split version of the dataset in a GitHub repository.
- Text Preprocessing with spaCy
  - Logistic Regression Model
  - Decision Tree Model
  - Multinomial Naive Bayes Model
- Text Preprocessing with NLTK
  - Cross-Validation
  - Multinomial Naive Bayes Classifier Model
  - Logistic Regression Model
  - Decision Tree Classifier Model
  - Random Forest Classifier Model
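
As a rough illustration of how the steps in this outline fit together, the sketch below runs a tokenizer, a multinomial Naive Bayes classifier, and k-fold cross-validation using only the Python standard library. It is not the project's code: the regex tokenizer is a stand-in for the spaCy and NLTK preprocessing, the `tokenize`, `MultinomialNB`, and `cross_validate` names are hypothetical, and the six sample rows are invented for the example rather than taken from DMOZ.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (stand-in for spaCy/NLTK)."""
    return re.findall(r"[a-z]+", text.lower())


class MultinomialNB:
    """Minimal multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        # Tokens never seen during training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_counts[label] / total_docs)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cross_validate(docs, labels, k=3):
    """Return one accuracy per fold; fold i tests on items i, i+k, i+2k, ..."""
    pairs = list(zip(docs, labels))
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, len(pairs), k))
        train = [p for i, p in enumerate(pairs) if i not in test_idx]
        test = [p for i, p in enumerate(pairs) if i in test_idx]
        train_docs, train_labels = zip(*train)
        model = MultinomialNB().fit(train_docs, train_labels)
        correct = sum(model.predict(d) == l for d, l in test)
        scores.append(correct / len(test))
    return scores


# Invented sample rows; the category names mimic directory-style labels.
docs = [
    "Live football scores and league tables",
    "Python programming tutorials and code examples",
    "Basketball team news and match results",
    "Linux software downloads and programming guides",
    "Tennis tournament results and player rankings",
    "Open source code hosting for software developers",
]
labels = ["Sports", "Computers", "Sports", "Computers", "Sports", "Computers"]

scores = cross_validate(docs, labels, k=3)
print("fold accuracies:", scores)
```

On real data, a library implementation with proper feature extraction and stratified folds would normally replace these hand-rolled pieces. The sketch only shows how preprocessing, model fitting, and fold-wise evaluation connect.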