Skip to content

Latest commit

 

History

History
29 lines (23 loc) · 2.03 KB

README.md

File metadata and controls

29 lines (23 loc) · 2.03 KB

Topic Classifier

Description

In this project a topic classification model was created. More specifically an LSTM network was trained to distinguish between 5 different categories of articles (business, entertainment, politics, sport, tech). The dataset used, for training the network, was the BBC articles dataset, which consists of 2225 documents, from the BBC news website corresponding to stories from 2004-2005. The model is deployed on Kubernetes on GKE and can used at datagusto.com.

Train the Model

#!/bin/bash
pip install -r requirements.txt
python -m nltk.downloader stopwords punkt wordnet
python ./src/train.py --num_epochs={{ num_epochs }}
                      --top_words={{ top_words }}
                      --max_sequence_length={{ max_sequence_length }}
                      --batch_size={{ batch_size }}
                      --polyaxon_env={{ polyaxon_env }}
Parameter Description Valid Values Default
num_epochs Number of epochs to be used for training int 40
batch_size Batch size to be used for training int 256
top_words Number of most common words to be used for training (rare words will be dropped) int 35000
max_sequence_length Fixed length of each input text. The text will be padded or trimmed down to that length int 500
polyaxon_env Indicate if running in Polyaxon 0, 1 0

Any feedback is welcome! :) LinkedIn