Classify news articles into different categories using Machine Learning. The dataset consists of 6000 documents and 47 categories.
My goal is to show you how to create predictive model(s) that will assign classification labels to news articles.
- To classify news articles
- Learn the basics of natural language processing
- Build models using sklearn and choose the best one
- Use sklearn's make_pipeline function
- Learn how to turn it into a service
- Learn how to make it composable and portable
- ...
- Profit?
- Python >= v3.11
- Jupyter Notebook
- Some knowledge of Machine Learning
- NumPy
- Pandas
- SciPy
- Matplotlib
- Jupyter
- Scikit-learn (the library that we will use later in this post when creating the classifier model(s))
- Apply some preprocessing steps to prepare the data.
- We will perform a descriptive analysis of the data to better understand its main characteristics
- We will continue by practicing how to train different machine learning models using scikit-learn, one of the most popular Python libraries for machine learning
- We will also use a subset of the dataset for training purposes
- We will iterate and evaluate the learned models by using unseen data. Later, we will compare them until we find a good model that meets our expectations, and use a VotingClassifier (soft voting for unfitted estimators)
- Once we have chosen the candidate model(s), we will use them to perform predictions and to create a simple web application that consumes this predictive model
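The training and evaluation steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the toy corpus, the two labels, the choice of TfidfVectorizer, MultinomialNB and LogisticRegression, and the split parameters are all assumptions made for the example.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus; the real dataset has 6000 documents and 47 categories.
texts = [
    "The team won the match after extra time",
    "The striker scored twice in the cup final",
    "The coach praised the players after the game",
    "Fans celebrated the league title in the stadium",
    "The goalkeeper saved a penalty in the second half",
    "The club signed a new midfielder this season",
    "The tennis player reached the tournament final",
    "The runner broke the national marathon record",
    "Shares fell as the market reacted to interest rates",
    "The company reported higher quarterly profits",
    "Investors bought bonds as stock prices dropped",
    "The bank raised its forecast for economic growth",
    "The retailer announced a merger with a rival firm",
    "Oil prices rose after the trade agreement",
    "The startup raised funding from several investors",
    "The central bank kept interest rates unchanged",
]
labels = ["sport"] * 8 + ["business"] * 8

# Hold back a subset of the data so models are scored on unseen documents.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# Candidate models: each pipeline vectorises the text, then classifies it.
candidates = {
    "naive_bayes": make_pipeline(TfidfVectorizer(), MultinomialNB()),
    "logistic_regression": make_pipeline(
        TfidfVectorizer(), LogisticRegression(max_iter=1000)
    ),
}

scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Soft voting averages the predicted class probabilities of the
# (unfitted) estimators; VotingClassifier fits clones of them itself.
ensemble = make_pipeline(
    TfidfVectorizer(),
    VotingClassifier(
        estimators=[
            ("nb", MultinomialNB()),
            ("lr", LogisticRegression(max_iter=1000)),
        ],
        voting="soft",
    ),
)
ensemble.fit(X_train, y_train)
scores["soft_voting"] = accuracy_score(y_test, ensemble.predict(X_test))

# Predict the category of a new, unseen article.
predicted = ensemble.predict(["The team scored in the final minute"])
print(scores, predicted[0])
```

Soft voting requires every estimator to expose predict_proba, which is why both candidates here are probabilistic classifiers.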
See Jupyter Notebook
As a container:
docker run -d -p 7070:7070 docker.io/saidsef/ml-classifier:latest
As a Python application:
pip3 install -r requirements.txt
PORT=7070 classifier-ml.py
The payload should be in JSON format:
{ "body": "text-goes-here" }
The request must be a POST with a JSON body:
curl -XPOST http://localhost:7070/api/v1/news -H 'Content-Type: application/json' -d @test/test.json
The response will be in JSON format:
{
"score": 1,
"category": "Opinion"
}
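For calling the service from Python instead of curl, a small client might look like the sketch below. It uses only the standard library and takes the URL, the "body" field and the POST/JSON requirement from the examples above; the helper names build_request and classify, and the timeout value, are hypothetical choices for this illustration.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust host and port as needed.
API_URL = "http://localhost:7070/api/v1/news"


def build_request(text, url=API_URL):
    """Build a POST request carrying the article text as a JSON payload."""
    payload = json.dumps({"body": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(text, url=API_URL, timeout=10):
    """Send the article to the running service and return the parsed JSON reply."""
    with urllib.request.urlopen(build_request(text, url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires the service to be running locally):
# result = classify("text-goes-here")
# print(result["category"], result["score"])
```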
kubectl apply -k ./deployment