Machine Learning - News Articles classification with sklearn

Classify news articles into different categories using Machine Learning. The dataset consists of 6000 documents and 47 categories.

My goal is to show you how to create predictive models that assign classification labels to news articles.

Objective

To classify news articles
Learn the basics of natural language processing
Build models using sklearn and choose the best one
Use sklearn's make_pipeline function
Learn how to turn it into a service
Learn how to make it composable and portable
...
Profit?

Prerequisite

Python >= v3.11
Jupyter Notebook
Some knowledge of Machine Learning

Python Libs

NumPy
Pandas
SciPy
Matplotlib
Jupyter
Scikit-learn (the library that we will use later in this post when creating the classifier model(s))

We Will

Apply some preprocessing steps to prepare the data.
Perform a descriptive analysis of the data to better understand its main characteristics.
Practice training different machine learning models using scikit-learn, one of the most popular Python libraries for machine learning.
Use a subset of the dataset for training.
Iterate on and evaluate the trained models using unseen data, compare them until we find a model that meets our expectations, and combine unfitted estimators in a VotingClassifier with soft voting.
Once we have chosen the candidate model(s), use it to make predictions and to create a simple web application that consumes this predictive model.
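The training and evaluation steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the toy corpus, the category names, and the choice of TfidfVectorizer, MultinomialNB and LogisticRegression are all assumptions made for the example. It holds out unseen data, wraps two unfitted estimators in a soft-voting VotingClassifier, and chains everything with make_pipeline.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus; the real dataset has 6000 documents and 47 categories.
texts = [
    "the team won the football match",
    "the striker scored a late goal",
    "the coach praised the players",
    "fans cheered the team at the stadium",
    "the club signed a new goalkeeper",
    "the match ended in a penalty shootout",
    "the league title race is close",
    "the players trained before the final",
    "the company reported higher profits",
    "shares fell as the market dropped",
    "the bank raised interest rates",
    "investors bought shares in the company",
    "the market reacted to the profit warning",
    "the firm announced a merger deal",
    "profits rose after the bank cut costs",
    "the company shares hit a record high",
]
labels = ["Sport"] * 8 + ["Business"] * 8

# Hold out unseen documents for evaluation; stratify keeps both classes in each split.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

# Soft voting averages the predicted class probabilities of the unfitted estimators.
voting = VotingClassifier(
    estimators=[
        ("nb", MultinomialNB()),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",
)

# make_pipeline chains text vectorisation and the ensemble into a single estimator.
model = make_pipeline(TfidfVectorizer(), voting)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
probabilities = model.predict_proba(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"accuracy on held-out data: {accuracy:.2f}")
```

Because the vectoriser lives inside the pipeline, the same object can take raw text at prediction time, which is convenient when the model is later served from a web application.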

Getting started with the machine learning tutorial

See Jupyter Notebook

Deployment

As a container:

docker run -d -p 7070:7070 docker.io/saidsef/ml-classifier:latest

As a Python application:

pip3 install -r requirements.txt

PORT=7070 python3 classifier-ml.py

JSON Format

The payload should be in JSON format:

{ "body": "text-goes-here" }
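As an illustration of the payload shape, the sketch below checks that a raw request body is a JSON object with a string "body" field. The function name parse_payload and the error messages are hypothetical; the service's own validation may differ.

```python
import json


def parse_payload(raw: bytes) -> str:
    """Return the article text from a payload shaped like {"body": "..."}.

    Hypothetical helper for illustration; not taken from the service code.
    """
    try:
        payload = json.loads(raw)
    except ValueError as exc:  # covers invalid JSON and undecodable bytes
        raise ValueError("payload is not valid JSON") from exc
    if not isinstance(payload, dict) or not isinstance(payload.get("body"), str):
        raise ValueError('payload must be a JSON object with a string "body" field')
    return payload["body"]
```

For example, parse_payload(b'{"body": "text-goes-here"}') returns the article text, while a payload without a "body" field raises ValueError.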

The Request

The request must be a POST with a JSON body:

curl -XPOST http://localhost:7070/api/v1/news -H 'Content-Type: application/json' -d @test/test.json

The response will be in JSON format:

{
  "score": 1,
  "category": "Opinion"
}
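The same request can be made from Python. The sketch below uses only the standard library and assumes the service is listening on localhost:7070 as in the curl example; the helper names build_request and classify are hypothetical.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "http://localhost:7070/api/v1/news"


def build_request(text: str, url: str = API_URL) -> urllib.request.Request:
    """Build a POST request whose JSON body is {"body": text}."""
    data = json.dumps({"body": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(text: str, url: str = API_URL, timeout: float = 10.0) -> dict:
    """Send the article text to the service and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(text, url), timeout=timeout) as response:
        return json.load(response)
```

With the service running, classify("text-goes-here") should return a dictionary with the "score" and "category" keys shown in the response above.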

Kubernetes

kubectl apply -k ./deployment
