Text classification

Introduction

Text classification is an interesting topic in the NLP field. This repository gives an overview of three approaches to it: SVM, LSTM and RoBERTa.

The detailed results are in the notebooks. In summary:

	LSTM	SVM	RoBERTa
Accuracy*	96.26%	96.86%	98.50%

*These are the best accuracy values I achieved across my runs.

Notebooks

I have split the repository into several notebooks.

Important: The LSTM model uses pre-trained vectors from the GloVe project. To use that model, first download the GloVe 6B set and place the file glove.6B.100d.txt at ./data/glove.6B/glove.6B.100d.txt. See https://nlp.stanford.edu/projects/glove/
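
As a rough illustration of how that file can be read: each line of the GloVe text files holds a token followed by its vector components, separated by spaces. The sketch below parses such lines into a dictionary using only the standard library. The function name `load_glove` and the tiny three-dimensional sample are made up for this example, and the LSTM notebook may load the vectors differently.

```python
import io


def load_glove(lines, dim):
    """Parse GloVe text lines into a {token: [float, ...]} dictionary.

    `lines` is any iterable of strings, e.g. an open file handle.
    Lines whose vector length differs from `dim` are skipped.
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        token, values = parts[0], parts[1:]
        if len(values) != dim:
            continue  # malformed or unexpected line
        vectors[token] = [float(v) for v in values]
    return vectors


# Tiny in-memory sample with 3 dimensions (the real file has 100 per token).
sample = io.StringIO("the 0.1 -0.2 0.3\ncat 0.4 0.5 -0.6\nbroken 1.0\n")
embeddings = load_glove(sample, dim=3)
print(sorted(embeddings))
```

For the real file, the same function could be called with a handle opened on ./data/glove.6B/glove.6B.100d.txt (UTF-8 encoding) and `dim=100`.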

Dataset

I have used a dataset of 2225 documents from the BBC news website, corresponding to stories from 2004-2005. Each document is labeled with one of five topical categories: business, entertainment, politics, sport, tech.

Source: http://mlg.ucd.ie/datasets/bbc.html - D. Greene and P. Cunningham. "Practical Solutions to the Problem of Diagonal Dominance in Kernel Document Clustering", Proc. ICML 2006.
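
A minimal sketch of reading such a dataset into parallel lists of texts and labels follows. It assumes one subfolder per category with one .txt file per document, which may not match how this repository stores the data. The function name `load_documents` and the demo files are hypothetical.

```python
import tempfile
from pathlib import Path

CATEGORIES = ["business", "entertainment", "politics", "sport", "tech"]


def load_documents(root):
    """Return (texts, labels) read from root/<category>/*.txt."""
    texts, labels = [], []
    for category in CATEGORIES:
        folder = Path(root, category)
        if not folder.is_dir():
            continue  # category folder missing: nothing to read
        for path in sorted(folder.glob("*.txt")):
            # errors="replace" keeps going if a file contains odd bytes
            texts.append(path.read_text(encoding="utf-8", errors="replace"))
            labels.append(category)
    return texts, labels


# Demo on a throwaway directory with two fake documents.
with tempfile.TemporaryDirectory() as tmp:
    for category, text in [("sport", "A tennis final."), ("tech", "A new phone.")]:
        folder = Path(tmp, category)
        folder.mkdir()
        (folder / "001.txt").write_text(text, encoding="utf-8")
    texts, labels = load_documents(tmp)

print(labels)
```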

Commands

I tested these commands on macOS (any Unix platform should work) and on Windows (in a Git Bash terminal).

I have implemented the necessary code to train an SVM model and make predictions. All the code runs in Docker containers, so the only thing you need to install on your computer is Docker.

You can control the Docker containers with these two commands:

sh manager.sh docker:run
sh manager.sh docker:down

You then have two commands to train a model and make predictions:

sh manager.sh train
sh manager.sh predict "Write your text here..."

For example, let's make a prediction:

$ sh manager.sh predict "A text about tennis"
INFO:root:Applying char cleaner...
INFO:root:Applying lemmatization...
INFO:root:Loading model "svm"...
sport
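
The log lines above suggest that the input is cleaned and lemmatized before the SVM model classifies it. The exact cleaning rules are not described here, so the sketch below is only a guess at what a character cleaner might do: lowercase the text, replace anything that is not a letter, digit or whitespace, and collapse repeated spaces. The name `clean_chars` is hypothetical, and lemmatization is left out because it needs an external library.

```python
import re


def clean_chars(text):
    """Hypothetical cleaner: lowercase, drop punctuation, squeeze spaces."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # non-alphanumerics become spaces
    return re.sub(r"\s+", " ", text).strip()


print(clean_chars("A text   about TENNIS!"))
```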

One additional command lets you enter the Python container, if you need to:

sh manager.sh python

Have fun! ᕙ (° ~ ° ~)
