This project explores and implements text classification techniques on a real-world dataset. Text classification assigns predefined categories or labels to textual data; in this case study we apply it to a specific dataset to gain insights and build a predictive model.
The notebook contains Plotly charts, which are not rendered in the GitHub interface because they are interactive. While you can get an understanding of the case study without them, data visualisation is a big part of it.
Please use nbviewer to see the fully rendered notebook - link.
For this case study we use the Zenodo E-commerce text dataset.
Here's a description provided by Zenodo:
> This is the classification based E-commerce text dataset for 4 categories - "Electronics", "Household", "Books" and "Clothing & Accessories", which almost cover 80% of any E-commerce website.
>
> The dataset is in ".csv" format with two columns - the first column is the class name and the second one is the datapoint of that class. The data point is the product and description from the e-commerce website.
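For orientation, here is a minimal sketch of loading the dataset with pandas. The file name and the absence of a header row are assumptions based on the description above, not something stated in the notebook:

```python
import pandas as pd

# Assumed file name and layout: no header row, class name first, text second.
df = pd.read_csv(
    "ecommerceDataset.csv",
    header=None,
    names=["category", "description"],
)
print(df["category"].value_counts())  # rough class balance across the 4 categories
```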
Imagine we own an e-commerce website where sellers upload descriptions of the items they want to sell. They also have to choose each item's category manually, which may slow them down.
Our task is to automate the choice of category based on the item's description.
However, a wrong automated choice may lead to lost sales, so we may choose not to assign a label automatically when the model is not confident.
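To make this concrete, here is a minimal sketch of such an abstention rule. It assumes a classifier that exposes class probabilities (e.g. scikit-learn's `predict_proba`); the threshold value is a hypothetical placeholder that would be tuned on validation data in practice:

```python
THRESHOLD = 0.9  # hypothetical cut-off; tuned on validation data in practice

def auto_label(model, texts):
    """Return a category per text, or None when the model is not confident."""
    probs = model.predict_proba(texts)          # shape: (n_samples, n_classes)
    best = probs.argmax(axis=1)                 # most likely class per text
    confident = probs.max(axis=1) >= THRESHOLD  # keep only confident verdicts
    return [model.classes_[i] if ok else None   # None -> seller picks manually
            for i, ok in zip(best, confident)]
```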
In this case study we implement and compare four text classification techniques:
- Baseline: Bag of Words + logistic regression (a sketch follows this list)
- GRU
- LSTM using a pretrained embedding layer
- Fine-tuned BERT
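Here is a minimal sketch of the baseline, assuming scikit-learn; the file name and column names carry over from the loading example above and are assumptions, not the notebook's actual code:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Reuses the loading assumptions from the earlier sketch.
df = pd.read_csv("ecommerceDataset.csv", header=None,
                 names=["category", "description"]).dropna()

X_train, X_test, y_train, y_test = train_test_split(
    df["description"], df["category"], test_size=0.2, random_state=42
)

baseline = make_pipeline(
    CountVectorizer(),                  # bag-of-words term counts
    LogisticRegression(max_iter=1000),  # linear classifier on top
)
baseline.fit(X_train, y_train)
print(f"held-out accuracy: {baseline.score(X_test, y_test):.3f}")
```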
We compare them by:
- classification quality
- inference time, keeping in mind that we would want to use the model in a production environment in our imaginary task
- the percentage of confident automatic verdicts, i.e. how often the seller would have to intervene because the model is not sure (a sketch of all three criteria follows this list).
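As a rough illustration of how these three criteria could be measured for a single model, here is a sketch assuming a scikit-learn-style classifier; the function and threshold are illustrative, not the notebook's actual code:

```python
import time
from sklearn.metrics import f1_score

def evaluate(model, X_test, y_test, threshold=0.9):
    """Report classification quality, inference time, and auto-verdict share."""
    start = time.perf_counter()
    probs = model.predict_proba(X_test)
    inference_time = time.perf_counter() - start

    preds = [model.classes_[i] for i in probs.argmax(axis=1)]
    confident = probs.max(axis=1) >= threshold  # verdicts made without the seller

    return {
        "f1_macro": f1_score(y_test, preds, average="macro"),
        "inference_time_s": inference_time,
        "auto_verdict_share": confident.mean(),
    }
```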