NLP Quick Start

See Notebook for Code and Walkthrough

There is a wealth of sophisticated NLP algorithms available today, particularly those built on transformers, and they have overshadowed many of the basics of NLP, such as clustering and classification. However, simple algorithms are much easier to scale and often provide an excellent baseline before building more complicated models.

I'm going to work through the Twitter Disasters dataset originally made available by Crowdflower. I found it currently available here.

Abstract

In this notebook, I use a simple logistic regression classifier on a dataset of 10,000 tweets to predict whether a tweet refers to a true "disaster" event or is irrelevant. I focus on the interpretability of simple classification models and what that means for text data. I look at a few methods of creating text embeddings for NLP tasks, and I demonstrate the significant advantages of incorporating semantic meaning into those tasks using models such as Word2Vec.
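To make the idea of an interpretable text classifier concrete, here is a minimal sketch of bag-of-words logistic regression in plain Python. It is not the notebook's code: the tweets are invented for illustration, and the helper names (`build_vocab`, `vectorize`, `train`) and the hyperparameters are assumptions. The point is that each learned weight belongs to exactly one word, so the model can be read directly.

```python
import math

# Invented toy tweets (1 = disaster, 0 = irrelevant); NOT taken from the real dataset.
TWEETS = [
    ("forest fire spreading near the highway", 1),
    ("earthquake damaged several buildings downtown", 1),
    ("flood waters rising evacuate now", 1),
    ("wildfire forces evacuation of town", 1),
    ("this new album is fire", 0),
    ("my exam was a total disaster lol", 0),
    ("traffic downtown is so slow today", 0),
    ("flood of emails after the holiday", 0),
]

def tokenize(text):
    # Whitespace tokenization only; a real pipeline would clean the text further.
    return text.lower().split()

def build_vocab(texts):
    words = sorted({tok for text in texts for tok in tokenize(text)})
    return {word: idx for idx, word in enumerate(words)}

def vectorize(text, vocab):
    # Bag of words: one count per vocabulary word, word order is discarded.
    vec = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(vec, weights, bias):
    return sigmoid(sum(w * x for w, x in zip(weights, vec)) + bias)

def train(X, y, lr=0.5, epochs=500, l2=0.01):
    # Full-batch gradient descent on log-loss with a small L2 penalty.
    weights, bias, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, weights, bias) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * (g / n + l2 * w) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

texts = [text for text, _ in TWEETS]
labels = [label for _, label in TWEETS]
vocab = build_vocab(texts)
X = [vectorize(text, vocab) for text in texts]
weights, bias = train(X, labels)

# Interpretability: rank words by their learned weight.
ranked = sorted(vocab, key=lambda word: weights[vocab[word]])
print("words pushing towards 'irrelevant':", ranked[:3])
print("words pushing towards 'disaster':", ranked[-3:])
```

On a toy set this small the weights mostly memorize individual tweets, but the same inspection applies to a model trained on the full dataset. It also shows the limitation of counting words: "fire" and "flood" appear in both classes here, so the model cannot tell the literal sense from the figurative one.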

I extend the interpretability analysis by using LIME to understand predictions made on Word2Vec embeddings. Finally, I attempt to incorporate the syntactic structure of tweets into a model's predictions by building a 1D CNN for text classification on top of Word2Vec embeddings.
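As a rough illustration of why a 1D CNN can pick up word order where a bag of words cannot, the sketch below slides a single filter over consecutive word vectors and max-pools the result. It is plain Python rather than the notebook's model, and the 3-dimensional vectors in `TOY_EMBEDDINGS` are hand-made stand-ins for Word2Vec embeddings; the filter values and the function name `conv1d_max_pool` are likewise assumptions chosen for readability.

```python
# Hand-made 3-d vectors standing in for Word2Vec embeddings (purely hypothetical).
TOY_EMBEDDINGS = {
    "forest": [1.0, 0.0, 0.0],
    "fire": [0.0, 1.0, 0.0],
    "spreading": [0.0, 0.0, 1.0],
}

def embed(tokens):
    # Look up each token's vector, keeping the original word order.
    return [TOY_EMBEDDINGS[tok] for tok in tokens]

def conv1d_max_pool(embeddings, kernel, bias=0.0):
    """Apply one filter over every window of consecutive tokens, then max-pool.

    `kernel` has one row per position in the window, so a kernel with two rows
    looks at two adjacent words at a time.
    """
    width = len(kernel)
    activations = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        total = bias
        for kernel_row, token_vec in zip(kernel, window):
            total += sum(k * x for k, x in zip(kernel_row, token_vec))
        activations.append(max(0.0, total))  # ReLU
    # Max-over-time pooling; 0.0 if the text is shorter than the filter.
    return max(activations) if activations else 0.0

# A width-2 filter that responds most strongly to "fire" followed by "spreading".
FILTER = [
    [0.0, 1.0, 0.0],  # first position: matches the "fire" direction
    [0.0, 0.0, 1.0],  # second position: matches the "spreading" direction
]

in_order = conv1d_max_pool(embed(["forest", "fire", "spreading"]), FILTER)
shuffled = conv1d_max_pool(embed(["spreading", "fire", "forest"]), FILTER)
print(in_order, shuffled)
```

Both inputs contain exactly the same words, so a bag-of-words vector would be identical for them, yet the filter responds more strongly to the phrase in its original order. A trained CNN learns many such filters instead of having them written by hand.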

The point of this notebook is to serve as an introduction to NLP and to give direction on where and how to proceed in improving the performance of text clustering and classification algorithms, whether that means further dataset processing, which is often productive, or employing a more complicated model.

The steps I've taken here constitute a good baseline to get started on an NLP project, though by no means are they comprehensive.

IliaZenkov/NLP-keras-nltk-lime

Folders and files

Latest commit

History

Repository files navigation

NLP Quick Start

See Notebook for Code and Walkthrough

Abstract

Table of Contents:

Cite

Licence

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages