Sentiment analysis is a classification task where each sample is assigned a positive or negative label.
This repo contains the code for this blog post. A typical NLP classification task involves the following steps:
- Preprocessing and tokenization
- Generating vocabulary of unique tokens and converting words to indices
- Loading pretrained vectors, e.g. GloVe, word2vec, fastText
- Padding text with zeros in case of variable lengths
- Dataloading and batching
- Model creation and training
Torchtext provides a set of classes that are useful in NLP tasks. These classes take care of the first five points above with very little code.
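To make clear what is being automated, here is a minimal plain-Python sketch of four of those steps (tokenization, vocabulary building, padding, and batching) done by hand. It is an illustration only, not torchtext's API: every function name, the whitespace tokenizer, and the `<pad>`/`<unk>` tokens are assumptions chosen for the example, and loading pretrained vectors is left out.

```python
from collections import Counter

# Special tokens (hypothetical names chosen for this sketch).
PAD, UNK = "<pad>", "<unk>"


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_vocab(tokenized_texts, min_freq=1):
    """Map each token to an index; 0 is reserved for padding, 1 for unknowns."""
    counts = Counter(tok for toks in tokenized_texts for tok in toks)
    itos = [PAD, UNK] + sorted(t for t, c in counts.items() if c >= min_freq)
    return {tok: idx for idx, tok in enumerate(itos)}


def numericalize(tokens, stoi):
    """Convert tokens to indices, falling back to the unknown-token index."""
    return [stoi.get(tok, stoi[UNK]) for tok in tokens]


def pad_batch(sequences, pad_idx=0):
    """Right-pad every sequence to the length of the longest one in the batch."""
    max_len = max(len(s) for s in sequences)
    return [s + [pad_idx] * (max_len - len(s)) for s in sequences]


def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


texts = ["A great movie", "Terrible", "Not a great plot at all"]
tokenized = [tokenize(t) for t in texts]
stoi = build_vocab(tokenized)
ids = [numericalize(toks, stoi) for toks in tokenized]

for batch in batches(ids, batch_size=2):
    print(pad_batch(batch, pad_idx=stoi[PAD]))
```

Each batch is padded only to the length of its own longest sequence, which keeps the amount of padding small when sequences of similar length are batched together.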
- Python 3.6
- PyTorch 0.4
- TorchText 0.2.3
- Understanding of GRU/LSTM [1]
What is covered in the notebook
- Train validation split
- Define how to process data
- Create torchtext dataset
- Load pretrained word vectors and build vocabulary
- Load the data in batches
- Simple GRU model
- GRU model with concat pooling [2]
- Training
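
As a rough illustration of the concat pooling idea from [2], the sketch below concatenates the GRU's last hidden state with the max-pool and mean-pool of its outputs over time before the final linear layer. It is not the notebook's code: the class name `ConcatPoolGRU`, the layer sizes, and the single-layer unidirectional GRU are assumptions, and for simplicity the pooling does not mask padded positions.

```python
import torch
import torch.nn as nn


class ConcatPoolGRU(nn.Module):
    """GRU classifier with concat pooling (hypothetical sketch)."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Three vectors of size hidden_dim are concatenated before the classifier.
        self.fc = nn.Linear(3 * hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)        # (batch, seq_len, embed_dim)
        outputs, last_hidden = self.gru(embedded)   # outputs: (batch, seq_len, hidden_dim)
        last = last_hidden[-1]                      # (batch, hidden_dim)
        max_pool = outputs.max(dim=1)[0]            # max over time steps
        avg_pool = outputs.mean(dim=1)              # mean over time steps
        pooled = torch.cat([last, max_pool, avg_pool], dim=1)
        return self.fc(pooled)                      # (batch, num_classes) logits


model = ConcatPoolGRU(vocab_size=20, embed_dim=8, hidden_dim=16, num_classes=2)
batch = torch.randint(0, 20, (4, 7))  # 4 sequences of 7 token indices each
logits = model(batch)
print(logits.shape)
```

A simple GRU model would pass only `last` to the linear layer; the pooled variants give the classifier access to information from every time step, which can help on longer texts.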
[1] https://colah.github.io/posts/2015-08-Understanding-LSTMs/
[2] https://arxiv.org/abs/1801.06146