We use the BiLSTM attention Kfold add features kernel to reach a 0.703 score in the Kaggle Quora Insincere Questions Classification competition.
This kernel builds on:
- gru-capsule
- How to: Preprocessing when using embedding
- Improve your Score with some Text Preprocessing
- Simple attention layer
- baseline-pytorch-bilstm
- pytorch-starter
| name | value |
|---|---|
| embed_size | 300 |
| max_features | 120000 |
| maxlen | 70 |
| batch_size | 512 |
| n_epochs | 5 |
| n_splits | 5 |
`seed_everything`: A common headache in this competition is the lack of determinism in results due to cuDNN. This kernel includes a PyTorch fix. Function taken from here.
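For reference, a minimal sketch of such a seeding helper, assuming the usual combination of Python, NumPy, and PyTorch RNGs (the kernel's exact body may differ; the cuDNN flags are the key part):

```python
import os
import random

import numpy as np
import torch

def seed_everything(seed=1029):
    """Seed every RNG in play so repeated runs give identical results."""
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    # Trade a little speed for deterministic cuDNN kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```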
Embedding loaders:
- `load_glove`
- `load_fasttext`
- `load_para`
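All three loaders follow the same pattern: read the pretrained vectors, then fill an embedding matrix indexed by the tokenizer's word index. A sketch in the style of `load_glove` (the file path and the mean/std initialization are assumptions based on common versions of this code):

```python
import numpy as np

def load_glove(word_index, embedding_file='glove.840B.300d.txt',
               max_features=120000, embed_size=300):
    def get_coefs(word, *arr):
        return word, np.asarray(arr, dtype='float32')

    # One "word v1 v2 ... v300" entry per line in the GloVe text file.
    embeddings_index = dict(get_coefs(*line.rstrip().split(' '))
                            for line in open(embedding_file, encoding='utf-8'))

    all_embs = np.stack(list(embeddings_index.values()))
    emb_mean, emb_std = all_embs.mean(), all_embs.std()

    # Words missing from the pretrained file keep a random vector drawn
    # from the same distribution (this is why seeding matters here too).
    nb_words = min(max_features, len(word_index) + 1)
    embedding_matrix = np.random.normal(emb_mean, emb_std, (nb_words, embed_size))
    for word, i in word_index.items():
        if i >= nb_words:
            continue
        vec = embeddings_index.get(word)
        if vec is not None:
            embedding_matrix[i] = vec
    return embedding_matrix
```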
`build_vocab`
Borrowed from: Improve your Score with some Text Preprocessing
- `build_vocab`
- `known_contractions`
- `clean_contractions`
- `correct_spelling`
- `unknown_punct`
- `clean_numbers`
- `clean_special_chars`
- `add_lower`
- `clean_text`
- `_get_mispell`
- `replace_typical_misspell`
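As one example of these helpers, a sketch of the misspelling-replacement pair (`mispell_dict` here is a tiny hypothetical subset; the kernel's dictionary is much larger):

```python
import re

# Hypothetical subset of the kernel's misspelling dictionary.
mispell_dict = {'colour': 'color', 'centre': 'center',
                'didnt': 'did not', 'doesnt': 'does not',
                'travelling': 'traveling'}

def _get_mispell(mispell_dict):
    # Compile one alternation regex over every known misspelling.
    mispell_re = re.compile('(%s)' % '|'.join(mispell_dict.keys()))
    return mispell_dict, mispell_re

mispellings, mispellings_re = _get_mispell(mispell_dict)

def replace_typical_misspell(text):
    # Substitute each match with its corrected form in a single pass.
    return mispellings_re.sub(lambda m: mispellings[m.group(0)], text)
```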
Extra feature part taken from here:
- `add_features_before_cleaning`
  - `count_contains_a_punct`
  - `count_contains_a_string`
  - `count_words_more_frequent_in_insc`
  - `count_words_more_frequent_in_sc`
- `add_features_custom`
  - `count_contains_a_string`
  - `count_words_more_frequent_in_insc`
  - `count_words_more_frequent_in_sc`
`add_features`
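An illustrative sketch of the kind of count-based features `add_features` computes on the raw question text (column names here are assumptions; the actual feature set follows the toxic-comments notebook linked below):

```python
import pandas as pd

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    text = df['question_text'].astype(str)
    # Length- and capitalization-based signals.
    df['total_length'] = text.str.len()
    df['capitals'] = text.apply(lambda s: sum(c.isupper() for c in s))
    df['caps_vs_length'] = df['capitals'] / df['total_length']
    # Word-count signals.
    df['num_words'] = text.str.split().str.len()
    df['num_unique_words'] = text.apply(lambda s: len(set(s.lower().split())))
    df['words_vs_unique'] = df['num_unique_words'] / df['num_words']
    return df
```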
`load_and_prec`
- Lower-case the text
- Clean the text
- Clean numbers
- Clean spellings
- Fill up the missing values
- Add features (from https://github.com/wongchunghang/toxic-comment-challenge-lstm/blob/master/toxic_comment_9872_model.ipynb)
- Tokenize the sentences
- Pad the sentences
- Get the target values
- Split into a training set and a final test set
- Shuffle the data
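The tokenize-and-pad steps are standard Keras preprocessing; a sketch (assumes a `train_df` DataFrame and the hyperparameters from the table above):

```python
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

max_features, maxlen = 120000, 70  # values from the hyperparameter table

tokenizer = Tokenizer(num_words=max_features)
tokenizer.fit_on_texts(list(train_df['question_text']))

# Integer-encode each question, then pad/truncate to a fixed length.
train_X = tokenizer.texts_to_sequences(train_df['question_text'])
train_X = pad_sequences(train_X, maxlen=maxlen)
```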
Two embedding matrices are used: GloVe and Paragram. Their mean is taken as the final embedding matrix. Missing entries in the embedding are filled with np.random.normal, so we have to seed here too.
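A sketch of that blending step, assuming the loaders above return matrices aligned to the same word index:

```python
import numpy as np

embedding_matrix_1 = load_glove(tokenizer.word_index)
embedding_matrix_2 = load_para(tokenizer.word_index)

# Element-wise mean of two (nb_words, embed_size) matrices.
embedding_matrix = np.mean([embedding_matrix_1, embedding_matrix_2], axis=0)
```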
Code inspired by: https://github.com/anandsaha/pytorch.cyclic.learning.rate/blob/master/cls.py

`CyclicLR`
- `batch_step`
- `_triangular_scale_fn`
- `_triangular2_scale_fn`
- `_exp_range_scale_fn`
- `get_lr`
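The core of the scheduler is the triangular policy: the learning rate ramps linearly from `base_lr` up to `max_lr` and back down once per cycle. A compact sketch of the per-iteration computation (parameter values here are illustrative, not the kernel's):

```python
import math

def triangular_lr(iteration, base_lr=0.001, max_lr=0.003, step_size=300):
    # Which triangle cycle we are in (each cycle is 2 * step_size iterations).
    cycle = math.floor(1 + iteration / (2 * step_size))
    # Distance from the cycle's peak, normalized to [0, 1].
    x = abs(iteration / step_size - 2 * cycle + 1)
    # Linearly interpolate: lr rises to max_lr at the peak, then falls back.
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)
```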
A bidirectional LSTM with an attention layer and an additional fully connected layer, plus extra features taken from a winning kernel of the Toxic Comments competition. CLR and a capsule layer are also used, and the branches are blended together by concatenation.
Initial idea borrowed from: https://www.kaggle.com/ziliwang/baseline-pytorch-bilstm
- `Embed_Layer`
  - `forward`
- `GRU_Layer`
  - `init_weights`
  - `forward`
- `Caps_Layer`
  - `forward`
  - `squash`
- `Capsule_Main`
  - `forward`
- `Attention`
  - `forward`
- `NeuralNet`
  - `forward`
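For illustration, a minimal sketch of the additive attention module these kernels typically use (dimensions and initialization are assumptions; the kernel's `Attention` may differ in details such as masking):

```python
import torch
import torch.nn as nn

class Attention(nn.Module):
    def __init__(self, feature_dim, step_dim):
        super().__init__()
        self.feature_dim = feature_dim
        self.step_dim = step_dim
        self.weight = nn.Parameter(torch.zeros(feature_dim, 1))
        nn.init.xavier_uniform_(self.weight)
        self.bias = nn.Parameter(torch.zeros(step_dim))

    def forward(self, x):
        # x: (batch, step_dim, feature_dim) -- e.g. BiLSTM outputs.
        eij = torch.mm(x.reshape(-1, self.feature_dim), self.weight)
        eij = torch.tanh(eij.view(-1, self.step_dim) + self.bias)
        a = torch.softmax(eij, dim=1).unsqueeze(-1)
        # Weighted sum over time steps -> (batch, feature_dim).
        return torch.sum(x * a, dim=1)
```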
The method for training is borrowed from https://www.kaggle.com/hengzheng/pytorch-starter
- `MyDataset`
  - `__getitem__`
  - `__len__`
- `sigmoid`
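A sketch of the dataset wrapper in the style of pytorch-starter, where each sample's index is returned alongside the data (useful for writing out-of-fold predictions back to the right rows):

```python
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, dataset):
        self.dataset = dataset

    def __getitem__(self, index):
        data, target = self.dataset[index]
        return data, target, index

    def __len__(self):
        return len(self.dataset)
```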
`bestThresshold`, borrowed from: https://www.kaggle.com/ziliwang/baseline-pytorch-bilstm
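The threshold search scans candidate cutoffs on the validation predictions and keeps the one with the best F1. A sketch (names and the scan range are illustrative):

```python
import numpy as np
from sklearn.metrics import f1_score

def best_threshold(y_true, y_proba):
    best_t, best_score = 0.0, 0.0
    # Scan thresholds in small steps and keep the F1-maximizing one.
    for t in np.arange(0.1, 0.501, 0.01):
        score = f1_score(y_true, (y_proba > t).astype(int))
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score
```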