We use the BiLSTM attention Kfold add features kernel to reach a 0.703 score in the Kaggle Quora Insincere Questions Classification competition.
This kernel builds on:
- gru-capsule
- How to: Preprocessing when using embedding
- Improve your Score with some Text Preprocessing
- Simple attention layer
- baseline-pytorch-bilstm
- pytorch-starter
| name | value |
|---|---|
| embed_size | 300 |
| max_features | 120000 |
| maxlen | 70 |
| batch_size | 512 |
| n_epochs | 5 |
| n_splits | 5 |
`seed_everything`: A common headache in this competition is the lack of determinism in results due to cuDNN. This kernel includes a PyTorch fix. Function taken from here.
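For reference, a minimal sketch of such a seeding helper, assuming the usual combination of Python, NumPy, and PyTorch RNGs (the kernel's exact body may differ; the cuDNN flags are the key part):

```python
import os
import random

import numpy as np
import torch

def seed_everything(seed=1029):
    """Seed every RNG in play so repeated runs give identical results."""
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    # Trade a little speed for deterministic cuDNN kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```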
Embedding loaders:
- `load_glove`
- `load_fasttext`
- `load_para`
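All three loaders follow the same pattern: read the pretrained vectors, then fill an embedding matrix indexed by the tokenizer's word index. A sketch in the style of `load_glove` (the file path and the mean/std initialization are assumptions based on common versions of this code):

```python
import numpy as np

def load_glove(word_index, embedding_file='glove.840B.300d.txt',
               max_features=120000, embed_size=300):
    def get_coefs(word, *arr):
        return word, np.asarray(arr, dtype='float32')

    # One "word v1 v2 ... v300" entry per line in the GloVe text file.
    embeddings_index = dict(get_coefs(*line.rstrip().split(' '))
                            for line in open(embedding_file, encoding='utf-8'))

    all_embs = np.stack(list(embeddings_index.values()))
    emb_mean, emb_std = all_embs.mean(), all_embs.std()

    # Words missing from the pretrained file keep a random vector drawn
    # from the same distribution (this is why seeding matters here too).
    nb_words = min(max_features, len(word_index) + 1)
    embedding_matrix = np.random.normal(emb_mean, emb_std, (nb_words, embed_size))
    for word, i in word_index.items():
        if i >= nb_words:
            continue
        vec = embeddings_index.get(word)
        if vec is not None:
            embedding_matrix[i] = vec
    return embedding_matrix
```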
`build_vocab`
Borrowed from: Improve your Score with some Text Preprocessing
- `build_vocab`
- `known_contractions`
- `clean_contractions`
- `correct_spelling`
- `unknown_punct`
- `clean_numbers`
- `clean_special_chars`
- `add_lower`
- `clean_text`
- `_get_mispell`
- `replace_typical_misspell`
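As one example of these helpers, a sketch of the misspelling-replacement pair (`mispell_dict` here is a tiny hypothetical subset; the kernel's dictionary is much larger):

```python
import re

# Hypothetical subset of the kernel's misspelling dictionary.
mispell_dict = {'colour': 'color', 'centre': 'center',
                'didnt': 'did not', 'doesnt': 'does not',
                'travelling': 'traveling'}

def _get_mispell(mispell_dict):
    # Compile one alternation regex over every known misspelling.
    mispell_re = re.compile('(%s)' % '|'.join(mispell_dict.keys()))
    return mispell_dict, mispell_re

mispellings, mispellings_re = _get_mispell(mispell_dict)

def replace_typical_misspell(text):
    # Substitute each match with its corrected form in a single pass.
    return mispellings_re.sub(lambda m: mispellings[m.group(0)], text)
```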
Extra feature part taken from here:
- `add_features_before_cleaning`
  - `count_contains_a_punct`
  - `count_contains_a_string`
  - `count_words_more_frequent_in_insc`
  - `count_words_more_frequent_in_sc`
- `add_features_custom`
  - `count_contains_a_string`
  - `count_words_more_frequent_in_insc`
  - `count_words_more_frequent_in_sc`
`add_features`
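An illustrative sketch of the kind of count-based features `add_features` computes on the raw question text (column names here are assumptions; the actual feature set follows the toxic-comments notebook linked below):

```python
import pandas as pd

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    text = df['question_text'].astype(str)
    # Length- and capitalization-based signals.
    df['total_length'] = text.str.len()
    df['capitals'] = text.apply(lambda s: sum(c.isupper() for c in s))
    df['caps_vs_length'] = df['capitals'] / df['total_length']
    # Word-count signals.
    df['num_words'] = text.str.split().str.len()
    df['num_unique_words'] = text.apply(lambda s: len(set(s.lower().split())))
    df['words_vs_unique'] = df['num_unique_words'] / df['num_words']
    return df
```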
`load_and_prec`
- Lower-case the text
- Clean the text
- Clean numbers
- Clean spellings
- Fill up the missing values
- Add features (from https://github.com/wongchunghang/toxic-comment-challenge-lstm/blob/master/toxic_comment_9872_model.ipynb)
- Tokenize the sentences
- Pad the sentences
- Get the target values
- Split into a training set and a final test set
- Shuffle the data
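The tokenize-and-pad steps are standard Keras preprocessing; a sketch (assumes a `train_df` DataFrame and the hyperparameters from the table above):

```python
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

max_features, maxlen = 120000, 70  # values from the hyperparameter table

tokenizer = Tokenizer(num_words=max_features)
tokenizer.fit_on_texts(list(train_df['question_text']))

# Integer-encode each question, then pad/truncate to a fixed length.
train_X = tokenizer.texts_to_sequences(train_df['question_text'])
train_X = pad_sequences(train_X, maxlen=maxlen)
```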
Two embedding matrices are used: GloVe and Paragram. Their mean is taken as the final embedding matrix. Missing entries in the embedding are filled with np.random.normal, so we have to seed here too.
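A sketch of that blending step, assuming the loaders above return matrices aligned to the same word index:

```python
import numpy as np

embedding_matrix_1 = load_glove(tokenizer.word_index)
embedding_matrix_2 = load_para(tokenizer.word_index)

# Element-wise mean of two (nb_words, embed_size) matrices.
embedding_matrix = np.mean([embedding_matrix_1, embedding_matrix_2], axis=0)
```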
Code inspired by: https://github.com/anandsaha/pytorch.cyclic.learning.rate/blob/master/cls.py

`CyclicLR`
- `batch_step`
- `_triangular_scale_fn`
- `_triangular2_scale_fn`
- `_exp_range_scale_fn`
- `get_lr`
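The core of the scheduler is the triangular policy: the learning rate ramps linearly from `base_lr` up to `max_lr` and back down once per cycle. A compact sketch of the per-iteration computation (parameter values here are illustrative, not the kernel's):

```python
import math

def triangular_lr(iteration, base_lr=0.001, max_lr=0.003, step_size=300):
    # Which triangle cycle we are in (each cycle is 2 * step_size iterations).
    cycle = math.floor(1 + iteration / (2 * step_size))
    # Distance from the cycle's peak, normalized to [0, 1].
    x = abs(iteration / step_size - 2 * cycle + 1)
    # Linearly interpolate: lr rises to max_lr at the peak, then falls back.
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)
```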
A bidirectional LSTM with an attention layer and an additional fully connected layer, plus extra features taken from a winning kernel of the Toxic Comments competition. CLR and a capsule layer are also used, and the branches are blended together by concatenation.
Initial idea borrowed from: https://www.kaggle.com/ziliwang/baseline-pytorch-bilstm
- `Embed_Layer`
  - `forward`
- `GRU_Layer`
  - `init_weights`
  - `forward`
- `Caps_Layer`
  - `forward`
  - `squash`
- `Capsule_Main`
  - `forward`
- `Attention`
  - `forward`
- `NeuralNet`
  - `forward`
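For illustration, a minimal sketch of the additive attention module these kernels typically use (dimensions and initialization are assumptions; the kernel's `Attention` may differ in details such as masking):

```python
import torch
import torch.nn as nn

class Attention(nn.Module):
    def __init__(self, feature_dim, step_dim):
        super().__init__()
        self.feature_dim = feature_dim
        self.step_dim = step_dim
        self.weight = nn.Parameter(torch.zeros(feature_dim, 1))
        nn.init.xavier_uniform_(self.weight)
        self.bias = nn.Parameter(torch.zeros(step_dim))

    def forward(self, x):
        # x: (batch, step_dim, feature_dim) -- e.g. BiLSTM outputs.
        eij = torch.mm(x.reshape(-1, self.feature_dim), self.weight)
        eij = torch.tanh(eij.view(-1, self.step_dim) + self.bias)
        a = torch.softmax(eij, dim=1).unsqueeze(-1)
        # Weighted sum over time steps -> (batch, feature_dim).
        return torch.sum(x * a, dim=1)
```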
The method for training is borrowed from https://www.kaggle.com/hengzheng/pytorch-starter
- `MyDataset`
  - `__getitem__`
  - `__len__`
- `sigmoid`
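A sketch of the dataset wrapper in the style of pytorch-starter, where each sample's index is returned alongside the data (useful for writing out-of-fold predictions back to the right rows):

```python
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, dataset):
        self.dataset = dataset

    def __getitem__(self, index):
        data, target = self.dataset[index]
        return data, target, index

    def __len__(self):
        return len(self.dataset)
```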
`bestThresshold`, borrowed from: https://www.kaggle.com/ziliwang/baseline-pytorch-bilstm
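The threshold search scans candidate cutoffs on the validation predictions and keeps the one with the best F1. A sketch (names and the scan range are illustrative):

```python
import numpy as np
from sklearn.metrics import f1_score

def best_threshold(y_true, y_proba):
    best_t, best_score = 0.0, 0.0
    # Scan thresholds in small steps and keep the F1-maximizing one.
    for t in np.arange(0.1, 0.501, 0.01):
        score = f1_score(y_true, (y_proba > t).astype(int))
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score
```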