9th (public) place solution to MeLi Data Challenge 2020
This is a very simple solution. The most important model is XGBoost; stacking it with the neural network only (barely) flipped my place from 10th to 9th. Illustrative code sketches for each step appear after the list below.
1. Run 0_parquet.ipynb to save the original files as parquet and make loading faster (sketch below).
2. Run 1a_prep_sbert_neuralmind.ipynb to generate sentence embeddings (using a PT-BR fine-tuned BERT provided by neuralmind) and build a KNN index over them (sketch below).
3. Run 1b_prep_ltr_knn_search.ipynb to "melt" the original data and add nearest neighbors. Basically, it creates one row per candidate item: the viewed items plus the 50 nearest neighbors based on both the views and the search embeddings from the previous step (sketch below).
4. Run 2a_xgb_ranker_knn_neuralmind.ipynb to create a minimal feature set, transform the target into a ranking, save the data for reuse, and train a rank:pairwise XGBoost (sketch below).
5. Run 2b_embbag_nums_yrank_mse.ipynb to train a neural network that takes both the features from the previous dataset and the sentence embeddings. To train faster, I kept the same ranking target but optimized MSE (surprisingly, not as bad as I expected) (sketch below).
6. Run 3_stack.ipynb to load the previous models' predictions and train an XGBoost model that stacks them into the final predictions (sketch below).
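
Step 1, parquet conversion: a minimal sketch assuming the original files are gzipped JSON lines; the file names here are placeholders, not necessarily the ones in the repo.

    import pandas as pd

    # Placeholder file names; the actual competition files may be named differently.
    for name in ["train_dataset", "test_dataset", "item_data"]:
        df = pd.read_json(f"{name}.jl.gz", lines=True)  # pandas infers gzip from the extension
        df.to_parquet(f"{name}.parquet")                # parquet reloads much faster than JSON lines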
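
Step 2, embeddings and KNN index: a sketch using sentence-transformers and scikit-learn; the exact neuralmind checkpoint is an assumption, and the notebook may use a different indexing library.

    from sentence_transformers import SentenceTransformer
    from sklearn.neighbors import NearestNeighbors

    # Assumption: neuralmind/bert-base-portuguese-cased is neuralmind's public PT-BR BERT;
    # the README does not state which checkpoint was actually used.
    model = SentenceTransformer("neuralmind/bert-base-portuguese-cased")
    titles = ["Celular Samsung Galaxy A10", "Tenis de corrida masculino"]  # item titles
    emb = model.encode(titles)

    # KNN index over the title embeddings; step 3 uses 50 neighbors on the full catalog.
    knn = NearestNeighbors(n_neighbors=min(50, len(titles)), metric="cosine").fit(emb)
    dist, idx = knn.kneighbors(emb[:1])  # nearest items for one query embedding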
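
Step 3, candidate generation ("melt"): a toy pandas sketch; the column names and candidate structure are assumptions about the notebook's schema.

    import pandas as pd

    # Toy sessions: one row per viewed item (assumed schema).
    sessions = pd.DataFrame({"session_id": [0, 0, 1], "viewed_item": [11, 12, 33]})
    # Hypothetical per-session candidate ids coming from the KNN search in step 2.
    knn_candidates = {0: [13, 14], 1: [34]}

    rows = []
    for sid, grp in sessions.groupby("session_id"):
        candidates = set(grp["viewed_item"]) | set(knn_candidates.get(sid, []))
        rows += [{"session_id": sid, "candidate_item": c} for c in sorted(candidates)]
    melted = pd.DataFrame(rows)  # one row per (session, candidate item) pair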
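
Step 4, the ranker: a sketch using XGBoost's sklearn-style XGBRanker API with the rank:pairwise objective named in the README; features, labels, and group sizes are synthetic placeholders.

    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X = rng.random((100, 5))              # one row per (session, candidate) pair
    y = rng.integers(0, 2, size=100)      # relevance, e.g. 1 if the candidate was bought
    group = [10] * 10                     # candidate counts per session, in row order

    ranker = xgb.XGBRanker(objective="rank:pairwise", n_estimators=100)
    ranker.fit(X, y, group=group)
    scores = ranker.predict(X)            # higher score = ranked earlier within its session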
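
Step 5, the neural network: a minimal PyTorch sketch of an embedding-bag-plus-numeric-features regressor trained with MSE on the ranking target; all sizes are made up, and the real notebook also feeds in the sentence embeddings.

    import torch
    import torch.nn as nn

    class EmbBagNet(nn.Module):
        def __init__(self, n_items=10_000, emb_dim=64, n_nums=8):
            super().__init__()
            self.bag = nn.EmbeddingBag(n_items, emb_dim, mode="mean")
            self.head = nn.Sequential(nn.Linear(emb_dim + n_nums, 128), nn.ReLU(), nn.Linear(128, 1))

        def forward(self, item_ids, offsets, nums):
            x = torch.cat([self.bag(item_ids, offsets), nums], dim=1)
            return self.head(x).squeeze(1)

    model = EmbBagNet()
    # Toy batch: two bags of viewed-item ids plus numeric features per example.
    item_ids = torch.tensor([1, 2, 3, 4, 5])
    offsets = torch.tensor([0, 3])        # bag 1 = ids[0:3], bag 2 = ids[3:]
    nums = torch.randn(2, 8)
    target = torch.tensor([1.0, 0.0])     # the (rescaled) ranking target
    loss = nn.MSELoss()(model(item_ids, offsets, nums), target)
    loss.backward()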
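
Step 6, stacking: a sketch that blends the two models' scores with another XGBoost; the binary target and the classifier objective are assumptions (the notebook might stack with a ranker instead).

    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(0)
    xgb_scores = rng.random(1000)         # hypothetical out-of-fold ranker scores
    nn_scores = rng.random(1000)          # hypothetical out-of-fold neural net scores
    y = rng.integers(0, 2, size=1000)     # assumed binary "bought item" target

    stack_X = np.column_stack([xgb_scores, nn_scores])
    stacker = xgb.XGBClassifier(n_estimators=100, eval_metric="logloss")
    stacker.fit(stack_X, y)
    final_scores = stacker.predict_proba(stack_X)[:, 1]  # blended score for the final ranking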
Submissions are named 22c, 26, etc. because those were the original notebook names; I numbered the notebooks sequentially to track my progress.
Thanks for organizing this competition and preparing a very practical, real-world dataset :)