Kaggle/Santander Product Recommendation

Abstract

Kaggle Santander Product Recommendation Competition

Host : Santander, British bank, wholly owned by the Spanish Santander Group.
Prize : $ 60,000
Problem : Multi-class Classification based Recommendation
Evaluation : MAP@7
Period : Oct 26 2016 ~ Dec 21 2016 (66 days)

Santander Bank offers a lending hand to their customers through personalized product recommendations. In their second competition, Santander is challenging Kagglers to predict which products their existing customers will use in the next month based on their past behavior and that of similar customers.

The competition data consists of customer records from 2015-01 to 2016-05 (17 monthly snapshots in total), including each customer's demographic information and product purchase behavior. The challenge is to predict, out of 24 products, the top 7 that each customer in the test data is most likely to purchase in 2016-06.
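As a rough illustration of the prediction task, the sketch below picks the top 7 products for one customer from a list of per-product scores. The function name `top_k_products`, the placeholder product names, and the scores are all hypothetical and are not taken from the code in this repository.

```python
def top_k_products(probs, product_names, k=7):
    """Return the k product names with the highest predicted scores.

    probs: one score per product for a single customer (hypothetical model output).
    product_names: product identifiers, in the same order as probs.
    """
    # Rank product indices by descending score; ties keep their original order.
    ranked = sorted(range(len(product_names)), key=lambda j: probs[j], reverse=True)
    return [product_names[j] for j in ranked[:k]]


# Usage with made-up scores for three placeholder products.
example = top_k_products([0.1, 0.7, 0.2], ["prod_a", "prod_b", "prod_c"], k=2)
```

In practice the same ranking would be applied to every customer in the test data, with `k=7` and all 24 products.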

The evaluation metric is MAP@7, which is difficult to optimize directly during training. Instead, multi-class log loss (mlogloss) was widely used among Kagglers as a proxy objective.
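For reference, here is a minimal sketch of how MAP@7 can be computed. It follows the commonly used average-precision-at-k formulation and assumes that a customer with no newly added products contributes a score of 0. The function names `apk` and `mapk` are illustrative and this is not necessarily the exact scoring code used by the competition.

```python
def apk(actual, predicted, k=7):
    """Average precision at k for one customer.

    actual: products the customer really added (any order).
    predicted: ranked list of recommended products, best first.
    """
    if not actual:
        # Assumption: customers with no added products score 0.
        return 0.0
    hits = 0
    score = 0.0
    seen = set()
    for i, p in enumerate(predicted[:k]):
        # Count each correct product once, at the rank where it first appears.
        if p in actual and p not in seen:
            hits += 1
            score += hits / (i + 1)
        seen.add(p)
    return score / min(len(actual), k)


def mapk(actual_lists, predicted_lists, k=7):
    """Mean of apk over all customers."""
    scores = [apk(a, p, k) for a, p in zip(actual_lists, predicted_lists)]
    return sum(scores) / len(scores)
```

Because the score depends only on the rank order of the predictions, it is not differentiable with respect to the predicted probabilities, which is one reason a smooth loss such as mlogloss is easier to train against.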

Thanks to BreakfastPirate's generous sharing, using only the 2015-06 data for training turned out to perform quite well on the leaderboard (reaching almost 0.03). A single model was enough to place high on the leaderboard, since MAP@7 made the gains from ensembling relatively small.

As always, feature engineering seemed to be the most important factor in this competition, along with a good CV scheme for finding the hyper-parameters that squeeze the most performance out of the given data.

Result

Submission	CV LogLoss	Public LB	Public Rank	Private LB	Private Rank
bare_minimum	1.84515	-	-	0.0165546	1406
reduced version by kweonwooj	0.9492806	-	-	0.0302238	208
best single model by kweonwooj	0.9396864	0.029975	182	0.0302794	175
reproduced version of 8th place solution	0.885272	-	-	0.0309659	14

The reproduced version of the 8th place solution is a direct fork of the GitHub code by Alexander Ponomarchuk and sh1ng. I added personal comments and an execution log. All credit goes to the original authors.

How to Run

[Data]

Place the data in the root_input directory. You can download the data from here.

[Code]

The results above can be replicated by running

python code/main.py

for each of the directories.

Make sure you are on Python 3.5.2 with the same library versions as specified in requirements.txt.

[Submit]

Submit the resulting CSV file here and verify the score.

Expected Result

for bare minimum

for reduced version of kweonwooj

for reproduced version of 8th place

Winning Solutions

1st place solution on Forum by idle_speculation
2nd place solution on Forum, GitHub, Kaggle Blog by Tom Van de Wiele
3rd place solution on Forum by Jack (Japan)
4th place solution on Forum by yoniko
5th place solution on Forum by BreakfastPirate, on Forum, GitHub by Jared Turkewitz
7th place solution on Forum by Evgeny Patekha
8th place solution on Forum, GitHub by Alexander Ponomarchuk and sh1ng
9th place solution on Forum by raddar and Davut Polat
11th place solution on Forum, GitHub by SRK and Rohan Rao
13th place solution on Forum by Sameh Faidi
14th place solution on Forum by alijs
20th place solution on Forum, GitHub by Alan (AJ) Pryor, Jr. and Matt Mills
