Kaggle/Santander Product Recommendation



Abstract

Kaggle Santander Product Recommendation Competition

  • Host: Santander, a British bank wholly owned by the Spanish Santander Group
  • Prize: $60,000
  • Problem: Multi-class classification-based recommendation
  • Evaluation: MAP@7
  • Period: Oct 26, 2016 ~ Dec 21, 2016 (56 days)

Santander Bank offers a lending hand to their customers through personalized product recommendations. In their second competition, Santander is challenging Kagglers to predict which products their existing customers will use in the next month based on their past behavior and that of similar customers.

The competition data covers customers from 2015-01 to 2016-05 (17 monthly snapshots). It includes each customer's demographic information and product purchase behavior. The challenge is to predict, for each customer in the test data, the 7 products out of 24 that the customer is most likely to purchase in 2016-06.
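The target is not given as a ready-made label. One way to derive it is to diff each customer's product holdings between two consecutive monthly snapshots. The sketch below assumes that a "purchase" means a product flag that flips from 0 to 1 between the previous month and the current one. The column names (`month`, `customer_id`, `prod_a`, ...) and the helper `added_products` are hypothetical placeholders, not the dataset's actual schema.

```python
import pandas as pd

# Hypothetical placeholder product columns; the real data has 24 product flags
# with different names.
PRODUCTS = ["prod_a", "prod_b", "prod_c"]


def added_products(df: pd.DataFrame, prev_month: str, cur_month: str) -> pd.DataFrame:
    """Return a 0/1 frame (customers x products) of products added in cur_month.

    Assumption: a product counts as "purchased" if the customer holds it in
    cur_month but did not hold it in prev_month.
    """
    prev = df[df["month"] == prev_month].set_index("customer_id")[PRODUCTS]
    cur = df[df["month"] == cur_month].set_index("customer_id")[PRODUCTS]
    # Customers absent from the previous snapshot are treated as holding nothing.
    prev = prev.reindex(cur.index).fillna(0)
    added = (cur == 1) & (prev == 0)
    return added.astype(int)


if __name__ == "__main__":
    snapshots = pd.DataFrame({
        "month":       ["2015-05", "2015-06", "2015-06"],
        "customer_id": [1, 1, 2],
        "prod_a":      [1, 1, 0],
        "prod_b":      [0, 1, 0],
        "prod_c":      [0, 0, 1],
    })
    print(added_products(snapshots, "2015-05", "2015-06"))
```

In this toy example, customer 1 gains `prod_b`. Customer 2 does not appear in 2015-05, so `prod_c` counts as added.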

The evaluation metric is MAP@7, which is difficult to optimize directly during training. Instead, most Kagglers optimized multi-class log loss (mlogloss) as an indirect proxy.
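The score depends only on the order of the top 7 predicted products, so it is piecewise constant in the model's outputs. That is why a smooth proxy such as mlogloss is used for training. For offline checks, here is a minimal sketch of MAP@7 that follows the usual Kaggle AP@K definition: precision at each hit position, summed and divided by min(number of actual items, K). In this sketch, customers with no added products score 0. How the official scorer treated such customers is an assumption here.

```python
from typing import Hashable, Sequence


def apk(actual: Sequence[Hashable], predicted: Sequence[Hashable], k: int = 7) -> float:
    """Average precision at k for a single customer."""
    if not actual:
        return 0.0  # assumption: customers with nothing added contribute 0
    predicted = list(predicted)[:k]
    actual_set = set(actual)
    hits = 0
    score = 0.0
    for i, p in enumerate(predicted):
        # Count each correct product once, at its first position in the ranking.
        if p in actual_set and p not in predicted[:i]:
            hits += 1
            score += hits / (i + 1)
    return score / min(len(actual_set), k)


def mapk(actuals, predictions, k: int = 7) -> float:
    """Mean of apk over all customers."""
    return sum(apk(a, p, k) for a, p in zip(actuals, predictions)) / len(actuals)
```

For example, `apk(["b", "c"], ["b", "a", "c"])` is (1/1 + 2/3) / 2 ≈ 0.833. A correct product ranked 8th or lower contributes nothing.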

Thanks to BreakfastPirate's generous sharing, it became known that training on the 2015-06 data alone performed well on the leaderboard (scoring close to 0.03). A strong single model was enough to place near the top of the leaderboard, since MAP@7 made the effect of ensembling relatively weak.
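One way to train a single-month model with mlogloss is to explode the customer × product matrix of added products into one row per (customer, added product) pair. Each row is labelled with the product's index, and customers who added nothing drop out. The sketch below assumes such a 0/1 matrix for 2015-06 already exists. The helper `to_multiclass_rows` and the frame layout are hypothetical. The resulting `label` column is what a multi-class classifier with a softmax/mlogloss objective would be trained on.

```python
import numpy as np
import pandas as pd


def to_multiclass_rows(added: pd.DataFrame) -> pd.DataFrame:
    """Explode a customers x products 0/1 matrix into (customer_id, label) rows.

    `added` is a hypothetical frame indexed by customer id with one column per
    product; label i refers to the i-th column. A customer who added two
    products yields two rows; a customer who added none yields no rows.
    """
    # np.nonzero walks the matrix in row-major order, so rows stay grouped by customer.
    row_idx, col_idx = np.nonzero(added.to_numpy())
    return pd.DataFrame({
        "customer_id": added.index.to_numpy()[row_idx],
        "label": col_idx,
    })


if __name__ == "__main__":
    added_2015_06 = pd.DataFrame(
        {"prod_a": [1, 0, 0], "prod_b": [1, 0, 0], "prod_c": [0, 1, 0]},
        index=[10, 20, 30],
    )
    print(to_multiclass_rows(added_2015_06))
```

Customer features would then be joined on `customer_id`. At test time, the model outputs one probability per product for each customer.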

As always, feature engineering seemed to be the most important factor in this competition. A good CV scheme also mattered, for finding the hyper-parameters that squeeze the most performance out of the given data.

Result

| Submission | CV LogLoss | Public LB | Public Rank | Private LB | Private Rank |
|---|---|---|---|---|---|
| bare_minimum | 1.84515 | - | - | 0.0165546 | 1406 |
| reduced version by kweonwooj | 0.9492806 | - | - | 0.0302238 | 208 |
| best single model by kweonwooj | 0.9396864 | 0.029975 | 182 | 0.0302794 | 175 |
| reproduced version of 8th place solution | 0.885272 | - | - | 0.0309659 | 14 |

The reproduced version of the 8th place solution is a direct fork of the GitHub repository by Alexander Ponomarchuk and sh1ng. I added personal comments and an execution log. All credit goes to the original authors.

How to Run

[Data]

Place the data in the root_input directory. You can download the data from here.

[Code]

The results above can be reproduced by running

python code/main.py

in each of the directories.

Make sure you are using Python 3.5.2, with the library versions specified in requirements.txt.

[Submit]

Submit the resulting CSV file here and verify the score.
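To build a submission from a different model, the per-product probabilities must be turned into a ranked list of 7 products per customer. A minimal sketch follows. It assumes a probability matrix `probs` (customers × products) and an optional 0/1 matrix `owned` of products already held in 2016-05, which are masked out. The masking is one reasonable choice: under a month-over-month definition of "purchase", an already-held product cannot be newly added. The helper name and the space-separated output format are assumptions; check the competition's sample submission for the exact layout.

```python
import numpy as np


def top_k_products(probs, product_names, owned=None, k=7):
    """Return, per customer, a space-separated string of the k most likely products."""
    scores = np.asarray(probs, dtype=float).copy()
    if owned is not None:
        # Assumption: products the customer already holds cannot be "added".
        scores[np.asarray(owned, dtype=bool)] = -np.inf
    # Sort descending by score; a stable sort keeps column order on ties.
    order = np.argsort(-scores, axis=1, kind="stable")[:, :k]
    names = np.asarray(product_names)
    # Drop masked products in case fewer than k candidates remain.
    return [
        " ".join(names[j] for j in row if np.isfinite(scores[i, j]))
        for i, row in enumerate(order)
    ]
```

With 24 products and k=7, each row yields up to 7 names. These strings can then be written next to the customer ids in whatever CSV layout the sample submission uses.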

Expected Result

[Image: expected result for bare_minimum]

[Image: expected result for the reduced version by kweonwooj]

[Image: expected result for the reproduced version of the 8th place solution]

Winning Solutions

  • 1st place solution on Forum by idle_speculation
  • 2nd place solution on Forum, GitHub, Kaggle Blog by Tom Van de Wiele
  • 3rd place solution on Forum by Jack (Japan)
  • 4th place solution on Forum by yoniko
  • 5th place solution on Forum by BreakfastPirate; on Forum and GitHub by Jared Turkewitz
  • 7th place solution on Forum by Evgeny Patekha
  • 8th place solution on Forum, GitHub by Alexander Ponomarchuk and sh1ng
  • 9th place solution on Forum by raddar and Davut Polat
  • 11th place solution on Forum, GitHub by SRK and Rohan Rao
  • 13th place solution on Forum by Sameh Faidi
  • 14th place solution on Forum by alijs
  • 20th place solution on Forum, GitHub by Alan (AJ) Pryor, Jr. and Matt Mills