Long ago, in the distant, fragrant mists of time, there was a competition… It was not just any competition.
It was a competition that challenged mere mortals to model a 20,000x200 matrix of continuous variables using only 250 training samples… without overfitting.
Data scientists ― including Kaggle's very own Will Cukierski ― competed by the hundreds. Legends were made. (Will took 5th place, and eventually ended up working at Kaggle!) People overfit like crazy. It was a Kaggle-y, data science-y madhouse.
So… we're doing it again.
Don't Overfit II: The Overfittening This is the next logical step in the evolution of weird competitions. Once again we have 20,000 rows of continuous variables, and a mere handful of training samples. Once again, we challenge you not to overfit. Do your best, model without overfitting, and add, perhaps, to your own legend.
A simple attempt at modelling the data without any complex data preprocessing.It gave an accuracy of 84.7% on the Leaderboard. Email me if you have any doubts.