Kaggle competition: Playground Series - Season 4, Episode 4
GOAL: predict the age of abalone from various physical measurements.
The evaluation metric for this competition is Root Mean Squared Logarithmic Error (RMSLE).
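As a minimal sketch of how this metric is computed, the function below follows the standard RMSLE definition (the square root of the mean squared difference of `log(1 + value)`). The sample ring counts and predictions are hypothetical and are not taken from the competition data.

```python
import math


def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error for non-negative values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_log_errors = [
        (math.log1p(pred) - math.log1p(true)) ** 2
        for true, pred in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Hypothetical ring counts and predictions, for illustration only.
rings_true = [9, 10, 7, 15]
rings_pred = [9.5, 9.0, 7.0, 12.0]
print(round(rmsle(rings_true, rings_pred), 5))
```

Because the error is taken on a log scale, the metric penalizes relative rather than absolute differences, so an error of one ring matters more for a young abalone than for an old one.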
PRIVATE SCORE: 0.14605 (Top 20% of leaderboard)
The features:
- Sex (categorical): M, F, and I (infant)
- Length (continuous) - Longest shell measurement, mm
- Diameter (continuous) - perpendicular to length, mm
- Height (continuous) - with meat in shell, mm
- Whole weight (continuous) - whole abalone, grams
- Shucked weight (continuous) - weight of meat, grams
- Viscera weight (continuous) - gut weight (after bleeding), grams
- Shell weight (continuous) - after being dried, grams
- Rings (integer) - the target; adding 1.5 gives the age in years
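The feature list above can be illustrated with a small sketch: converting a ring count to an age, and one-hot encoding the categorical `Sex` feature. The helper names `age_from_rings` and `one_hot_sex` are hypothetical, and one-hot encoding is an assumption about preprocessing, not necessarily what was used in this solution.

```python
SEX_CATEGORIES = ("F", "I", "M")


def age_from_rings(rings):
    """Convert a ring count to an age in years (age = rings + 1.5)."""
    return rings + 1.5


def one_hot_sex(sex):
    """Encode the Sex feature as three 0/1 indicator columns."""
    if sex not in SEX_CATEGORIES:
        raise ValueError(f"unknown Sex value: {sex!r}")
    return {f"Sex_{category}": int(sex == category) for category in SEX_CATEGORIES}


print(age_from_rings(10))  # 11.5
print(one_hot_sex("I"))    # {'Sex_F': 0, 'Sex_I': 1, 'Sex_M': 0}
```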
Generated data was used for the competition. The training set was supplemented with original data.
Model | Train RMSLE | Test RMSLE | Test R2 |
---|---|---|---|
Linear Regression | 0.049806 | 0.049200 | 0.667240 |
Ridge Regression | 0.049806 | 0.049200 | 0.667239 |
Random Forest Regression | 0.017691 | 0.045806 | 0.716091 |
Bagging Regression | 0.020856 | 0.047807 | 0.690532 |
XGBoost Regression | 0.041198 | 0.045048 | 0.724207 |
LightGBM Regression | 0.044126 | 0.044750 | 0.727181 |
CatBoost Regression | 0.042644 | 0.044591 | 0.730209 |
CatBoost Regression with default parameters shows the best test metrics: the lowest Test RMSLE (0.044591) and the highest Test R2 (0.730209).
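The comparison can be reproduced directly from the numbers in the table. The sketch below ranks the models by Test RMSLE and also computes the gap between test and train error, which is a rough overfitting indicator. The scores are copied from the table; the variable names are illustrative.

```python
# (train RMSLE, test RMSLE) copied from the table above.
scores = {
    "Linear Regression": (0.049806, 0.049200),
    "Ridge Regression": (0.049806, 0.049200),
    "Random Forest Regression": (0.017691, 0.045806),
    "Bagging Regression": (0.020856, 0.047807),
    "XGBoost Regression": (0.041198, 0.045048),
    "LightGBM Regression": (0.044126, 0.044750),
    "CatBoost Regression": (0.042644, 0.044591),
}

# Model with the lowest test RMSLE.
best_on_test = min(scores, key=lambda name: scores[name][1])

# Test error minus train error; a large gap suggests overfitting.
gaps = {name: test - train for name, (train, test) in scores.items()}
largest_gap = max(gaps, key=gaps.get)

print(best_on_test)  # CatBoost Regression
print(largest_gap)   # Random Forest Regression
```

Random Forest has the lowest train error but the largest train/test gap, which is why train error alone is not a good basis for choosing a model here.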
An ensemble is then built from different multihead models: LightGBM x3, XGBoost Regression x3, and CatBoost x3, each tuned with Optuna.
Then the VotingRegressor is tuned with Optuna too.
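A weighted voting ensemble for regression predicts the weighted average of its members' predictions. The source does not say which VotingRegressor settings were tuned, so the sketch below assumes it was the per-model weights. To keep the example dependency-free, a plain random search stands in for Optuna, and the validation targets and predictions are hypothetical placeholders rather than real model outputs.

```python
import math
import random


def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error."""
    errors = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(errors) / len(errors))


def weighted_average(predictions, weights):
    """Combine per-model prediction lists into one weighted-average list."""
    total = sum(weights)
    return [
        sum(w * model_pred[i] for w, model_pred in zip(weights, predictions)) / total
        for i in range(len(predictions[0]))
    ]


def search_weights(y_true, predictions, n_trials=200, seed=0):
    """Random search over ensemble weights (a stand-in for an Optuna study)."""
    rng = random.Random(seed)
    n_models = len(predictions)
    best_weights = [1.0] * n_models  # equal weights as the baseline
    best_score = rmsle(y_true, weighted_average(predictions, best_weights))
    for _ in range(n_trials):
        weights = [rng.uniform(0.01, 1.0) for _ in range(n_models)]
        score = rmsle(y_true, weighted_average(predictions, weights))
        if score < best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score


# Hypothetical validation targets and per-model predictions.
y_valid = [9, 10, 7, 15, 11]
preds = [
    [9.4, 9.1, 7.5, 12.8, 10.2],   # placeholder for a LightGBM model
    [8.7, 10.6, 6.8, 13.5, 11.4],  # placeholder for an XGBoost model
    [9.1, 9.8, 7.2, 14.1, 10.9],   # placeholder for a CatBoost model
]
weights, score = search_weights(y_valid, preds)
print(weights, score)
```

Weights chosen this way should be selected on validation data that the base models did not train on; otherwise the search tends to favor whichever model overfits most.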