This notebook describes the strategies I used to build and evaluate a model that predicts housing prices in Ames, Iowa, as part of a Kaggle competition.
- Data Cleaning
a. Observe missing values
b. Dropping vs. Imputing
c. Deciding imputing technique
- EDA, Feature Engineering & Selection
a. Scatter plots
b. Correlations to determine importance of features
c. Fine-tuning selected features using Lasso
d. Outlier analysis
e. Preprocessing (scaling data, train/test splits, transforming data)
- Modeling, Evaluation, Comparisons
a. Building models
b. Evaluating R² scores
c. Evaluating RMSE
- Conclusion
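
As a rough illustration of the Data Cleaning steps in the outline above, the sketch below counts missing values per column, drops a column that is mostly empty, and imputes the remaining numeric gaps with the median. The toy DataFrame, its column names and values, and the 50% drop threshold are all assumptions made for illustration; they are not necessarily the choices made later in this notebook.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the Ames training set (hypothetical columns and values).
df = pd.DataFrame({
    "Lot Frontage": [65.0, np.nan, 80.0, 70.0],
    "Pool QC": [np.nan, np.nan, np.nan, "Gd"],
    "SalePrice": [208500, 181500, 223500, 140000],
})

# a. Observe missing values: fraction of rows missing in each column.
missing_frac = df.isna().mean()
print(missing_frac.sort_values(ascending=False))

# b. Dropping vs. imputing: drop columns missing in more than half of the rows
#    (the 0.5 threshold is an assumed cutoff, not a rule).
to_drop = missing_frac[missing_frac > 0.5].index
cleaned = df.drop(columns=to_drop)

# c. Imputing technique: fill remaining numeric gaps with the column median,
#    which is one of several reasonable options (mean, mode, or a constant are others).
numeric_cols = cleaned.select_dtypes(include="number").columns
cleaned[numeric_cols] = cleaned[numeric_cols].fillna(cleaned[numeric_cols].median())
```

The median is used here because it is less sensitive to extreme values than the mean, which matters for skewed housing features.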
The remainder of the notebook contains my code, visualizations, and analysis of the housing dataset as I attempt to build a predictive model that focuses on minimizing root mean squared error (RMSE).
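
Since RMSE and R² are the two metrics named above, here is a minimal sketch of how each can be computed from arrays of true and predicted prices. The helper names `rmse` and `r2` are hypothetical and written with NumPy only for clarity; a library implementation would serve equally well.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up sale prices and predictions, for illustration only.
actual = [200000, 150000, 250000]
predicted = [210000, 140000, 250000]
print(rmse(actual, predicted), r2(actual, predicted))
```

RMSE is expressed in the same units as the target (dollars here), so it reads as a typical prediction error, while R² reports the share of price variance the model explains.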
The data is derived from a 2011 housing dataset from Kaggle. The dataset is dense and contains granular housing information for homes in Ames, Iowa.
Visit https://www.kaggle.com/competitions/1113-ames-competition/data for a detailed data dictionary.