We are required to build a regression model with regularization to predict the actual value of prospective properties and decide whether to invest in them.
The company wants to know:
- Which variables are significant in predicting the price of a house, and
- How well those variables describe the price of a house.
In this assignment, we will
- Use a hybrid combination of RFE (recursive feature elimination) and manual methods for feature selection.
- Build a linear regression model with ridge and lasso regularization to predict 'SalePrice', the final selling price of a property.
- Find the optimal regularization parameter for each method using grid search with K-fold cross-validation.
- Use the R-squared score on the test set to evaluate each model.
- Decide which model to go with.
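The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it uses synthetic data from `make_regression` instead of `train.csv`, and the alpha grid, the number of features kept by RFE (15), and the 5-fold split are all assumed values chosen for the example. The manual part of the feature selection is not shown.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the housing data (NOT the real train.csv).
X, y = make_regression(n_samples=300, n_features=30, n_informative=8,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Hypothetical search grid; the notebook's actual values may differ.
alphas = {"model__alpha": [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]}
folds = KFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, estimator in [("ridge", Ridge()), ("lasso", Lasso(max_iter=10000))]:
    pipe = Pipeline([
        ("scale", StandardScaler()),                                # put features on one scale
        ("rfe", RFE(LinearRegression(), n_features_to_select=15)),  # automated selection
        ("model", estimator),                                       # regularized regression
    ])
    # Grid search over alpha, scored by R-squared on each K-fold split.
    search = GridSearchCV(pipe, alphas, scoring="r2", cv=folds)
    search.fit(X_train, y_train)
    results[name] = {
        "alpha": search.best_params_["model__alpha"],
        "test_r2": r2_score(y_test, search.predict(X_test)),
    }

for name, res in results.items():
    print(f"{name}: alpha={res['alpha']}, test R^2={res['test_r2']:.3f}")
```

Placing the scaler and RFE inside the pipeline means they are refit on each training fold, so the cross-validation score is not influenced by the held-out fold.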
Note that our main criterion of selecting a model would be
The most important features to determine the price of a property are:
- The overall material and finish of the house
- First Floor Area
- Second Floor Area
- Basement Area
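One common way to arrive at a ranking like this is to compare the absolute values of the coefficients of a model fitted on standardized features. The sketch below illustrates that idea only: the data is randomly generated, the target is made up, and the column names (`OverallQual`, `1stFlrSF`, `2ndFlrSF`, `TotalBsmtSF`) are assumed to correspond to the features listed above. The `Noise` column and `alpha=100.0` are likewise hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200

# Synthetic illustration only: assumed column names, random values.
X = pd.DataFrame({
    "OverallQual": rng.integers(1, 11, n),
    "1stFlrSF": rng.normal(1200, 300, n),
    "2ndFlrSF": rng.normal(400, 200, n),
    "TotalBsmtSF": rng.normal(1000, 300, n),
    "Noise": rng.normal(0, 1, n),  # irrelevant feature for contrast
})

# Made-up target so that the example has something to fit.
y = (20000 * X["OverallQual"] + 60 * X["1stFlrSF"] + 50 * X["2ndFlrSF"]
     + 40 * X["TotalBsmtSF"] + rng.normal(0, 5000, n))

# Standardize so that coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)
model = Lasso(alpha=100.0, max_iter=10000).fit(X_scaled, y)

# Rank features by absolute coefficient, largest first.
importance = (pd.Series(model.coef_, index=X.columns)
              .abs()
              .sort_values(ascending=False))
print(importance)
```

Without standardization, a coefficient's size would mostly reflect the units of its feature (square feet versus a 1 to 10 rating), not its influence on the price.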
- numpy - 1.23.1
- pandas - 1.4.3
- scikit-learn - 1.1.1
- The data is present in `train.csv`.
- To understand the data, please read `data_description.txt`.
- Download the repository, making sure that `train.csv` and `Housing Price Analysis.ipynb` are in the same folder.
- Now you can run the whole notebook (`Housing Price Analysis.ipynb`) from top to bottom.
Created by [@showman-sharma] - feel free to contact me!