Skip to content

sabyasachi-mukherjee/housing

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Brief Description

I work with the housing in California (1990) dataset and try to evaluate three models that can help predict housing prices. The dataset is also available for download here (see "housing.csv").

Versions of packages used:

Pandas version: 1.4.2,

Numpy version: 1.21.5,

Matplotlib version: 3.5.1,

Sklearn version: 1.0.2

Methodology:

The three models I have used are linear regression, decision trees and random forests. I transform the training set using the sklearn pipeline, then run LinearRegressor(), RandomForestRegressor() and DecisionTreeRegressor(). After computing the root mean squared error (RMSE) for each and running 10-fold cross-validation, I settle on the Random Forest Regressor. Then I run GridSearchCV to obtain max_features = 8 and n_estimators = 30. Code in line 595 gives an idea of the importance of individual features in determining median house value, and we can drop certain unimportant features. Finally, I test my final_model against the test data and compute the RMSE score.

About

Housing in California (1990)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published