GitHub - sabyasachi-mukherjee/housing: Housing in California (1990)

Brief Description

I work with the California housing (1990) dataset and evaluate three models for predicting housing prices. The dataset is also available for download here (see "housing.csv").

Versions of packages used:

Pandas version: 1.4.2

Numpy version: 1.21.5

Matplotlib version: 3.5.1

Sklearn version: 1.0.2

Methodology:

The three models I use are linear regression, a decision tree and a random forest. I transform the training set with an sklearn pipeline, then fit LinearRegression(), DecisionTreeRegressor() and RandomForestRegressor(). After computing the root mean squared error (RMSE) for each and running 10-fold cross-validation, I settle on the random forest regressor. I then run GridSearchCV, which selects max_features = 8 and n_estimators = 30. The code at line 595 shows the importance of individual features in determining median house value, which suggests that certain unimportant features can be dropped. Finally, I evaluate final_model on the test data and compute its RMSE.
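
The workflow above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it uses a synthetic stand-in for housing.csv so that it runs on its own, and the helper build_pipeline, the imputation and scaling steps, and the small parameter grid are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for housing.csv (assumption: 8 numeric features, 1 target).
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=300)
X[rng.random(X.shape) < 0.02] = np.nan  # sprinkle in a few missing values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


def build_pipeline(model):
    """Hypothetical helper: impute, scale, then fit the given regressor."""
    return Pipeline([
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
        ("model", model),
    ])


models = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(random_state=42),
    "forest": RandomForestRegressor(random_state=42),
}

# 10-fold cross-validated RMSE for each candidate model.
cv_rmse = {}
for name, model in models.items():
    scores = cross_val_score(
        build_pipeline(model), X_train, y_train,
        scoring="neg_mean_squared_error", cv=10,
    )
    cv_rmse[name] = float(np.sqrt(-scores).mean())

# Small illustrative grid over the two random forest hyperparameters.
param_grid = {"model__n_estimators": [10, 30], "model__max_features": [4, 8]}
search = GridSearchCV(
    build_pipeline(RandomForestRegressor(random_state=42)),
    param_grid, cv=5, scoring="neg_mean_squared_error",
)
search.fit(X_train, y_train)

# Best pipeline refit on the full training set, then scored on held-out data.
final_model = search.best_estimator_
importances = final_model.named_steps["model"].feature_importances_
test_rmse = float(np.sqrt(np.mean((final_model.predict(X_test) - y_test) ** 2)))

print(cv_rmse, search.best_params_, test_rmse)
```

Wrapping the preprocessing and the regressor in one Pipeline means each cross-validation fold fits the imputer and scaler only on its own training portion, which avoids leaking information from the validation fold.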
