Jupyter Notebook (https://jupyter.org/install) is the only prerequisite for running the project.
This notebook predicts the sale price of houses using the Ames Housing dataset, which provides 79 features that can be used as predictors. The dataset was obtained from the Kaggle Housing Prices Competition (https://www.kaggle.com/c/home-data-for-ml-course). The score obtained on the test data ranks in the top 2% of the Kaggle competition.
housing.ipynb is the main file; it contains the statistical analysis of the dataset and the prediction model. data_description.txt describes each feature of the dataset and, for categorical features, lists the possible values. train.csv includes 80 columns (the features plus the sale price), while test.csv includes only the features and is used to generate the predictions that are submitted to Kaggle.
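The difference between the two CSV files can be checked directly from their headers. The following is a minimal sketch and is not taken from housing.ipynb: the helper names `read_header` and `missing_from_test` are hypothetical, the column names `Id` and `SalePrice` are assumed from the Kaggle data, and the two in-memory snippets are small stand-ins for the real files.

```python
import csv
import io

TARGET = "SalePrice"  # assumed name of the target column in train.csv


def read_header(csv_file):
    """Return the column names from the first row of an open CSV file."""
    return next(csv.reader(csv_file))


def missing_from_test(train_columns, test_columns):
    """Return the columns that appear in the training header but not in the test header."""
    test_set = set(test_columns)
    return [c for c in train_columns if c not in test_set]


# Tiny in-memory stand-ins for train.csv and test.csv (illustrative only).
train_demo = io.StringIO("Id,LotArea,YearBuilt,SalePrice\n1,8450,2003,208500\n")
test_demo = io.StringIO("Id,LotArea,YearBuilt\n1461,11622,1961\n")

train_cols = read_header(train_demo)
test_cols = read_header(test_demo)
extra = missing_from_test(train_cols, test_cols)
print(extra)  # only the target column should be missing from the test header
```

With the real files, the same helpers could be applied to `open("train.csv", newline="")` and `open("test.csv", newline="")`; the expectation is that the target column is the only one missing from the test file.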
To reproduce the results, download the whole directory and run the Jupyter Notebook file housing.ipynb. The notebook outputs a submission.csv file that can be submitted to Kaggle.
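For readers unfamiliar with the submission format, the sketch below shows one way such a file could be written. It is an illustration under assumptions, not the code used in housing.ipynb: the function `write_submission` is hypothetical, and the two-column `Id,SalePrice` layout is assumed from the competition's sample submission.

```python
import csv


def write_submission(ids, predictions, path="submission.csv"):
    """Write one (Id, SalePrice) row per test house to a CSV file."""
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Id", "SalePrice"])  # assumed header expected by Kaggle
        for house_id, price in zip(ids, predictions):
            writer.writerow([house_id, price])
```

A call such as `write_submission(test_ids, predicted_prices)` would produce the file, where `test_ids` and `predicted_prices` are placeholder names for the test-set identifiers and the model's outputs.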
Many ideas in this project were taken from the following notebooks:
https://www.kaggle.com/artyomkolas/housing-prices-nanpredct-featurselect-top-1
https://www.kaggle.com/angqx95/data-science-workflow-top-2-with-tuning