This project analyzes housing prices in the Philippines, focusing on how price relates to floor area and the other features available in the listings. It moves through three phases (data cleaning, data analysis, and model creation) and ends with a simple regression model that predicts housing prices from floor area and other available features.
The project is divided into three main notebooks, each focusing on a specific part of the data science process:
- Data Cleaning: Preparing and processing the dataset for analysis.
- Data Analysis: Exploring the dataset to understand trends and insights.
- Model Creation: Building and evaluating a regression model to predict housing prices.
The dataset used for analysis and model training consists of housing listings from the Philippines in CSV format. The CSV files live under a single datasets/ directory, which is split into two subdirectories:
- datasets/: This directory contains all dataset files.
  - raw/: Contains the original CSV files:
    - Housing Price Imputed.csv: Original data with imputed values for missing entries.
    - Housing Price Philippines Lamudi.csv: Raw housing price listings from Lamudi.
  - processed/: Contains the cleaned dataset:
    - cleaned_housing_prices.csv: The CSV file exported after data cleaning, ready for analysis and modeling.
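The layout above can be referenced from a notebook or script roughly as follows. This is a minimal sketch that assumes the working directory is the repository root and that the files are UTF-8 encoded; the constant names and the `load_rows` helper are hypothetical, and the column names are read from the file rather than assumed.

```python
import csv
from pathlib import Path

# Paths follow the directory layout described above (relative to the repo root).
DATASETS_DIR = Path("datasets")
RAW_IMPUTED = DATASETS_DIR / "raw" / "Housing Price Imputed.csv"
RAW_LAMUDI = DATASETS_DIR / "raw" / "Housing Price Philippines Lamudi.csv"
CLEANED = DATASETS_DIR / "processed" / "cleaned_housing_prices.csv"


def load_rows(path: Path) -> list[dict[str, str]]:
    """Read a CSV file into a list of dicts keyed by the header row."""
    # Assumption: the file is UTF-8 encoded.
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Only attempt to load if the cleaned file has already been generated.
if CLEANED.exists():
    rows = load_rows(CLEANED)
    columns = list(rows[0].keys()) if rows else []
    print(f"Loaded {len(rows)} rows with columns: {columns}")
else:
    print(f"{CLEANED} not found; run the Data Cleaning notebook first.")
```

The notebooks themselves may load the data differently (for example with pandas); the point here is only how the three files map onto the directory tree.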
You can access the dataset on Kaggle here: https://www.kaggle.com/datasets/linkanjarad/housing-prices-dataset-philippines