This dataset offers a detailed collection of house listings from various cities and regions in Bangladesh, with a specific focus on Dhaka and Chittagong. It covers essential details such as location, property type, size, amenities, and pricing. The dataset is regularly updated to provide an accurate and current picture of the housing market in Bangladesh.
The dataset caters to diverse purposes, serving as a valuable resource for researchers, data scientists, real estate professionals, and investors. Some key use cases include:
- Analyzing regional price trends
- Identifying popular neighborhoods and amenities
- Training machine learning models for predicting housing prices
The following regression models have been applied to analyze and predict housing prices using this dataset:
- Linear Regression: A fundamental model assuming a linear relationship between input features and housing prices.
- XGBRegressor (Extreme Gradient Boosting): A powerful gradient boosting algorithm known for its speed and performance, applied specifically for regression tasks.
- LGBMRegressor (LightGBM): Another gradient boosting framework, recognized for efficiency and speed, utilized for regression on this dataset.
- Random Forest: A versatile ensemble learning model that leverages multiple decision trees to make predictions, often robust and effective for various datasets.
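As a rough illustration of how such models can be compared, the sketch below fits two of them on synthetic data and reports the mean absolute error on a held-out split. It is not the notebook's code: the data is generated rather than taken from the listings, the `bedrooms` feature and all coefficients are hypothetical, and scikit-learn is assumed to be installed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the real listings): price grows with size
# and a hypothetical bedroom count, plus Gaussian noise.
rng = np.random.default_rng(0)
size_sqft = rng.uniform(500, 3000, size=200)
bedrooms = rng.integers(1, 6, size=200)
price = 5000 * size_sqft + 200_000 * bedrooms + rng.normal(0, 100_000, size=200)

X = np.column_stack([size_sqft, bedrooms])
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.25, random_state=0
)

models = {
    "LinearRegression": LinearRegression(),
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=0),
}
# xgboost.XGBRegressor and lightgbm.LGBMRegressor follow the same
# fit/predict interface and could be added to this dict when those
# packages are installed.

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name}: MAE = {scores[name]:,.0f}")
```

Because the synthetic target is linear by construction, Linear Regression is favored here; results on the real listings may look quite different.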
The dataset is structured with the following columns:
- Location: The geographical location of the property.
- Property Type: Categorization of the property (e.g., apartment, house).
- Size: Size or area of the property.
- Amenities: Features and facilities associated with the property.
- Price: The listed price of the property.
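A minimal sketch of reading a file with these columns using only the standard library is shown below. The sample rows are hypothetical, and the real CSV header names and value formats may differ from what is assumed here.

```python
import csv
import io

# Hypothetical sample rows. Column names follow the list above, but the
# actual CSV header and value formats are assumptions.
sample_csv = """Location,Property Type,Size,Amenities,Price
"Gulshan, Dhaka",Apartment,1800,"Lift; Parking",25000000
"Khulshi, Chittagong",House,2400,"Garden; Parking",32000000
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
for row in rows:
    # The csv module returns strings, so numeric columns need conversion.
    row["Size"] = float(row["Size"])
    row["Price"] = float(row["Price"])

print(rows[0]["Location"], rows[0]["Price"])
```

To read the downloaded file instead, replace `io.StringIO(sample_csv)` with an opened file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points to the CSV.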
Before applying the regression models, the dataset underwent the following preprocessing steps:
- Handling Missing Data: Any missing data in crucial columns was addressed through imputation or removal.
- Encoding Categorical Variables: Categorical variables like "Property Type" were encoded to make them suitable for the regression models.
- Feature Scaling: To ensure consistent model performance, numerical features were scaled.
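The three preprocessing steps can be sketched with pandas as follows. This is an illustration under assumptions, not the exact procedure used for the dataset: the rows are hypothetical, and the choices of median imputation, an `"Unknown"` category, one-hot encoding, and z-score scaling are picked for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical rows with gaps; values are illustrative only.
df = pd.DataFrame({
    "Property Type": ["Apartment", "House", None, "Apartment"],
    "Size": [1200.0, np.nan, 900.0, 1500.0],
    "Price": [9_000_000, 15_000_000, 6_500_000, None],
})

# Missing data, removal: drop rows that lack the target value.
df = df.dropna(subset=["Price"])

# Missing data, imputation: fill numeric gaps with the median and
# categorical gaps with an explicit placeholder category.
df["Size"] = df["Size"].fillna(df["Size"].median())
df["Property Type"] = df["Property Type"].fillna("Unknown")

# Encoding: one-hot encode the categorical column.
encoded = pd.get_dummies(df, columns=["Property Type"])

# Scaling: standardize Size to zero mean and unit variance (z-score).
encoded["Size"] = (encoded["Size"] - encoded["Size"].mean()) / encoded["Size"].std()

print(encoded)
```

When evaluating models on a train/test split, the imputation and scaling statistics should be computed on the training portion only and then applied to the test portion, to avoid leaking information.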
To get started:
- Dataset Access: Download the dataset in CSV format for your analysis.
- Run the Models: Explore the implementation of Linear Regression, Random Forest, XGBRegressor, and LGBMRegressor in the project's notebook.
- Customization: Customize models or dataset features based on specific research questions or objectives.
Keep the following considerations in mind:
- Heterogeneity: Property listings in the dataset may vary widely, which requires careful consideration during analysis.
- Outliers: Outliers in pricing or property size can affect model performance and may need to be addressed before training.
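One common way to flag outliers is the interquartile range (IQR) rule, sketched below with the standard library. The prices are hypothetical, and the 1.5 × IQR threshold is a conventional default rather than a value taken from this project.

```python
import statistics

# Hypothetical listing prices; the last value is deliberately extreme.
prices = [4_500_000, 5_200_000, 6_000_000, 6_800_000,
          7_500_000, 8_100_000, 95_000_000]

# Quartiles via the standard library (default "exclusive" method).
q1, _, q3 = statistics.quantiles(prices, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

kept = [p for p in prices if lower <= p <= upper]
flagged = [p for p in prices if p < lower or p > upper]

print("flagged as outliers:", flagged)
```

Whether flagged listings should be removed, capped, or kept depends on the analysis: a very expensive property may be a data-entry error or a genuine luxury listing.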
If you encounter issues or have suggestions for improvement, please open an issue or submit a pull request.
Happy analyzing!