- Introduction
- Data Gathering
- Data Exploration
- Tools Used
- Model Building
- Evaluation Metrics
- Model Selection
- Conclusion
The energy efficiency of buildings is becoming an increasingly important issue from both an environmental and an economic perspective. In this project, I use machine learning to predict the cooling load of a building from eight building parameters. The dataset comes from the UCI Machine Learning Repository and contains 768 observations. The dataset is explored, preprocessed, and then used to train and evaluate various machine learning models. The primary objective is to build a robust predictive model for cooling load requirements, which can aid in the design and development of energy-efficient buildings.
The dataset was obtained from the UCI Machine Learning Repository. It includes 768 samples and 10 variables: eight features and two response variables (heating load and cooling load). The target variable in this project is the cooling load of a building, and the eight features represent various building parameters. All variables are numeric, and there are no missing values in the dataset.
Variable | Attributes |
---|---|
X1 | Relative Compactness |
X2 | Surface Area |
X3 | Wall Area |
X4 | Roof Area |
X5 | Overall Height |
X6 | Orientation |
X7 | Glazing Area |
X8 | Glazing Area Distribution |
Y1 | Heating Load |
Y2 | Cooling Load |
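
As a rough illustration of how the table above maps onto code, the sketch below renames the short column codes and separates the eight features from the cooling-load target. The helper name `prepare_frame` and the file name in the usage comment are assumptions made for this example, not names taken from the project.

```python
import pandas as pd

# Mapping from the dataset's short column codes to the names in the table above.
COLUMN_NAMES = {
    "X1": "Relative Compactness",
    "X2": "Surface Area",
    "X3": "Wall Area",
    "X4": "Roof Area",
    "X5": "Overall Height",
    "X6": "Orientation",
    "X7": "Glazing Area",
    "X8": "Glazing Area Distribution",
    "Y1": "Heating Load",
    "Y2": "Cooling Load",
}


def prepare_frame(raw: pd.DataFrame):
    """Rename columns, then return (features, target) with cooling load as target."""
    df = raw.rename(columns=COLUMN_NAMES)
    # Heating load is dropped because only the cooling load is modelled here.
    features = df.drop(columns=["Heating Load", "Cooling Load"])
    target = df["Cooling Load"]
    return features, target


# Hypothetical usage; the file name is an assumption about the local copy:
# raw = pd.read_excel("ENB2012_data.xlsx")
# X, y = prepare_frame(raw)
```
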
- Boxplot: I plotted boxplots for the three groups of independent variables, i.e. `Surface Area`, `Wall Area`, and `Roof Area`; `Overall Height`, `Orientation`, and `Glazing Area Distribution`; and `Glazing Area` and `Relative Compactness`. I also plotted a boxplot for the dependent variable `Cooling Load`. The boxplots show the distribution of the data for each variable and help to identify any potential outliers.
- Correlation Heatmap: I created a correlation heatmap to visualize the pairwise correlations between the features. I found that `Relative Compactness`, `Surface Area`, `Wall Area`, `Roof Area`, and `Overall Height` have strong negative correlations with the Cooling Load. `Glazing Area` has a strong positive correlation with the Cooling Load, while `Orientation` and `Glazing Area Distribution` have weak correlations with Cooling Load.
- Pairplot: I created a pairplot to visualize the relationships between all pairs of features. This also helps to identify any potential outliers and any possible nonlinear relationships between the variables.
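
The plots themselves are not reproduced here, but the quantities behind them can be computed directly. The sketch below is a numerical counterpart to the boxplots and the heatmap, using only pandas: it counts points outside the usual 1.5 × IQR whiskers and ranks features by their Pearson correlation with a target column. Both function names are hypothetical helpers written for this example.

```python
import pandas as pd


def iqr_outlier_counts(df: pd.DataFrame) -> pd.Series:
    """Count values per column that fall outside the 1.5 * IQR whiskers of a boxplot."""
    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    # Comparisons align the per-column bounds with the DataFrame's columns.
    outside = (df < q1 - 1.5 * iqr) | (df > q3 + 1.5 * iqr)
    return outside.sum()


def correlation_with_target(df: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of every other column with the target, sorted ascending."""
    corr = df.corr()[target].drop(target)
    return corr.sort_values()
```

With the full dataset loaded into a DataFrame, `correlation_with_target(df, "Cooling Load")` would give the same numbers that the heatmap displays in its `Cooling Load` row.
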
The project was developed in a Jupyter Notebook environment. The packages and libraries required to run it are:
- NumPy
- Pandas
- Scikit-learn
- XGBoost
- Matplotlib.pyplot
- Seaborn
The data was split into training and testing sets, with the training set making up 70% of the dataset and the testing set the remaining 30%.
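
A minimal sketch of such a 70/30 split with scikit-learn's `train_test_split` is shown below. The arrays are random placeholders with the same shape as the dataset, and the `random_state` value is an assumption added only to make the example reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the dataset's shape: 768 rows, 8 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(768, 8))
y = rng.normal(size=768)

# test_size=0.30 keeps 70% of the rows for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42  # random_state is an assumed value
)
print(X_train.shape, X_test.shape)
```
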
I trained and evaluated nine different machine learning models:
- Linear Regression
- XGBoost Regressor
- Gradient Boosting Regressor
- Random Forest Regressor
- Lasso
- Decision Tree Regressor
- Ridge
- AdaBoost Regressor
- Bagging Regressor
These models were chosen because they are commonly used in regression problems and have proven to be effective in previous studies. All models were trained using the 70/30 train-test split of the data and compared using the root mean squared error (RMSE) and the R-squared (R2) score.
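
The comparison loop could look roughly like the sketch below. It is not the project's notebook code: it uses default hyperparameters, synthetic data, and a hypothetical helper named `compare_models`. To keep the example dependent on scikit-learn only, the XGBoost model is left out; `xgboost.XGBRegressor` could be added to the dictionary in the same way.

```python
import numpy as np
from sklearn.ensemble import (
    AdaBoostRegressor,
    BaggingRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def compare_models(X, y, seed=42):
    """Fit each model on a 70/30 split and return test-set RMSE and R2 per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=seed
    )
    models = {
        "Linear Regression": LinearRegression(),
        "Gradient Boosting Regressor": GradientBoostingRegressor(random_state=seed),
        "Random Forest Regressor": RandomForestRegressor(random_state=seed),
        "Lasso": Lasso(),
        "Decision Tree Regressor": DecisionTreeRegressor(random_state=seed),
        "Ridge": Ridge(),
        "AdaBoost Regressor": AdaBoostRegressor(random_state=seed),
        "Bagging Regressor": BaggingRegressor(random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
        results[name] = {"rmse": rmse, "r2": float(r2_score(y_test, pred))}
    return results


# Synthetic stand-in for the real data: a noisy linear function of 8 features.
rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(200, 8))
y_demo = X_demo @ np.arange(1, 9) + rng.normal(scale=0.1, size=200)

results = compare_models(X_demo, y_demo)
for name, scores in sorted(results.items(), key=lambda kv: kv[1]["rmse"]):
    print(f"{name:30s} RMSE={scores['rmse']:.3f}  R2={scores['r2']:.3f}")
```

On this synthetic data the ranking says nothing about the real dataset; the point is only the structure of the loop.
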
The models are evaluated using the following metrics:
- Root Mean Squared Error (`RMSE`)
- R-squared (`R2`)
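
For reference, both metrics can be written in a few lines of NumPy. This is a sketch of the standard definitions, not the project's own code; scikit-learn's `mean_squared_error` and `r2_score` compute the same quantities.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    """R-squared: 1 minus residual sum of squares over total sum of squares."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

RMSE is expressed in the same units as the cooling load, so lower is better, while an R2 close to 1 means the model explains most of the variance in the target.
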
Out of the nine models, the best-performing ones were tuned using grid search cross-validation. Five models were selected, and the performance of each was visualized using a custom function, `model_perf_visual()`, that compares the actual and predicted values. The five models used were:
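
The tuning step could be sketched with scikit-learn's `GridSearchCV` as below. The parameter grid, the number of folds, the scoring choice, and the synthetic data are all assumptions made for illustration; the grid actually searched in the project is not given here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training data.
rng = np.random.default_rng(0)
X = rng.uniform(size=(150, 8))
y = X @ np.arange(1, 9) + rng.normal(scale=0.1, size=150)

# Assumed, deliberately small grid to keep the example fast.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "learning_rate": [0.1],
}

search = GridSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes scores, hence the sign
    cv=3,
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("cross-validated RMSE:", -search.best_score_)
```

After fitting, `search.best_estimator_` holds the model refit on all the data passed to `fit` with the best parameter combination, ready to be evaluated on the held-out test set.
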
In conclusion, I explored using machine learning to assess the cooling load requirements of buildings. The results show that ensemble methods such as the `Random Forest Regressor`, `AdaBoost Regressor`, and `Gradient Boosting Regressor` performed better than the other models. The `Gradient Boosting Regressor` has the best performance, with an RMSE of 0.80 and an R2 score of 0.99. This makes the `Gradient Boosting Regressor` the best model for predicting the cooling load of buildings using the energy efficiency dataset from the UCI Machine Learning Repository.