Predicting the Cooling Load of Buildings

Table of Contents

Introduction
Data Gathering
Data Exploration
Tools Used
Model Building
Evaluation Metrics
Model Selection
Conclusion

Introduction

The energy efficiency of buildings is becoming an increasingly important issue, both from an environmental and economic perspective. In this project, I focus on using machine learning to predict the cooling load requirements of buildings as a function of building parameters. Specifically, I aim to use eight building parameters to predict the cooling load. The dataset used for this project is obtained from the UCI Machine Learning repository and contains a total of 768 observations. The dataset is explored, preprocessed, and then used to train and evaluate various machine learning models. The primary objective of this project is to build a robust predictive model for cooling load requirements, which can aid in the design and development of energy-efficient buildings.

Data Gathering

The dataset was obtained from the UCI Machine Learning Repository. It includes 768 samples with eight features and two response variables (heating load and cooling load). The target variable in this project is the cooling load of a building, and the eight features represent various building parameters. All variables are numeric, and there are no missing values in the dataset.

Variable	Attributes
X1	Relative Compactness
X2	Surface Area
X3	Wall Area
X4	Roof Area
X5	Overall Height
X6	Orientation
X7	Glazing Area
X8	Glazing Area Distribution
Y1	Heating Load
Y2	Cooling Load
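As a rough sketch of how the table above could map onto a DataFrame, the snippet below renames the coded columns to descriptive names and separates the features from the cooling-load target. The helper name `prepare_cooling_data` and the snake_case column names are illustrative assumptions, not taken from the notebook. The commented-out `read_excel` call assumes the `ENB2012_data.xlsx` file from the repository is in the working directory.

```python
import pandas as pd

# Mapping from the coded column names to the attributes in the table above.
# The snake_case names are hypothetical, chosen only for readability.
COLUMN_NAMES = {
    "X1": "relative_compactness",
    "X2": "surface_area",
    "X3": "wall_area",
    "X4": "roof_area",
    "X5": "overall_height",
    "X6": "orientation",
    "X7": "glazing_area",
    "X8": "glazing_area_distribution",
    "Y1": "heating_load",
    "Y2": "cooling_load",
}


def prepare_cooling_data(df):
    """Rename coded columns and split into features (X) and cooling-load target (y)."""
    renamed = df.rename(columns=COLUMN_NAMES)
    # Heating load is the other response variable, so it is not used as a feature.
    X = renamed.drop(columns=["heating_load", "cooling_load"])
    y = renamed["cooling_load"]
    return X, y


# Possible usage (assumes the spreadsheet is available locally and openpyxl is installed):
# raw = pd.read_excel("ENB2012_data.xlsx")
# X, y = prepare_cooling_data(raw)
```

Dropping the heating load from the feature set is a design choice in this sketch: it keeps the model limited to the eight building parameters described above.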

Data Exploration

Boxplot: I plotted boxplots for three groups of independent variables: Surface Area, Wall Area, and Roof Area; Overall Height, Orientation, and Glazing Area Distribution; and Glazing Area and Relative Compactness. I also plotted a boxplot for the dependent variable, Cooling Load. The boxplots show the distribution of the data for each variable and help to identify potential outliers.
Correlation Heatmap: I created a correlation heatmap to visualize the pairwise correlations between the features. I found that the Relative Compactness, Surface Area, Wall Area, Roof Area, and Overall Height have strong negative correlations with the Cooling Load. Glazing Area has a strong positive correlation with the Cooling Load, while Orientation and Glazing Area Distribution have weak correlations with Cooling Load.
Pairplot: I created a pairplot to visualize the relationships between all pairs of features. This also helps to identify potential outliers and possible nonlinear relationships between the variables.
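The correlation step described above could be sketched as follows. The function name `correlation_with_target` and the column names are assumptions made for illustration, and the demo values are synthetic stand-ins, not the real dataset. The full matrix from `df.corr()` is what would typically be passed to `seaborn.heatmap(..., annot=True)` to draw the heatmap.

```python
import numpy as np
import pandas as pd


def correlation_with_target(df, target="cooling_load"):
    """Return Pearson correlations of every other column with the target, strongest first."""
    corr = df.corr()[target].drop(target)
    # Order by absolute strength while keeping the sign of each correlation.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Synthetic stand-in data (NOT the real dataset), used only to show the call.
rng = np.random.default_rng(0)
height = rng.uniform(3.5, 7.0, size=200)
demo = pd.DataFrame({
    "overall_height": height,
    "orientation": rng.integers(2, 6, size=200),
    "cooling_load": 5.0 * height + rng.normal(size=200),
})

ranked = correlation_with_target(demo)
print(ranked)
```

Sorting by absolute value keeps strongly negative and strongly positive features together at the top, which makes the ranking easier to compare against the heatmap.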

Tools Used

The project was carried out in a Jupyter Notebook environment. The packages and libraries required to run it are:

NumPy
Pandas
Scikit-learn
XGBoost
Matplotlib.pyplot
Seaborn

Model Building

The data was split into training and testing sets, with the training set making up 70% of the dataset and the testing set the remaining 30%.
Nine different machine learning models were then trained and evaluated:

Linear Regression
XGBoost Regressor
Gradient Boosting Regressor
Random Forest Regressor
Lasso
Decision Tree Regressor
Ridge
AdaBoost Regressor
Bagging Regressor

These models were chosen because they are commonly used for regression problems and have proven effective in previous studies. All models were trained using the same 70/30 train-test split of the data. The models were compared using the root mean squared error (RMSE) and R-squared (R2) score as evaluation metrics.
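A minimal sketch of this workflow, assuming scikit-learn's `train_test_split` with `test_size=0.3`, might look like the following. The helper `compare_models`, the `random_state` value, and the synthetic data are assumptions for illustration. Only three of the nine models are shown to keep the example short, and the numbers it prints say nothing about the real dataset.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def compare_models(X, y, models, test_size=0.3, random_state=42):
    """Fit each model on the training split and report RMSE and R2 on the test split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
        results[name] = {"rmse": rmse, "r2": float(r2_score(y_test, pred))}
    return results


# Synthetic stand-in with the same shape as the dataset (768 rows, 8 features).
rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(768, 8))
y_demo = X_demo @ np.arange(1.0, 9.0) + rng.normal(scale=0.1, size=768)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest Regressor": RandomForestRegressor(n_estimators=50, random_state=0),
    "Gradient Boosting Regressor": GradientBoostingRegressor(random_state=0),
}
results = compare_models(X_demo, y_demo, models)
for name, scores in results.items():
    print(f"{name}: RMSE={scores['rmse']:.3f}, R2={scores['r2']:.3f}")
```

Using one shared split for every model keeps the comparison fair, since each model is scored on exactly the same held-out rows.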

Evaluation Metrics

The models are evaluated using the following metrics:

Root Mean Squared Error (RMSE)
R-squared (R2)
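For reference, both metrics can be written out directly with NumPy. This is a sketch of the standard definitions, not code from the notebook; the function names `rmse` and `r_squared` are chosen here for illustration. RMSE is the square root of the mean squared residual, and R2 is one minus the ratio of the residual sum of squares to the total sum of squares.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)


# Small made-up example: residuals are -2, 2, and 0.
y_true = [10.0, 20.0, 30.0]
y_pred = [12.0, 18.0, 30.0]
print(rmse(y_true, y_pred))       # sqrt(8 / 3), about 1.633
print(r_squared(y_true, y_pred))  # 1 - 8 / 200 = 0.96
```

RMSE is expressed in the same units as the cooling load, so lower is better, while R2 is unitless and approaches 1 as the predictions explain more of the variance in the target.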

Model Selection

Of the nine models, the five best performers were tuned further using grid search with cross-validation. The performance of each tuned model was visualized using a custom function, model_perf_visual(), which compares the actual and predicted values. The five models were:

Gradient Boosting Regressor

Random Forest Regressor

XGBoost Regressor

Bagging Regressor

AdaBoost Regressor
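The tuning step could be sketched with scikit-learn's `GridSearchCV`, shown here for the Gradient Boosting Regressor only. The parameter grid, the number of folds, the helper name `tune_gradient_boosting`, and the synthetic data are all assumptions for illustration; the grid actually searched in the notebook is not stated here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV


def tune_gradient_boosting(X, y, param_grid, cv=3):
    """Run an exhaustive grid search, scoring each candidate by cross-validated RMSE."""
    search = GridSearchCV(
        GradientBoostingRegressor(random_state=0),
        param_grid,
        scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, so RMSE is negated
        cv=cv,
    )
    search.fit(X, y)
    return search


# Hypothetical grid, kept small so the example runs quickly.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "learning_rate": [0.1],
}

# Synthetic stand-in data (NOT the real dataset).
rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(300, 8))
y_demo = X_demo @ np.arange(1.0, 9.0) + rng.normal(scale=0.1, size=300)

search = tune_gradient_boosting(X_demo, y_demo, param_grid)
print("Best parameters:", search.best_params_)
print("Best cross-validated RMSE:", -search.best_score_)
```

In practice the search would be fitted on the training split only, so that the test split stays untouched for the final comparison.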

Conclusion

In conclusion, I explored using machine learning to predict the cooling load requirements of buildings. The results show that ensemble methods such as the Random Forest Regressor, AdaBoost Regressor, and Gradient Boosting Regressor outperformed the other models. The Gradient Boosting Regressor performed best, with an RMSE of 0.80 and an R2 score of 0.99. This makes it the best model in this project for predicting the cooling load of buildings using the energy efficiency dataset from the UCI Machine Learning Repository.
