
A machine learning project that builds regression models with several algorithms and evaluates their performance.


Paul-Asamoah-Boadu/Predicting-the-Cooling-Load-of-Buildings


Predicting the Cooling Load of Buildings


Introduction

The energy efficiency of buildings is an increasingly important issue, from both an environmental and an economic perspective. In this project, I use machine learning to predict the cooling load requirements of buildings as a function of eight building parameters. The dataset comes from the UCI Machine Learning Repository and contains 768 observations. The dataset is explored, preprocessed, and then used to train and evaluate various machine learning models. The primary objective is to build a robust predictive model for cooling load requirements, which can aid in the design and development of energy-efficient buildings.

Data Gathering

The dataset was obtained from the UCI Machine Learning Repository. It includes 768 samples, eight features, and two response variables (heating load and cooling load). The target variable in this project is the cooling load of a building, and the eight features represent various building parameters. All variables are numeric, and there are no missing values in the dataset.

Variable  Attribute
X1        Relative Compactness
X2        Surface Area
X3        Wall Area
X4        Roof Area
X5        Overall Height
X6        Orientation
X7        Glazing Area
X8        Glazing Area Distribution
Y1        Heating Load
Y2        Cooling Load
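
As a small illustration of the table above, the sketch below maps the short column codes to the attribute names and separates the eight features from the cooling load target. The helper names `rename_columns` and `split_features_target` are hypothetical and do not come from the project's notebook; dropping the heating load column from the features is an assumption based on the statement that only the eight building parameters are used as inputs.

```python
import pandas as pd

# Short column codes mapped to the attribute names listed in the table.
COLUMN_NAMES = {
    "X1": "Relative Compactness",
    "X2": "Surface Area",
    "X3": "Wall Area",
    "X4": "Roof Area",
    "X5": "Overall Height",
    "X6": "Orientation",
    "X7": "Glazing Area",
    "X8": "Glazing Area Distribution",
    "Y1": "Heating Load",
    "Y2": "Cooling Load",
}


def rename_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the codes X1..Y2 replaced by descriptive names."""
    return df.rename(columns=COLUMN_NAMES)


def split_features_target(df: pd.DataFrame, target: str = "Cooling Load"):
    """Split a renamed frame into the eight features and one target column.

    Assumption: both load columns are removed from the features so that
    the heating load is never used as an input.
    """
    features = df.drop(columns=["Heating Load", "Cooling Load"])
    return features, df[target]
```

A frame loaded from the dataset file would be passed through `rename_columns` first and then through `split_features_target` to obtain `X` and `y`.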

Data Exploration

  • Boxplot: I plotted boxplots for three groups of independent variables: Surface Area, Wall Area, and Roof Area; Overall Height, Orientation, and Glazing Area Distribution; and Glazing Area and Relative Compactness. I also plotted a boxplot for the dependent variable, Cooling Load. The boxplots show the distribution of each variable and help to identify potential outliers.

  • Correlation Heatmap: I created a correlation heatmap to visualize the pairwise correlations between the variables. I found that Relative Compactness, Surface Area, Wall Area, Roof Area, and Overall Height have strong negative correlations with the Cooling Load. Glazing Area has a strong positive correlation with the Cooling Load, while Orientation and Glazing Area Distribution have weak correlations with the Cooling Load.

  • Pairplot: I created a pairplot to visualize the relationships between all pairs of variables. This also helps to identify potential outliers and possible nonlinear relationships between the variables.
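
The correlation step described above can be sketched with pandas alone. The example below is a minimal illustration on made-up data, not the project's actual analysis: the helper name `correlations_with_target` is hypothetical, and the synthetic values are chosen only so that one column is strongly related to the target and the other is not.

```python
import numpy as np
import pandas as pd


def correlations_with_target(df: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of every other column with `target`,
    ordered from the strongest to the weakest absolute correlation."""
    corr = df.corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Synthetic stand-in data (assumption: NOT the UCI values).
rng = np.random.default_rng(0)
height = rng.uniform(3.5, 7.0, size=200)
orientation = rng.integers(2, 6, size=200).astype(float)
demo = pd.DataFrame(
    {
        "Overall Height": height,
        "Orientation": orientation,
        # The target depends on height only, plus noise.
        "Cooling Load": 5.0 * height + rng.normal(0.0, 1.0, size=200),
    }
)

ranked = correlations_with_target(demo, "Cooling Load")
print(ranked)
```

The full matrix returned by `demo.corr()` is what a heatmap would display, for example with `seaborn.heatmap(demo.corr(), annot=True)`.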

Tools Used

The project was developed in a Jupyter Notebook environment. The packages and libraries required to run it are:

  • NumPy
  • Pandas
  • Scikit-learn
  • XGBoost
  • Matplotlib.pyplot
  • Seaborn

Model Building

The data was split into training and testing sets, with 70% of the dataset used for training and 30% for testing.
Nine different machine learning models were then trained and evaluated:

  1. Linear Regression
  2. XGBoost Regressor
  3. Gradient Boosting Regressor
  4. Random Forest Regressor
  5. Lasso
  6. Decision Tree Regressor
  7. Ridge
  8. AdaBoost Regressor
  9. Bagging Regressor

These models were chosen because they are commonly used in regression problems and have proven to be effective in previous studies. All models were trained using the 70/30 train-test split of the data and assessed using the root mean squared error (RMSE) and R-squared (R2) score.
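
The training loop described above might look roughly like the following sketch. It makes several assumptions: the data is a synthetic stand-in generated with `make_regression` (768 rows and 8 features, matching only the shape of the real dataset), all models use default hyperparameters, the `random_state` values are arbitrary, and XGBoost is left out so that the example depends only on scikit-learn (`xgboost.XGBRegressor` could be added to the dictionary in the same way).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    AdaBoostRegressor,
    BaggingRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in with the same shape as the dataset (assumption).
X, y = make_regression(n_samples=768, n_features=8, noise=10.0, random_state=0)

# 70/30 train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Gradient Boosting Regressor": GradientBoostingRegressor(random_state=42),
    "Random Forest Regressor": RandomForestRegressor(random_state=42),
    "Lasso": Lasso(),
    "Decision Tree Regressor": DecisionTreeRegressor(random_state=42),
    "Ridge": Ridge(),
    "AdaBoost Regressor": AdaBoostRegressor(random_state=42),
    "Bagging Regressor": BaggingRegressor(random_state=42),
}

# Fit each model on the training set and score it on the held-out test set.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "r2": float(r2_score(y_test, pred)),
    }

# Print the models from lowest to highest test RMSE.
for name, scores in sorted(results.items(), key=lambda kv: kv[1]["rmse"]):
    print(f"{name:<28} RMSE={scores['rmse']:.2f}  R2={scores['r2']:.3f}")
```

Because the stand-in data is linear by construction, the ranking it produces says nothing about how the models compare on the real dataset.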

Evaluation Metrics

The models are evaluated using the following metrics:

  • Root Mean Squared Error (RMSE)
  • R-squared (R2)
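
For reference, both metrics can be computed directly from their definitions. The sketch below uses NumPy only; the function names `rmse` and `r2` and the four example values are illustrative and are not taken from the project.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean squared error: square root of the mean squared residual."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred) -> float:
    """R-squared: 1 minus (residual sum of squares / total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Made-up values for illustration.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print(round(rmse(actual, predicted), 3))  # 2.121
print(round(r2(actual, predicted), 3))    # 0.964
```

A lower RMSE means smaller errors in the units of the target, while an R2 closer to 1 means the model explains more of the variance in the target.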

Model Selection

Out of the nine models, the five best-performing models were selected and tuned using grid search with cross-validation. The performance of each tuned model was visualized using a custom function, model_perf_visual(), that compares the actual and predicted values. The five models were:

Gradient Boosting Regressor

Visualization of test and train of Gradient Boosting Regressor

Random Forest Regressor

Visualization of test and train of Random Forest Regressor

XGBoost Regressor

Visualization of test and train of XGBoost Regressor

Bagging Regressor

Visualization of test and train of Bagging Regressor

AdaBoost Regressor

Visualization of test and train of AdaBoost Regressor
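
The grid search step used in this section can be sketched with scikit-learn's `GridSearchCV`. This is a minimal example under stated assumptions, not the project's actual configuration: the parameter grid, the five-fold setting, the scoring string, and the synthetic data are all illustrative choices.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data with 8 features (assumption).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Hypothetical grid; the values searched in the project are not known.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}

search = GridSearchCV(
    estimator=GradientBoostingRegressor(random_state=42),
    param_grid=param_grid,
    scoring="neg_root_mean_squared_error",  # higher (closer to 0) is better
    cv=5,
)
search.fit(X_train, y_train)

# The best estimator is refit on the whole training set by default.
best_model = search.best_estimator_
train_pred = best_model.predict(X_train)
test_pred = best_model.predict(X_test)

print("Best parameters:", search.best_params_)
print("Cross-validated RMSE:", -search.best_score_)
```

The arrays `train_pred` and `test_pred` are the kind of predictions that a plot of actual versus predicted values would compare against `y_train` and `y_test`.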

Conclusion

In conclusion, I explored using machine learning to predict the cooling load requirements of buildings. The results show that ensemble methods such as the Random Forest Regressor, AdaBoost Regressor, and Gradient Boosting Regressor performed better than the other models. The Gradient Boosting Regressor performed best, with an RMSE of 0.80 and an R2 score of 0.99, making it the best model in this project for predicting the cooling load of buildings using the energy efficiency dataset from the UCI Machine Learning Repository.
