
RazvanGolan/ObesityPredictor


Obesity Estimation with Linear Regression

This repository contains a custom implementation of a linear regression model for estimating obesity levels from a set of personal and lifestyle attributes. The custom model is compared with Scikit-learn's linear regression model on an obesity dataset.


Overview

  • The dataset used is Estimation of obesity levels based on eating habits and physical condition, which can be found here. More information about the dataset is available here.
  • The features used from the dataset are: Gender, Age, Height, Weight, Family history, Frequency of high-caloric food consumption, Smoking, Water intake, Monitoring calories, Physical activity, and Alcohol consumption.
  • The custom linear regression implementation uses gradient descent, an optimization algorithm that iteratively updates the model parameters to minimize the cost function and improve predictive performance. The implementation also includes a function for finding the best learning rate.
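As a rough illustration of the approach described above, here is a minimal NumPy sketch of batch gradient descent for linear regression, together with a simple learning-rate search. It is not the repository's code: the function names `fit_gradient_descent`, `mse_cost`, and `find_best_learning_rate`, the candidate rates, the iteration counts, and the use of mean squared error as the cost are all assumptions, and the data is synthetic.

```python
import numpy as np


def fit_gradient_descent(X, y, learning_rate=0.01, n_iters=5000):
    """Fit y ~ X @ w + b with batch gradient descent on the MSE cost (illustrative sketch)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        error = X @ w + b - y                        # residuals of the current fit
        grad_w = (2.0 / n_samples) * (X.T @ error)   # gradient of MSE w.r.t. the weights
        grad_b = (2.0 / n_samples) * error.sum()     # gradient of MSE w.r.t. the intercept
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mse_cost(X, y, w, b):
    """Mean squared error of the linear model on (X, y)."""
    return float(np.mean((X @ w + b - y) ** 2))


def find_best_learning_rate(X, y, candidates=(0.001, 0.003, 0.01, 0.03, 0.1), n_iters=2000):
    """Hypothetical helper: keep the candidate rate with the lowest training cost."""
    best_rate, best_cost = None, np.inf
    for rate in candidates:
        # Rates that are too large diverge; silence overflow warnings and skip non-finite costs.
        with np.errstate(over="ignore", invalid="ignore"):
            w, b = fit_gradient_descent(X, y, learning_rate=rate, n_iters=n_iters)
            cost = mse_cost(X, y, w, b)
        if np.isfinite(cost) and cost < best_cost:
            best_rate, best_cost = rate, cost
    return best_rate, best_cost


# Usage on synthetic, already-standardized data (not the obesity dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=200)

best_rate, best_cost = find_best_learning_rate(X, y)
w, b = fit_gradient_descent(X, y, learning_rate=best_rate)

# Sanity check: a converged fit should match the closed-form least-squares solution.
A = np.column_stack([X, np.ones(len(X))])
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
print("gradient descent:", w, b)
print("least squares:   ", theta[:-1], theta[-1])
```

Comparing against `np.linalg.lstsq` is a cheap way to confirm that gradient descent has actually converged before its coefficients are compared with another model's.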

Features

  • Custom implementation of linear regression
  • Feature engineering and preprocessing
  • Model comparison with Scikit-learn's linear regression
  • Data visualization and analysis
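Several of the features listed above (Gender, Family history, Alcohol consumption, and so on) are categorical, so they have to be turned into numbers before a linear model can use them. The README does not show how this is done; one common approach, given here only as a hedged sketch, is to map yes/no answers to 0/1, map frequency answers to ordered integers, and standardize numeric columns with pandas. The column names, category labels, and their ordering are hypothetical placeholders, not taken from the repository or the dataset.

```python
import pandas as pd

# Hypothetical rows and column names; the real dataset's columns and labels may differ.
df = pd.DataFrame({
    "Gender": ["Female", "Male", "Male"],
    "FamilyHistory": ["yes", "no", "yes"],
    "Alcohol": ["no", "Sometimes", "Frequently"],
    "Weight": [64.0, 87.0, 102.5],
})


def encode_features(frame):
    """Return a numeric copy of `frame` (illustrative encoding choices, not the repo's)."""
    out = frame.copy()
    # Binary categories become 0/1 indicator columns.
    out["Gender"] = (out["Gender"] == "Male").astype(int)
    out["FamilyHistory"] = (out["FamilyHistory"] == "yes").astype(int)
    # Frequency answers get an assumed ordinal scale.
    alcohol_levels = {"no": 0, "Sometimes": 1, "Frequently": 2, "Always": 3}
    out["Alcohol"] = out["Alcohol"].map(alcohol_levels)
    # Numeric columns are standardized to zero mean and unit variance.
    out["Weight"] = (out["Weight"] - out["Weight"].mean()) / out["Weight"].std(ddof=0)
    return out


encoded = encode_features(df)
print(encoded)
```

An ordinal mapping assumes the categories are evenly spaced; one-hot encoding (for example with `pd.get_dummies`) avoids that assumption at the cost of extra columns.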

Dependencies

  • Python 3.x
  • NumPy
  • pandas
  • Matplotlib
  • Scikit-learn

Data visualization

[Figure: data visualization]

Coefficients comparison

[Screenshots: coefficient comparison between the custom model and Scikit-learn's model]

Analysis of the Feature 4 (Weight) Coefficient

  • The coefficient for Feature 4 in the custom model is substantially higher than the corresponding coefficient in Scikit-learn's model.
  • This indicates that, in the custom model, Feature 4 has a much stronger positive effect on the predicted obesity level.
  • It also suggests that, in the custom model, changes in Feature 4 influence the predicted obesity level more than changes in the other features do. Comparing coefficient sizes across features in this way is only meaningful when the features are on a common scale (for example, after standardization).

Possible Explanations:

  • Feature Engineering Differences: Feature 4 may be engineered or preprocessed differently in the custom model than in Scikit-learn's model. Differences in scaling, normalization, or encoding change the magnitude of the fitted coefficients; rescaling a feature rescales its coefficient accordingly.
  • Modeling Assumptions: the custom model may make different assumptions or use a different mathematical formulation than Scikit-learn's model. In addition, ordinary least squares has a single optimal solution when the features are not perfectly collinear, so a gradient descent model that has not fully converged can end up with coefficients that differ from Scikit-learn's closed-form fit.
  • Overfitting or Underfitting: differences in coefficient values could also be attributed to overfitting or underfitting of the models. The custom model may be overfitting to the training data, resulting in inflated coefficient values for certain features.
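To make the first explanation concrete: for ordinary least squares with an intercept, standardizing a feature multiplies its coefficient by that feature's standard deviation while leaving the predictions unchanged. The sketch below demonstrates this with Scikit-learn on synthetic data; the two columns only imitate weight (kg) and height (m), and the true coefficients are made up.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic, illustrative data only: column 0 imitates weight (kg), column 1 height (m).
rng = np.random.default_rng(42)
weight = rng.normal(loc=85.0, scale=25.0, size=500)
height = rng.normal(loc=1.70, scale=0.09, size=500)
X_raw = np.column_stack([weight, height])
y = 0.05 * weight - 3.0 * height + rng.normal(scale=0.2, size=500)

# Fit once on the raw units.
raw_model = LinearRegression().fit(X_raw, y)

# Fit again after standardizing each column to zero mean and unit variance.
scaler = StandardScaler()
X_std = scaler.fit_transform(X_raw)
std_model = LinearRegression().fit(X_std, y)

print("raw coefficients:         ", raw_model.coef_)
print("standardized coefficients:", std_model.coef_)
# Each standardized coefficient equals the raw coefficient times the column's standard deviation.
print("raw coef * column std:    ", raw_model.coef_ * scaler.scale_)
# The predictions themselves are identical.
print("same predictions:", np.allclose(raw_model.predict(X_raw), std_model.predict(X_std)))
```

In raw units the height coefficient looks larger than the weight coefficient; after standardization the order flips. The same underlying fit can therefore appear to rank features differently depending only on preprocessing.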

Results

The custom linear regression model achieved results comparable to those of Scikit-learn's linear regression model on the obesity dataset. Both models were evaluated using metrics such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE).
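For reference, the three metrics can be computed directly with NumPy and cross-checked against `sklearn.metrics`. The helper name `regression_metrics` and the toy values are illustrative only; RMSE is taken as the square root of MSE.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE and MAE for a set of predictions (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = float(np.mean(residuals ** 2))
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(np.mean(np.abs(residuals))),
    }


y_true = [3.0, 1.0, 4.0, 2.0]
y_pred = [2.5, 1.0, 5.0, 2.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)  # MSE = 0.3125, MAE = 0.375

# Cross-check against Scikit-learn's implementations.
assert np.isclose(metrics["MSE"], mean_squared_error(y_true, y_pred))
assert np.isclose(metrics["MAE"], mean_absolute_error(y_true, y_pred))
```

RMSE is in the same units as the target, which makes it easier to interpret than MSE; MAE is less sensitive to a few large errors than either.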

[Screenshot: evaluation results]

Most Important Attribute for Predicting Obesity

  • In the custom linear regression model, the most important attribute for predicting obesity levels was Weight. In Scikit-learn's linear regression model, however, the most significant attribute was Family history with obesity. This discrepancy shows that the two models weight the features quite differently, and it underscores the value of feature analysis and selection in model development.
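One way to make a "most important attribute" comparison less sensitive to units is to rank features by the absolute value of their coefficients after standardization. The README does not say which criterion it used, so the sketch below is an assumption; `rank_features`, the three feature names, and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler


def rank_features(X, y, feature_names):
    """Rank features by |coefficient| of a linear model fitted on standardized inputs."""
    X_std = StandardScaler().fit_transform(X)
    model = LinearRegression().fit(X_std, y)
    order = np.argsort(-np.abs(model.coef_))  # largest magnitude first
    return [(feature_names[i], float(model.coef_[i])) for i in order]


# Synthetic data with made-up effects; not the obesity dataset.
rng = np.random.default_rng(1)
n_samples = 500
weight = rng.normal(85.0, 25.0, n_samples)
height = rng.normal(1.70, 0.09, n_samples)
age = rng.normal(24.0, 6.0, n_samples)
y = 0.04 * weight - 2.0 * height + 0.01 * age + rng.normal(0.0, 0.1, n_samples)

ranking = rank_features(np.column_stack([weight, height, age]), y, ["Weight", "Height", "Age"])
for name, coef in ranking:
    print(f"{name:>6}: {coef:+.3f}")
```

The sign still carries the direction of the association: a negative coefficient (like the synthetic Height effect here) means the prediction decreases as that feature increases, holding the other features fixed.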

Attribute with the Lowest (Most Negative) Coefficient

  • In the custom linear regression model, the attribute with the lowest (most negative) coefficient was Height. This suggests that, holding the other features constant, the predicted obesity level tends to increase as height decreases. In other words, shorter individuals in the dataset may receive a higher predicted obesity level than taller individuals of the same weight.
  • In Scikit-learn's linear regression model, the attribute with the lowest (most negative) coefficient was Monitoring calories. This suggests that individuals who monitor their calorie intake less tend to have higher predicted obesity levels; in this dataset, paying less attention to calorie consumption is associated with higher predicted obesity.

It's important to note that these interpretations are based on the specific dataset and model used. Real-world relationships may be influenced by various factors, and causality cannot be inferred solely based on regression coefficients. Further analysis and domain knowledge are necessary to validate these findings.

Possible Improvements

  • Exploring More Advanced Feature Selection Methods
  • Handling Outliers and Missing Data More Effectively
  • Regularization and Model Complexity Control
  • Incorporating Domain Knowledge or Additional Data Sources
  • Hyperparameter Tuning for Model Optimization
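For the regularization item, one option is L2 (ridge) regularization, which adds a penalty on the size of the weights to the cost. The sketch below extends plain gradient descent with that penalty; the function name `fit_ridge_gradient_descent`, `alpha`, and the learning rate are assumptions. Because this cost averages the squared error over the samples while Scikit-learn's `Ridge` sums it, the matching Scikit-learn penalty is `n_samples * alpha`.

```python
import numpy as np
from sklearn.linear_model import Ridge


def fit_ridge_gradient_descent(X, y, alpha=0.1, learning_rate=0.05, n_iters=5000):
    """Minimize mean((X @ w + b - y)**2) + alpha * ||w||**2 by gradient descent.

    The intercept b is not penalized.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        error = X @ w + b - y
        grad_w = (2.0 / n_samples) * (X.T @ error) + 2.0 * alpha * w  # data term + L2 penalty term
        grad_b = (2.0 / n_samples) * error.sum()
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Synthetic data for illustration only.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
y = X @ np.array([2.0, -1.0, 0.0, 0.5]) + 1.0 + rng.normal(scale=0.3, size=300)

w_plain, _ = fit_ridge_gradient_descent(X, y, alpha=0.0)  # no penalty: ordinary least squares
w_ridge, b_ridge = fit_ridge_gradient_descent(X, y, alpha=0.1)

# The penalty shrinks the weights toward zero.
print("norm without penalty:", np.linalg.norm(w_plain))
print("norm with penalty:   ", np.linalg.norm(w_ridge))

# Cross-check: Scikit-learn's Ridge sums the squared errors, so scale alpha by n_samples.
reference = Ridge(alpha=0.1 * len(X)).fit(X, y)
print("sklearn Ridge coef:  ", reference.coef_)
```

In practice `alpha` would itself be chosen by validation, which ties this item to the hyperparameter tuning item above.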

About

Predicting obesity with linear regression
