Obesity Estimation with Linear Regression

This repository contains a custom implementation of a linear regression model for estimating obesity levels based on specific attributes. The model is compared to the linear regression model provided by Scikit-learn using an obesity dataset.

Overview

The dataset used is "Estimation of obesity levels based on eating habits and physical condition", which can be found here. More information about the dataset can be found here.
The features used from the dataset are: Gender, Age, Height, Weight, Family history, Frequency of high caloric food, Smoking, Water intake, Monitoring calories, Physical activity and Alcohol consumption.
This custom implementation of linear regression includes the gradient descent optimization algorithm, which iteratively updates model parameters to minimize the cost function and improve predictive performance. It also includes a function for finding the best learning rate.
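The gradient descent loop and the learning-rate search described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the code in src: the function names (gradient_descent, best_learning_rate), the candidate learning rates, and the use of mean squared error as the cost function are all assumptions made for the example.

```python
import numpy as np

def gradient_descent(X, y, learning_rate=0.1, n_iters=1000):
    """Fit y ~ X @ w + b by batch gradient descent on the MSE cost."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        error = X @ w + b - y                       # residuals of the current fit
        grad_w = (2.0 / n_samples) * (X.T @ error)  # d(MSE)/dw
        grad_b = (2.0 / n_samples) * error.sum()    # d(MSE)/db
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def mse(X, y, w, b):
    return float(np.mean((X @ w + b - y) ** 2))

def best_learning_rate(X, y, candidates=(1.0, 0.3, 0.1, 0.03, 0.01), n_iters=200):
    """Try each candidate (hypothetical grid) and keep the one with the lowest final cost."""
    best_lr, best_cost = None, np.inf
    for lr in candidates:
        # A learning rate that is too large can diverge; silence overflow warnings
        # and skip any run whose cost is not finite.
        with np.errstate(over="ignore", invalid="ignore"):
            w, b = gradient_descent(X, y, lr, n_iters)
            cost = mse(X, y, w, b)
        if np.isfinite(cost) and cost < best_cost:
            best_lr, best_cost = lr, cost
    return best_lr, best_cost

# Synthetic, already-standardized data (not the obesity dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 4.0 + rng.normal(scale=0.1, size=200)

lr, cost = best_learning_rate(X, y)
w, b = gradient_descent(X, y, learning_rate=lr, n_iters=1000)
print("chosen learning rate:", lr)
print("weights:", w, "bias:", b)
```

The features in this sketch are on a common scale. With unscaled columns such as Age or Weight, a single learning rate that converges for every feature is much harder to find, which is one reason preprocessing matters for gradient descent.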

Features

Custom implementation of linear regression
Feature engineering and preprocessing
Model comparison with Scikit-learn's linear regression
Data visualization and analysis

Dependencies

Python 3.x
NumPy
pandas
Matplotlib
Scikit-learn

Data visualization

Coefficients comparison

Analysis of Feature 4 Coefficient - Weight

The coefficient for Feature 4 (Weight) in the custom model is substantially higher than the coefficient obtained from Scikit-learn's model.
This indicates that Weight has a much stronger positive impact on the predicted obesity level in the custom model than in Scikit-learn's model.
It also suggests that, within the custom model, changes in Weight influence the predicted obesity level more than changes in the other features.

Possible Explanations:

Feature Engineering Differences: it's possible that there are differences in how Feature 4 is engineered or preprocessed in the custom model compared to Scikit-learn's model. Differences in scaling, normalization, or encoding methods could lead to variations in the coefficient values.
Modeling Assumptions: the custom model may make different assumptions or have different underlying mathematical formulations compared to Scikit-learn's model, leading to variations in coefficient values.
Overfitting or Underfitting: differences in coefficient values could also be attributed to overfitting or underfitting of the models. The custom model may be overfitting to the training data, resulting in inflated coefficient values for certain features.
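The scaling explanation above can be checked directly. The sketch below fits ordinary least squares twice on the same synthetic data, once on raw features and once on standardized features, and shows that the two coefficient vectors differ by exactly the per-feature standard deviation. The data, the column meanings, and the ols helper are hypothetical. np.linalg.lstsq stands in here for a least-squares solver and is not the repository's code.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares with an intercept, solved via np.linalg.lstsq."""
    A = np.column_stack([X, np.ones(len(X))])  # append a column of ones for the intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]

# Synthetic stand-ins for two columns on very different scales
# (values are made up, not taken from the obesity dataset).
rng = np.random.default_rng(1)
height = rng.normal(1.70, 0.09, size=300)
weight = rng.normal(86.0, 26.0, size=300)
X_raw = np.column_stack([height, weight])
y = -3.0 * height + 0.05 * weight + rng.normal(scale=0.1, size=300)

# Standardize each column to zero mean and unit variance.
std = X_raw.std(axis=0)
X_scaled = (X_raw - X_raw.mean(axis=0)) / std

coef_raw, _ = ols(X_raw, y)
coef_scaled, _ = ols(X_scaled, y)

print("raw coefficients:   ", coef_raw)
print("scaled coefficients:", coef_scaled)
print("raw * std:          ", coef_raw * std)  # matches the scaled coefficients
```

If one model is trained on scaled inputs and the other on raw inputs, a large gap in a single coefficient such as Weight is expected even when both models make the same predictions. Comparing coefficients is only meaningful when both models see identically preprocessed data.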

Results

The custom linear regression model achieved comparable results to Scikit-learn's linear regression model on the obesity dataset. The model's performance was evaluated using metrics such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE).
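For reference, the three metrics can be computed with NumPy alone. The helper name regression_metrics and the toy values are assumptions for illustration and are not results from this repository.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, and MAE for two equally sized sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = float(np.mean(residuals ** 2))
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),                # same units as the target
        "MAE": float(np.mean(np.abs(residuals))),   # less sensitive to large errors
    }

# Toy values: the residuals are 0, 0, -2, 2.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 5.0, 2.0])
print(metrics)
```

Applying the same function to the predictions of both models on the same held-out split gives a like-for-like comparison.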

Most Important Attribute for Predicting Obesity

In the custom linear regression model, the most important attribute for predicting obesity levels was Weight. However, in Scikit-learn's linear regression model, the most significant attribute was Family history with obesity. This difference shows that the two models rank the features differently, which underscores the value of feature analysis and selection in model development.
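One common way to rank attributes in a linear model is by the absolute value of their coefficients, provided the features share a common scale. The sketch below also separates the feature with the lowest signed coefficient from the feature with the smallest magnitude, since the two are easy to confuse. The coefficient values are placeholders chosen for illustration and are not results from this repository.

```python
import numpy as np

feature_names = ["Gender", "Age", "Height", "Weight", "Family history"]
# Placeholder coefficients for illustration only; not values from either model.
coefficients = np.array([0.10, 0.25, -0.40, 1.50, 0.30])

# Sort feature indices by descending absolute coefficient.
order = np.argsort(-np.abs(coefficients))
ranking = [(feature_names[i], float(coefficients[i])) for i in order]

most_influential = ranking[0][0]                              # largest |coefficient|
smallest_magnitude = ranking[-1][0]                           # closest to zero
lowest_signed = feature_names[int(np.argmin(coefficients))]   # most negative value

for name, value in ranking:
    print(f"{name:15s} {value:+.2f}")
```

A strongly negative coefficient is still an influential one: it pushes the prediction down rather than up. Only a coefficient close to zero marks a feature as having little effect.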

Least Impactful Attribute for Preventing Obesity

In the custom linear regression model, the attribute with the lowest coefficient was Height. The negative direction of this effect suggests that the predicted obesity level tends to increase as height decreases. In other words, shorter individuals in the dataset may have a higher predicted obesity level than taller individuals.
In Scikit-learn's linear regression model, the attribute with the lowest coefficient was Monitoring Calories. This suggests that individuals who monitor their calorie intake less tend to have higher predicted obesity levels, so being less vigilant about calorie consumption is associated with a higher likelihood of obesity in the dataset.

It's important to note that these interpretations are based on the specific dataset and model used. Real-world relationships may be influenced by various factors, and causality cannot be inferred solely based on regression coefficients. Further analysis and domain knowledge are necessary to validate these findings.

Possible Improvements

Exploring More Advanced Feature Selection Methods
Handling Outliers and Missing Data More Effectively
Regularization and Model Complexity Control
Incorporating Domain Knowledge or Additional Data Sources
Hyperparameter Tuning for Model Optimization

RazvanGolan/ObesityPredictor
