The objective of this project was to develop a machine learning model capable of predicting whether a person has diabetes based on a set of medical variables.
- Data collection
- Data preprocessing and cleaning
- Exploratory data analysis (EDA)
- Modeling and evaluation
  - Models
    - Logistic Regression
    - K-Nearest Neighbors
    - Support Vector Machine
  - Evaluation
    - F1-Score
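The F1-Score listed above as the evaluation metric is the harmonic mean of precision and recall, F1 = 2·P·R / (P + R), which is often preferred over plain accuracy when the positive class is the minority. The following is a minimal plain-Python sketch, not the project's own code: it assumes binary labels with diabetes encoded as 1 (the positive class), and the function name `f1_score_binary` is hypothetical. scikit-learn's `sklearn.metrics.f1_score` computes the same quantity.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for binary labels, treating `positive` as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall (0.0 when both are zero).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Illustrative labels only (not project results): 1 = diabetes, 0 = no diabetes.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(f1_score_binary(y_true, y_pred))  # → (0.75, 0.75, 0.75)
```

Here there are 3 true positives, 1 false positive and 1 false negative, so precision and recall are both 3/4 and F1 = 2·3 / (2·3 + 1 + 1) = 0.75.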
- Is it possible to accurately predict whether a patient has diabetes using diagnostic variables such as the number of pregnancies, BMI, insulin level, and age?
- Random Forest emerged as the best-performing model, achieving an F1-Score of 0.7234.
- The EDA revealed no strong correlation between the number of pregnancies and the diabetes outcome.
- Glucose levels and BMI had a stronger relationship with the target variable.
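Correlation findings like these can be checked by computing the Pearson correlation between each feature and the 0/1 outcome (with a binary target this equals the point-biserial correlation). This is a minimal sketch, not the project's EDA code: it assumes each record is a dict keyed by the dataset's column names (`Pregnancies`, `Glucose`, `BMI`, `Outcome`) with `Outcome` = 1 for diabetes, and the helper names `pearson` and `rank_by_outcome_correlation` are hypothetical. The four toy rows are invented for illustration and are not taken from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


def rank_by_outcome_correlation(rows, target="Outcome"):
    """Return (feature, r) pairs sorted by |r| with the target, strongest first."""
    y = [row[target] for row in rows]
    features = [name for name in rows[0] if name != target]
    scores = [(name, pearson([row[name] for row in rows], y)) for name in features]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Toy rows invented for illustration (NOT real records from the dataset).
toy_rows = [
    {"Pregnancies": 1, "Glucose": 90, "BMI": 24.0, "Outcome": 0},
    {"Pregnancies": 2, "Glucose": 95, "BMI": 31.0, "Outcome": 0},
    {"Pregnancies": 2, "Glucose": 160, "BMI": 27.0, "Outcome": 1},
    {"Pregnancies": 1, "Glucose": 170, "BMI": 35.0, "Outcome": 1},
]

for name, r in rank_by_outcome_correlation(toy_rows):
    print(f"{name}: {r:+.3f}")
# Glucose: +0.994
# BMI: +0.422
# Pregnancies: +0.000
```

On a real pandas DataFrame, `df.corr()["Outcome"]` returns the same Pearson coefficients for every numeric column at once.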
UCI Machine Learning & Collaborator. (n.d.). Pima Indians Diabetes Database. Kaggle. https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database
José Habacuc Soto Hernández - SWE Student
- GitHub: https://github.com/habacucsoto
- Portfolio: https://habacuc.dev