How to predict the onset of diabetes based on diagnostic measures of the Pima Indian Diabetes Dataset
Diabetes is a lifelong condition that currently affects approximately half a billion people worldwide, with an estimated increase of 51% by the year 2045 (Saeedi et al. 2019). It is primarily characterized by high levels of blood glucose, which can lead to a cascade of complications including hypertension, coronary heart disease, stroke, and damage to the kidneys, feet and eyes (Whicher et al. 2020, "Diabetes in the UK: 2019", Diabetic Medicine, 37(2), 242–247).
This dataset was obtained from Kaggle and originates from the UCI Machine Learning Repository. It covers only females of Pima Indian heritage who are at least 21 years old. There are 8 predictor variables and 1 target variable (Outcome), listed below, across 768 uniquely identified observations: 268 positive for diabetes (1) and 500 negative for diabetes (0).
- Pregnancies: Number of times pregnant
- Glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test
- BloodPressure: Diastolic blood pressure (mm Hg)
- SkinThickness: Triceps skin fold thickness (mm)
- Insulin: 2-Hour serum insulin (mu U/ml)
- BMI: Body mass index (weight in kg/(height in m)^2)
- DiabetesPedigreeFunction: Diabetes pedigree function
- Age: Age (years)
- Outcome: Class variable (0 or 1)
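
As a quick sanity check on the schema above, the following minimal sketch reads a CSV with these nine columns and reports the class balance of Outcome. It uses only the Python standard library. The helper names `load_rows` and `class_balance` are hypothetical, and the three inline rows are made-up placeholders (not real records from the dataset) included only so the sketch runs on its own.

```python
import csv
import io
from collections import Counter

# Column names as listed in the variable descriptions above.
EXPECTED_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def load_rows(handle):
    """Read CSV rows from an open text handle and verify the header."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    return list(reader)


def class_balance(rows):
    """Return {label: (count, share)} for the Outcome column."""
    counts = Counter(int(row["Outcome"]) for row in rows)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in sorted(counts.items())}


# Made-up placeholder rows (NOT real data), only to keep the sketch runnable.
SAMPLE_CSV = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
2,120,70,25,90,28.5,0.4,30,0
5,160,80,35,150,34.0,0.6,45,1
1,95,65,20,60,24.1,0.2,22,0
"""

rows = load_rows(io.StringIO(SAMPLE_CSV))
balance = class_balance(rows)
for label, (n, share) in balance.items():
    print(f"Outcome={label}: {n} rows ({share:.1%})")
```

To run this against the real data, replace the `io.StringIO(SAMPLE_CSV)` handle with an opened file, for example `open("diabetes.csv", newline="")`, where the file name is an assumption about how the Kaggle download was saved. Given the counts stated above, the full dataset should report 500 rows for Outcome 0 and 268 for Outcome 1, so roughly 35% of observations are positive.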