For our PAI course project, we are building several disease prediction systems, including heart disease, diabetes, Parkinson's, and breast cancer classification. Using machine learning algorithms, we aim to analyze patient data and improve the accuracy of early diagnosis, providing valuable insights to healthcare professionals.
This project is a machine learning application to predict diabetes using the Pima Indian Diabetes Dataset. It includes a Tkinter-based GUI for individual and batch predictions, and features a Login and Registration system to manage user access securely.
## Features

- **Authentication System:**
  - User registration with secure password hashing.
  - Login with input validation.
  - Credentials stored as password hashes rather than plain text.
- **Machine Learning Models:**
  - Random Forest Classifier
  - K-Nearest Neighbors (KNN)
  - Ensemble Voting Classifier (soft voting)
- **Exploratory Data Analysis (EDA):**
  - Visualizations: histograms, boxplots, pairplots, and a correlation heatmap.
- **GUI:**
  - Predict individual diabetes outcomes.
  - Run batch predictions from CSV files.
  - Switch between light and dark themes.
  - Validate inputs.
  - Save prediction results.
- **Model Persistence:** Save and load trained models with `joblib`.
## Table of Contents

- Installation
- Authentication System
- Dataset
- Exploratory Data Analysis (EDA)
- Machine Learning Workflow
- Tkinter GUI Application
- Results
- Usage
- Contributing
- License
## Installation

1. Clone the repository:
   ```bash
   git clone https://github.com/iamrayyan1/PAI-Project.git
   ```
2. Navigate to the project directory:
   ```bash
   cd PAI-Project
   ```
3. Create and activate a virtual environment (optional but recommended), so the libraries in the next step are installed into it:
   ```bash
   python -m venv venv
   source venv/bin/activate   # Linux/macOS
   venv\Scripts\activate      # Windows
   ```
4. Install the required libraries:
   ```bash
   pip install -r requirements.txt
   ```
## Authentication System

Before using the prediction system, users must register and log in through a secure interface. This ensures that only authorized users can access the application.

1. **Register a new user:**
   - Enter your desired username and password.
   - Passwords are stored securely as hashes.
2. **Login:**
   - Enter your registered credentials to access the main application.
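The register/login flow above can be sketched with Python's standard library alone. This is a minimal sketch, not the project's code: it assumes PBKDF2-HMAC-SHA256 with a per-user random salt, a hypothetical iteration count, a hypothetical minimum password length, and a hypothetical in-memory `users` dictionary in place of whatever storage the application actually uses.

```python
import hashlib
import hmac
import secrets

# Assumptions: PBKDF2-HMAC-SHA256 from the standard library, a hypothetical
# iteration count, and a hypothetical in-memory user store. The project's
# actual hashing scheme and storage format are not documented here.
ITERATIONS = 200_000
MIN_PASSWORD_LENGTH = 8  # hypothetical validation rule

users = {}  # username -> "salt_hex$hash_hex"


def hash_password(password: str) -> str:
    """Hash a password with a fresh random salt; only this string is stored."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Re-hash the candidate with the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))


def register(username: str, password: str) -> None:
    """Validate the input, then store the hash (never the plain password)."""
    username = username.strip()
    if not username:
        raise ValueError("Username must not be empty.")
    if len(password) < MIN_PASSWORD_LENGTH:
        raise ValueError(f"Password must be at least {MIN_PASSWORD_LENGTH} characters.")
    if username in users:
        raise ValueError("Username already exists.")
    users[username] = hash_password(password)


def login(username: str, password: str) -> bool:
    """Return True only if the user exists and the password matches."""
    stored = users.get(username.strip())
    return stored is not None and verify_password(password, stored)
```

Because each password gets its own salt, two users with the same password end up with different stored strings, and `hmac.compare_digest` avoids leaking timing information during the comparison.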
## Dataset

- **Name:** Pima Indian Diabetes Dataset
- **Description:** This dataset contains medical diagnostic measurements for predicting the onset of diabetes based on specific factors. It has 768 entries with 8 features and a target variable (`Outcome`).
## Machine Learning Workflow

**Data Preprocessing:**
- Handle missing values using median imputation.
- Address outliers using the IQR method.
- Scale features using `StandardScaler`.
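The three preprocessing steps can be sketched with pandas and scikit-learn. This is an illustrative sketch under stated assumptions: the README does not say whether outliers are removed or capped, so this version caps (clips) them at the IQR fences with the conventional k = 1.5; the helper names and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def impute_median(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Replace missing values in each column with that column's median."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].fillna(out[col].median())
    return out


def cap_outliers_iqr(df: pd.DataFrame, columns: list, k: float = 1.5) -> pd.DataFrame:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest fence (assumption: capping, not removal)."""
    out = df.copy()
    for col in columns:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return out


# Toy rows standing in for the real dataset (hypothetical values).
df = pd.DataFrame({
    "Glucose": [85, 168, np.nan, 120, 400],
    "BMI": [26.6, 43.1, 31.0, np.nan, 30.5],
})
cols = ["Glucose", "BMI"]

clean = cap_outliers_iqr(impute_median(df, cols), cols)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(clean[cols])  # zero mean, unit variance per column
print(clean)
```

Here the missing `Glucose` value becomes the median 144, and the extreme 400 is clipped to the upper fence 168 + 1.5 × 48 = 240. In a full pipeline, the medians, IQR fences, and scaler would be fitted on the training split only, so no test-set information leaks into preprocessing.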
**Class Imbalance Handling:**
- Apply SMOTETomek to oversample the minority class (SMOTE) and undersample by removing Tomek links.
**Model Training:**
- Models used:
  - Random Forest Classifier
  - K-Nearest Neighbors (KNN)
  - Ensemble Voting Classifier
- Train-test split: 80%/20% with stratification.
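The training step (stratified 80/20 split, Random Forest, KNN, and a soft-voting ensemble) can be sketched with scikit-learn. The hyperparameters (`n_estimators=200`, `n_neighbors=11`) and the synthetic data are hypothetical; the scaler-plus-KNN pipeline is a design assumption, since KNN is distance-based and sensitive to feature scale.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 768-row, 8-feature dataset (assumption).
X, y = make_classification(
    n_samples=768, n_features=8, weights=[0.65, 0.35], random_state=42
)

# 80/20 split; stratify=y keeps the class ratio the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

rf = RandomForestClassifier(n_estimators=200, random_state=42)  # hypothetical settings
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=11))

# Soft voting averages the predicted class probabilities of both models.
ensemble = VotingClassifier(estimators=[("rf", rf), ("knn", knn)], voting="soft")
ensemble.fit(X_train, y_train)

print(f"Test accuracy: {ensemble.score(X_test, y_test):.3f}")
```

Soft voting requires every member to implement `predict_proba`, which both Random Forest and KNN do; hard voting would instead take a majority of the predicted labels.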
Model Evaluation:
- Metrics:
- Confusion Matrix
- Accuracy
- Classification Report
- ROC-AUC Score
- Metrics:
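The four evaluation metrics map directly onto functions in `sklearn.metrics`. This sketch wraps them in a hypothetical `evaluate` helper and, to stay runnable on its own, fits a plain Random Forest on synthetic stand-in data; the class names passed to the report are illustrative labels for `Outcome` 0 and 1.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split


def evaluate(model, X_test, y_test) -> dict:
    """Compute confusion matrix, accuracy, classification report, and ROC-AUC."""
    y_pred = model.predict(X_test)
    # ROC-AUC uses the predicted probability of the positive class (Outcome = 1),
    # not the hard 0/1 labels.
    y_proba = model.predict_proba(X_test)[:, 1]
    return {
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        "accuracy": accuracy_score(y_test, y_pred),
        "report": classification_report(
            y_test, y_pred, target_names=["No diabetes", "Diabetes"]
        ),
        "roc_auc": roc_auc_score(y_test, y_proba),
    }


# Synthetic stand-in data and model (assumption), only to make the sketch runnable.
X, y = make_classification(
    n_samples=768, n_features=8, weights=[0.65, 0.35], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

metrics = evaluate(model, X_test, y_test)
print(metrics["confusion_matrix"])
print(f"Accuracy: {metrics['accuracy']:.2%}  ROC-AUC: {metrics['roc_auc']:.3f}")
print(metrics["report"])
```

Accuracy alone can look good on imbalanced data by favoring the majority class, which is why the per-class report and the threshold-independent ROC-AUC are reported alongside it.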
## Tkinter GUI Application

The Tkinter-based GUI provides an intuitive interface for predictions.

- **Individual Predictions:** Enter patient data manually.
- **Batch Predictions:** Load and predict multiple records from a CSV file.
- **Themes:** Switch between Light/Dark themes.
- **Help Section:** Tooltips for input fields and detailed instructions.
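The batch-prediction path (load a CSV, validate it, predict, save the results) can be sketched independently of the Tkinter widgets. This is a hypothetical sketch: it assumes the CSV uses the standard Pima column names listed in `FEATURES`, and that `model` is any fitted object with a scikit-learn-style `predict` method. `predict_batch` and `predict_csv` are illustrative names, not functions from the repository.

```python
import pandas as pd

# Assumption: the CSV uses the standard Pima column names (the 8 features, without Outcome).
FEATURES = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]


def predict_batch(model, df: pd.DataFrame) -> pd.DataFrame:
    """Validate every row first, then return a copy of df with a Prediction column."""
    missing = [c for c in FEATURES if c not in df.columns]
    if missing:
        raise ValueError(f"CSV is missing columns: {missing}")
    # Non-numeric entries become NaN so they can be reported instead of crashing.
    features = df[FEATURES].apply(pd.to_numeric, errors="coerce")
    bad_rows = features.isna().any(axis=1)
    if bad_rows.any():
        raise ValueError(
            f"Non-numeric or empty values in rows: {list(df.index[bad_rows.to_numpy()])}"
        )
    out = df.copy()
    out["Prediction"] = model.predict(features)
    return out


def predict_csv(model, in_path: str, out_path: str) -> pd.DataFrame:
    """Read a CSV of patients, predict each row, and save the results to out_path."""
    results = predict_batch(model, pd.read_csv(in_path))
    results.to_csv(out_path, index=False)
    return results
```

Validating the whole file before predicting lets the GUI show one clear error message listing the offending rows, rather than failing partway through a batch.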
## Results

| Metric | Score |
|---|---|
| Accuracy | 86% |
| ROC-AUC Score | 0.92 |
## Usage

1. Train and save the model by running [`diabetes_model.py`](https://github.com/iamrayyan1/PAI-Project/blob/main/diabetes_model.py).
2. View the EDA graphs by running [`graphs.py`](https://github.com/iamrayyan1/PAI-Project/blob/main/graphs.py).
3. Run the GUI with [`final.py`](https://github.com/iamrayyan1/PAI-Project/blob/main/final.py).
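Saving the trained model in one script and loading it in the GUI can be sketched with `joblib.dump` and `joblib.load`. This sketch assumes a hypothetical file name and bundles the scaler and classifier into a single scikit-learn pipeline, so the GUI applies exactly the same scaling that was used during training; the repository's actual file names and saved objects may differ.

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

MODEL_PATH = "diabetes_pipeline.joblib"  # hypothetical file name

# Training side: fit scaler + classifier together and save them as one object.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)  # stand-in data
pipeline = make_pipeline(
    StandardScaler(), RandomForestClassifier(n_estimators=100, random_state=0)
)
pipeline.fit(X, y)
joblib.dump(pipeline, MODEL_PATH)

# GUI side: load once at startup, then predict on raw (unscaled) feature rows.
loaded = joblib.load(MODEL_PATH)
print(loaded.predict(X[:5]))
```

Saving the pipeline rather than the bare classifier avoids a common bug where the GUI feeds unscaled inputs to a model that was trained on scaled features.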
---
## Contributing
We welcome contributions! Please follow these steps:
1. Fork the repository.
2. Create a new branch.
3. Commit your changes.
4. Push to the branch.
5. Open a Pull Request.
---
## License
This project is licensed under a Creative Commons license. See the `LICENSE` file for the full legal code.