PAI-Project

For our PAI course project, we are building several disease prediction systems, including heart disease, diabetes, Parkinson's, and breast cancer classification. Using machine learning algorithms, we aim to analyze patient data and improve the accuracy of early diagnosis, providing valuable insights to healthcare professionals.

Diabetes Prediction System using Pima Indian Dataset

This project is a machine learning application to predict diabetes using the Pima Indian Diabetes Dataset. It includes a Tkinter-based GUI for individual and batch predictions, and features a Login and Registration system to manage user access securely.

Features

Authentication System:
- User Registration with secure password hashing.
- Login functionality with input validation.
- Secure credential storage (passwords are stored as hashes, never in plain text).
Machine Learning Models:
- Random Forest Classifier
- K-Nearest Neighbors (KNN)
- Ensemble Voting Classifier (Soft Voting)
Exploratory Data Analysis (EDA):
- Visualizations (Histograms, Boxplots, Pairplots, Correlation Heatmap)
GUI:
- Predict individual diabetes outcomes.
- Batch prediction from CSV files.
- Light/Dark themes.
- Validation of inputs.
- Save prediction results.
Model Persistence: Save and load trained models with joblib.
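Soft voting averages the class probabilities from each base model and picks the class with the highest average, instead of counting one hard vote per model. The sketch below illustrates that rule in plain Python for a single sample. It assumes each model returns one probability per class. In scikit-learn the same role is played by `VotingClassifier(voting="soft")`, which takes the argmax of the (weighted) summed `predict_proba` outputs. The probability values are made up for illustration and are not outputs of the project's models.

```python
from typing import List, Optional, Sequence, Tuple


def soft_vote(
    prob_rows: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[int, List[float]]:
    """Combine per-model class probabilities for ONE sample.

    prob_rows[m][c] is model m's probability for class c. Returns the winning
    class index and the weighted-average probability of every class.
    """
    if weights is None:
        weights = [1.0] * len(prob_rows)
    if len(weights) != len(prob_rows):
        raise ValueError("need exactly one weight per model")
    n_classes = len(prob_rows[0])
    total = float(sum(weights))
    avg = [
        sum(w * row[c] for w, row in zip(weights, prob_rows)) / total
        for c in range(n_classes)
    ]
    # max() keeps the first index on ties, like an argmax over classes
    winner = max(range(n_classes), key=lambda c: avg[c])
    return winner, avg


# Made-up probabilities for classes [0 = no diabetes, 1 = diabetes]
rf_probs = [0.40, 0.60]   # hypothetical Random Forest output
knn_probs = [0.70, 0.30]  # hypothetical KNN output
label, avg = soft_vote([rf_probs, knn_probs])
print(label)  # 0
```

Here hard voting would tie, because one model predicts 1 and the other predicts 0. Soft voting resolves the tie in favour of the more confident model (average 0.55 for class 0 versus 0.45 for class 1).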

Installation

Clone the repository:
```
git clone https://github.com/iamrayyan1/PAI-Project.git
```
Navigate to the project directory:
```
cd PAI-Project
```
Create a virtual environment (optional but recommended):
```
python -m venv venv
source venv/bin/activate   # For Linux/macOS
venv\Scripts\activate      # For Windows
```
Install the required libraries:
```
pip install -r requirements.txt
```

Authentication System

Before using the prediction system, users must register and log in through a secure interface. This ensures that only authorized users can access the application.

Steps:

Register a new user:
- Enter your desired username and password.
- Passwords are stored securely using hashing.
Login:
- Enter your registered credentials to access the main application.
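The register/login flow can be sketched with salted password hashing from the standard library. This sketch assumes `user_database.db` is a SQLite file, but the README does not specify the database engine or the hashing scheme the project uses. The example uses PBKDF2-HMAC-SHA256 with a random per-user salt and compares hashes in constant time. The `users` table layout, the column names, and the iteration count are all hypothetical.

```python
import hashlib
import hmac
import os
import sqlite3
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumed PBKDF2 work factor, not taken from the project


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random 16-byte salt is drawn if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def register(conn: sqlite3.Connection, username: str, password: str) -> bool:
    """Store a new user; False if inputs are empty or the username is taken."""
    if not username.strip() or not password:
        return False
    salt, digest = hash_password(password)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users (username, salt, pw_hash) VALUES (?, ?, ?)",
                (username, salt, digest),
            )
    except sqlite3.IntegrityError:  # PRIMARY KEY clash: duplicate username
        return False
    return True


def login(conn: sqlite3.Connection, username: str, password: str) -> bool:
    """Re-hash the attempt with the stored salt and compare in constant time."""
    row = conn.execute(
        "SELECT salt, pw_hash FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return False
    salt, stored = row
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored)


# In-memory database for the demo; a file path would be used in practice.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (username TEXT PRIMARY KEY, salt BLOB NOT NULL, pw_hash BLOB NOT NULL)"
)
print(register(conn, "alice", "s3cret!"))  # True
print(register(conn, "alice", "other"))    # False (duplicate username)
print(login(conn, "alice", "s3cret!"))     # True
print(login(conn, "alice", "wrong"))       # False
```

Only the salt and digest are stored, so a leaked database does not directly reveal passwords.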

Dataset

Name: Pima Indian Diabetes Dataset
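Description: This dataset contains diagnostic measurements from female patients of Pima Indian heritage, aged 21 or older, and is used to predict the onset of diabetes. It has 768 entries with 8 features (pregnancies, glucose, blood pressure, skin thickness, insulin, BMI, diabetes pedigree function, and age) and a binary target variable (Outcome: 1 = diabetic, 0 = non-diabetic).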

Machine Learning Workflow

Steps:

Data Preprocessing:
- Handle missing values using median imputation.
- Address outliers using the IQR method.
- Scale features using StandardScaler.
Class Imbalance Handling:
- Apply SMOTETomek, which oversamples the minority class with synthetic SMOTE samples and then removes Tomek links to clean up overlapping samples near the class boundary.
Model Training:
- Models Used:
  - Random Forest Classifier
  - K-Nearest Neighbors (KNN)
  - Ensemble Voting Classifier
- Train-Test Split: 80%-20% with stratification.
Model Evaluation:
- Metrics:
  - Confusion Matrix
  - Accuracy
  - Classification Report
  - ROC-AUC Score
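The preprocessing steps above can be sketched in plain Python. The sketch makes three assumptions, and the README does not say which of them the project actually uses. First, following the widely used Kaggle copy of the dataset, a 0 in Glucose, BloodPressure, SkinThickness, Insulin, or BMI is treated as a missing measurement. Second, outliers are capped at the Tukey fences (Q1 - 1.5 × IQR, Q3 + 1.5 × IQR) instead of being dropped. Third, scaling mirrors StandardScaler's (x - mean) / std, using the population standard deviation. Each statistic is fitted on the training rows only, so no information leaks from the 20% test split. Resampling with SMOTETomek (`imblearn.combine.SMOTETomek` in imbalanced-learn) is not reproduced here.

```python
import statistics
from typing import Dict, List, Tuple

Row = Dict[str, float]

# Assumption: zeros in these columns mean "not measured" (Kaggle column names).
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def fit_medians(rows: List[Row], columns: List[str]) -> Dict[str, float]:
    """Median of the non-zero values per column, learned from training rows."""
    return {c: statistics.median(r[c] for r in rows if r[c] != 0) for c in columns}


def impute(rows: List[Row], medians: Dict[str, float]) -> List[Row]:
    """Replace 0 with the learned median in every column that has one."""
    return [{c: (medians[c] if c in medians and v == 0 else v) for c, v in r.items()} for r in rows]


def fit_iqr_bounds(rows: List[Row], columns: List[str], k: float = 1.5) -> Dict[str, Tuple[float, float]]:
    """Tukey fences per column; 'inclusive' matches NumPy/pandas linear quartiles."""
    bounds = {}
    for c in columns:
        q1, _, q3 = statistics.quantiles([r[c] for r in rows], n=4, method="inclusive")
        iqr = q3 - q1
        bounds[c] = (q1 - k * iqr, q3 + k * iqr)
    return bounds


def cap_outliers(rows: List[Row], bounds: Dict[str, Tuple[float, float]]) -> List[Row]:
    """Clip values to the fences (winsorize) instead of dropping whole rows."""
    return [{c: (min(max(v, bounds[c][0]), bounds[c][1]) if c in bounds else v) for c, v in r.items()} for r in rows]


def fit_standardizer(rows: List[Row], columns: List[str]) -> Dict[str, Tuple[float, float]]:
    """Mean and population std (ddof=0) per column; zero std falls back to 1."""
    stats = {}
    for c in columns:
        vals = [r[c] for r in rows]
        stats[c] = (statistics.mean(vals), statistics.pstdev(vals) or 1.0)
    return stats


def standardize(rows: List[Row], stats: Dict[str, Tuple[float, float]]) -> List[Row]:
    return [{c: ((v - stats[c][0]) / stats[c][1] if c in stats else v) for c, v in r.items()} for r in rows]


train = [  # made-up toy rows, not taken from diabetes.csv
    {"Glucose": 90, "Insulin": 0},
    {"Glucose": 100, "Insulin": 80},
    {"Glucose": 0, "Insulin": 100},
    {"Glucose": 110, "Insulin": 120},
    {"Glucose": 120, "Insulin": 900},
]
cols = ["Glucose", "Insulin"]
medians = fit_medians(train, cols)
clean = impute(train, medians)
bounds = fit_iqr_bounds(clean, cols)
clean = cap_outliers(clean, bounds)
scaled = standardize(clean, fit_standardizer(clean, cols))
print(medians)                         # {'Glucose': 105.0, 'Insulin': 110.0}
print([r["Insulin"] for r in clean])   # [110.0, 80, 100, 120, 150.0]
print([r["Glucose"] for r in scaled])  # [-1.5, -0.5, 0.0, 0.5, 1.5]
```

At prediction time, the stored medians, fences, and scaler statistics from training are reused on the test rows or GUI input rather than being refitted.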

Tkinter GUI Application

The Tkinter-based GUI provides an intuitive interface for predictions.

Individual Predictions: Enter patient data manually.
Batch Predictions: Load and predict multiple records from a CSV file.
Themes: Switch between Light/Dark themes.
Help Section: Tooltips for input fields and detailed instructions.
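Input validation and CSV batch prediction can be sketched as plain functions that a Tkinter button callback could call. This is a sketch under several assumptions. The column names are the Kaggle headers of diabetes.csv. The rule "non-empty, finite, non-negative number" is a hypothetical validation policy. `predict` is any callable standing in for the trained model, and the project's actual code in final.py may differ.

```python
import csv
import io
import math
from typing import Callable, Dict, List, TextIO

# Assumed feature columns (Kaggle headers); hypothetical for this sketch.
FEATURES = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
            "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]


def validate_record(raw: Dict[str, str]) -> List[float]:
    """Parse one record in FEATURES order; raise ValueError with a readable message."""
    values = []
    for name in FEATURES:
        text = (raw.get(name) or "").strip()
        if not text:
            raise ValueError(f"{name} is required")
        try:
            value = float(text)
        except ValueError:
            raise ValueError(f"{name} must be a number, got {text!r}") from None
        if not math.isfinite(value) or value < 0:
            raise ValueError(f"{name} must be a finite, non-negative number")
        values.append(value)
    return values


def batch_predict(src: TextIO, dst: TextIO, predict: Callable[[List[float]], int]) -> int:
    """Copy src CSV rows to dst with an added Prediction column; return row count."""
    reader = csv.DictReader(src)
    header = reader.fieldnames or []
    missing = [f for f in FEATURES if f not in header]
    if missing:
        raise ValueError(f"CSV is missing columns: {', '.join(missing)}")
    out_fields = [f for f in header if f != "Prediction"] + ["Prediction"]
    writer = csv.DictWriter(dst, fieldnames=out_fields)
    writer.writeheader()
    count = 0
    for line_no, raw in enumerate(reader, start=2):  # line 1 is the header
        try:
            features = validate_record(raw)
        except ValueError as exc:
            raise ValueError(f"row {line_no}: {exc}") from None
        writer.writerow({**raw, "Prediction": predict(features)})
        count += 1
    return count


def toy_predict(features: List[float]) -> int:
    # Stand-in for the trained model: an arbitrary demo rule, not a clinical threshold.
    return int(features[1] >= 140)


src = io.StringIO(
    "Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age\n"
    "1,150,70,30,0,33.6,0.6,50\n"
    "2,95,64,20,80,24.1,0.3,28\n"
)
dst = io.StringIO()
n = batch_predict(src, dst, toy_predict)
print(n)  # 2
print(dst.getvalue())
```

Reporting the failing row number and field name lets the GUI show a specific error message instead of failing silently on a malformed CSV.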

GUI Screenshot:

Results

Model Performance:

| Metric | Score |
|---|---|
| Accuracy | 86% |
| ROC-AUC Score | 92% |
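The two metrics measure different things. Accuracy is the share of correct predictions at a fixed decision threshold. ROC-AUC measures how well the predicted probabilities rank patients: it is the probability that a randomly chosen positive case (Outcome = 1) receives a higher score than a randomly chosen negative case, with ties counted as half. The stdlib sketch below computes both, plus the confusion-matrix counts, using made-up labels and scores rather than the project's results. In practice `sklearn.metrics.confusion_matrix`, `accuracy_score`, and `roc_auc_score` provide these.

```python
from typing import Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tn, fp, fn, tp) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    return tn, fp, fn, tp


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC; O(P*N), fine for a test split of ~150 rows."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1, 0, 1]                # made-up labels
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]   # made-up predicted probabilities
y_pred = [int(s >= 0.5) for s in scores]   # 0.5 decision threshold
print(confusion(y_true, y_pred))           # (3, 0, 1, 2)
print(round(accuracy(y_true, y_pred), 3))  # 0.833
print(round(roc_auc(y_true, scores), 3))   # 0.889
```

Changing the 0.5 threshold changes accuracy and the confusion matrix, but it does not change ROC-AUC, which depends only on how the scores are ordered.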

Usage

Command-Line Scripts:

Train and save the model by running `python diabetes_model.py`:

https://github.com/iamrayyan1/PAI-Project/blob/main/diabetes_model.py

View the EDA graphs by running `python graphs.py`:

https://github.com/iamrayyan1/PAI-Project/blob/main/graphs.py

GUI Usage:

Launch the GUI by running `python final.py`:

https://github.com/iamrayyan1/PAI-Project/blob/main/final.py

---

## Contributing

We welcome contributions! Please follow these steps:

1. Fork the repository.
2. Create a new branch.
3. Commit your changes.
4. Push to the branch.
5. Open a Pull Request.

---

## License

This project is licensed under the Creative Commons Legal Code License. See the `LICENSE` file for details.