A machine learning web application that predicts the likelihood of Parkinson's Disease based on biomedical voice measurements.
This project implements a Random Forest Classifier to predict Parkinson's Disease using various voice measurement features. The model is deployed as a web application using Streamlit, making it accessible for medical professionals and researchers.
- Trained on the UC Irvine ML Repository Parkinson's Dataset
- Uses Random Forest Classifier with optimized hyperparameters
- Interactive web interface for real-time predictions
- Visualization of feature importance and predictions
- Cross-validated model with detailed performance metrics
- Model: Random Forest Classifier with GridSearchCV optimization
- Web Framework: Streamlit
- Data Processing: Pandas, NumPy
- Visualization: Plotly Express
- Model Persistence: Pickle, Joblib
The model's hyperparameters are tuned with GridSearchCV, which searches over the following candidate values:
- Number of estimators: [100, 200]
- Max depth: [10, 15, 20]
- Min samples split: [2, 5]
- Min samples leaf: [1, 2]
- Max features: ['sqrt', 'log2']
Performance metrics on the test set are available in the training output.
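As a rough illustration of the search space above, the sketch below writes the grid as a scikit-learn parameter dictionary and counts the candidate combinations. The parameter names are the standard `RandomForestClassifier` arguments. The `cv=5`, `scoring="accuracy"`, and `random_state=42` settings are assumptions made for the example and may differ from what train_model.py actually uses.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, ParameterGrid

# Search space copied from the hyperparameter list above.
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [10, 15, 20],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
    "max_features": ["sqrt", "log2"],
}

# Number of candidate settings: 2 * 3 * 2 * 2 * 2 combinations.
n_candidates = len(ParameterGrid(param_grid))

# Assumed settings (not confirmed by the project): 5-fold CV, accuracy scoring.
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
    n_jobs=-1,
)
# Calling search.fit(X_train, y_train) would fit every candidate on every fold.
```

With five folds, a grid of this size means a few hundred individual model fits, which is why training is listed as an optional step.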
- Python 3.11 or higher
- pip package manager
- Clone the repository:
git clone https://github.com/yourusername/parkinson-disease-pred.git
cd parkinson-disease-pred
- Install required packages:
pip install -r requirements.txt
- Train the model (optional, pre-trained model included):
python train_model.py
- Run the web application:
streamlit run app.py
The dataset used for training includes various biomedical voice measurements from subjects with and without Parkinson's Disease. Key features include:
- MDVP measurements (Fo, Fhi, Flo, etc.)
- Jitter and Shimmer measurements
- NHR and HNR ratios
- Status (the target label: 1 for subjects with Parkinson's Disease, 0 for healthy subjects)
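To make the role of these columns concrete, here is a minimal sketch of separating the voice features from the target label with pandas. The three rows are made-up placeholder values, not real measurements, and the column names (`name`, `MDVP:Fo(Hz)`, `HNR`, `status`) are assumed to match the headers in the UCI data file.

```python
import pandas as pd

# Placeholder rows with made-up values; a real run would load the UCI file instead.
# Column names are assumed to follow the headers of the UCI data file.
df = pd.DataFrame(
    {
        "name": ["subject_a", "subject_b", "subject_c"],
        "MDVP:Fo(Hz)": [120.0, 198.0, 150.0],
        "HNR": [21.0, 26.0, 19.0],
        "status": [1, 0, 1],
    }
)

# The identifier column carries no signal, and status is the prediction target.
X = df.drop(columns=["name", "status"])
y = df["status"]
```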
- Launch the web application
- Input the required voice measurements
- Click "Predict" to get the model's prediction
- View the visualization of feature importance and prediction results
The train_model.py script performs the following steps:
- Loads and preprocesses the dataset
- Splits data into training and test sets
- Performs GridSearchCV for hyperparameter optimization
- Trains the Random Forest model
- Evaluates model performance
- Saves the trained model
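The steps above can be sketched roughly as follows. This is not the project's actual train_model.py: the function name `train_and_save`, the 80/20 stratified split, the 3-fold search, the reduced grid, and the synthetic data from `make_classification` are all assumptions chosen to keep the example small and quick to run.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split


def train_and_save(X, y, model_path, param_grid):
    """Hypothetical helper: split, grid-search, evaluate, and persist a model."""
    # Hold out a stratified test set (the 80/20 ratio is an assumption).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )
    # Cross-validated search over the grid, using the training split only.
    search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=3)
    search.fit(X_train, y_train)
    # best_estimator_ is refit on the whole training split by default.
    model = search.best_estimator_
    test_accuracy = accuracy_score(y_test, model.predict(X_test))
    joblib.dump(model, model_path)
    return model, test_accuracy


# Synthetic stand-in data; the real script would load the voice measurements.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
small_grid = {"n_estimators": [20], "max_depth": [5, 10]}  # reduced for speed
model_path = os.path.join(tempfile.mkdtemp(), "model.joblib")

model, test_accuracy = train_and_save(X, y, model_path, small_grid)
restored = joblib.load(model_path)  # what a web app could load at startup
```

Keeping the test split out of the grid search means the reported test accuracy is not influenced by hyperparameter selection.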
The model identifies the most significant voice measurements for prediction. The top features and their importance scores are displayed in the training output.
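One common way to obtain such a ranking is the `feature_importances_` attribute of a fitted random forest. The sketch below assumes that approach, which the project may or may not use. The data is synthetic and the `feature_0` ... `feature_5` names are placeholders.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with placeholder feature names.
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=3, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Impurity-based importances, normalized by scikit-learn to sum to 1.
importances = pd.Series(model.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)

top_features = importances.head(3)
print(top_features)
```

Impurity-based importances can favor features with many distinct values, so they are best read as a rough ranking rather than a precise measure of clinical relevance.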
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the project
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- UCI Machine Learning Repository for the Parkinson's Disease dataset
- Streamlit team for the excellent web framework
- scikit-learn developers for the machine learning tools
For questions and feedback, please open an issue in the GitHub repository.
This tool is for research and educational purposes only. It should not be used as a substitute for professional medical diagnosis. Always consult with healthcare professionals for medical advice and diagnosis.