Glioma Classification with Machine Learning

This repository is part of a group project aimed at improving the classification of gliomas (brain tumors) into two categories: Lower-Grade Glioma (LGG) and Glioblastoma Multiforme (GBM). The project leverages clinical data and genetic mutations to build machine learning models that can help reduce the time and cost associated with traditional diagnostic methods.

The dataset used is sourced from the Glioma Grading Clinical and Mutation Features Dataset, which includes 20 gene mutation features and additional clinical parameters.

Goals

Primary Objective: Develop a machine learning pipeline that predicts tumor type (LGG vs. GBM) with an F1 score of at least 0.90.
Secondary Objective: Demonstrate insights through feature importance and visualization techniques.

Highlights

Data Preparation: Extensive cleaning, encoding, and stratified train-test splitting so that imbalanced classes keep the same proportions in both splits.
Algorithm Development: Testing various models, including Decision Trees, Random Forest, Support Vector Machines, and ensemble methods, with optimized hyperparameters.
Innovation: Implementing an ensemble model combining Logistic Regression, Random Forest, and SVM, achieving significant performance improvements.
Documentation: Detailed analysis and methodologies documented to ensure reproducibility.
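
The stratified splitting mentioned under Data Preparation can be illustrated with a minimal, standard-library-only sketch. It is not the code from the notebooks: the function name `stratified_split`, the 80/20 ratio, and the class counts are assumptions chosen for illustration. In practice, scikit-learn's `train_test_split(..., stratify=y)` provides the same behavior.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so each class is split by the same fraction.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        # Shuffle within each class, then cut off the test share.
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Made-up labels: 60 LGG and 40 GBM samples (not the real class counts).
labels = ["LGG"] * 60 + ["GBM"] * 40
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)

# Both splits keep the original 60/40 class ratio.
print(Counter(labels[i] for i in train_idx))
print(Counter(labels[i] for i in test_idx))
```

Splitting each class separately is what keeps the minority class from being under-represented in the test set by chance.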

Repository Structure

/notebook: Contains Jupyter Notebooks detailing data cleaning, model development, and results.

/data: Original and preprocessed datasets.

Results

Random Forest achieved the best F1 Score of 0.90, meeting the project’s target.
Ensemble voting methods performed consistently well across multiple evaluation metrics.
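
For readers less familiar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it from confusion-matrix counts. The counts are made up for illustration and are not the project's results; this README does not state which class is treated as positive or which averaging mode was used.

```python
def f1_from_counts(tp, fp, fn):
    """Compute F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: precision = 90/100 and recall = 90/100.
score = f1_from_counts(tp=90, fp=10, fn=10)
print(f"{score:.2f}")  # prints 0.90
```

Because F1 ignores true negatives, it is less flattering than plain accuracy when one class dominates, which is a common reason to prefer it on imbalanced data.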

Usage

Clone this repository: git clone git@github.com:whellcome/MLGliomaClassification.git
Install dependencies: pip install -r requirements.txt
Explore the notebooks and scripts to understand the data pipeline and modeling process.

About the Author

As part of the team, I focused on:

Data stratification and cleaning to address imbalanced features like race and gender.
Testing and optimizing models like Decision Trees and Random Forests.
Developing an ensemble approach combining Logistic Regression, Random Forest, and Support Vector Machines for improved performance.
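
An ensemble of this kind could be assembled roughly as follows. This is a sketch under stated assumptions rather than the notebook code: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the glioma dataset, and picks soft voting and the hyperparameters shown arbitrarily, since the README does not specify them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data; sizes are arbitrary and not those of the real dataset.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Assumed configuration: soft voting averages the predicted class probabilities.
ensemble = VotingClassifier(
    estimators=[
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        # probability=True is needed so the SVM can take part in soft voting.
        ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)

test_f1 = f1_score(y_test, ensemble.predict(X_test))
print(f"F1 on held-out synthetic data: {test_f1:.3f}")
```

Scaling is wrapped inside the Logistic Regression and SVM pipelines because both are sensitive to feature scale, while Random Forest is not. Switching to `voting="hard"` would use a majority vote over predicted labels instead.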

For further information or collaboration opportunities, feel free to contact me or support the repository using the Buy Me a Coffee link!
