This project predicts diseases from symptoms using four machine learning algorithms: Decision Tree, Random Forest, K-Nearest Neighbours, and Naïve Bayes. The dataset is an openly available one hosted on Kaggle and contains records for 4962 patients across 42 diseases.
The diseases covered include Fungal infection, Allergy, GERD, Chronic cholestasis, Drug Reaction, Peptic ulcer disease, AIDS, Diabetes, Gastroenteritis, Bronchial Asthma, Hypertension, Migraine, Cervical spondylosis, Paralysis (brain hemorrhage), Jaundice, Malaria, Chicken pox, Dengue, Typhoid, Hepatitis A, Hepatitis B, Hepatitis C, Hepatitis D, Hepatitis E, Alcoholic hepatitis, Tuberculosis, Common Cold, Pneumonia, Dimorphic hemorrhoids (piles), Heart attack, Varicose veins, Hypothyroidism, Hyperthyroidism, Hypoglycemia, Osteoarthritis, Arthritis, Paroxysmal Positional Vertigo (vertigo), Acne, Urinary tract infection, Psoriasis, and Impetigo.
The following machine learning algorithms are used to predict diseases based on symptoms:
- Decision Tree
- Random Forest
- K-Nearest Neighbours
- Naïve Bayes
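As a rough illustration, the four algorithm families listed above could be set up with scikit-learn as shown below. This is a minimal sketch, not the project's actual code: the choice of `BernoulliNB` as the Naïve Bayes variant, the hyperparameters, the `build_models` helper, and the synthetic data are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def build_models(seed=0):
    """Return one untrained classifier per algorithm family (hypothetical helper)."""
    return {
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "K-Nearest Neighbours": KNeighborsClassifier(n_neighbors=5),
        # Assumption: BernoulliNB, because symptom flags are 0/1 indicators.
        "Naive Bayes": BernoulliNB(),
    }


# Synthetic stand-in data: 60 patients, 8 binary symptom flags, 3 diseases.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 20)
X = rng.integers(0, 2, size=(60, 8))
# Give each disease a distinctive pair of symptoms that is always present.
for label in range(3):
    X[y == label, 2 * label] = 1
    X[y == label, 2 * label + 1] = 1

# Train every model on the same data and collect its predictions.
fitted = {name: model.fit(X, y) for name, model in build_models().items()}
predictions = {name: model.predict(X) for name, model in fitted.items()}
```

Keeping the models in a dictionary makes it straightforward to loop over them later and compare their scores side by side.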
The flow of the project is as follows:
- Data Collection: The dataset is downloaded from Kaggle.
- Data Preprocessing: The dataset is cleaned, and missing values are handled.
- Data Visualization: The data is visualized to gain insight into its structure.
- Feature Selection: Important features are selected for training the models.
- Model Building: Four machine learning algorithms are trained on the data.
- Model Evaluation: The performance of each model is evaluated using accuracy, precision, recall, and F1-score.
- Interactive Interface: An interactive interface is developed to facilitate interaction with the data.
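The preprocessing step in the flow above might look something like the following sketch, which cleans raw symptom cells, drops missing entries, and encodes each patient as a multi-hot vector. The row layout, the symptom names, and the helper names `clean_symptom` and `encode_rows` are illustrative assumptions; the project's real column format may differ.

```python
def clean_symptom(value):
    """Normalise one raw cell: trim, lower-case, and treat blanks as missing."""
    if value is None:
        return None
    text = value.strip().lower().replace(" ", "_")
    return text or None


def encode_rows(rows):
    """Turn (disease, [symptoms...]) rows into multi-hot feature vectors.

    Returns (vocabulary, X, y), where X[i][j] is 1 if patient i reported
    the symptom vocabulary[j], and y[i] is that patient's disease label.
    """
    cleaned = []
    for disease, symptoms in rows:
        present = {s for s in map(clean_symptom, symptoms) if s is not None}
        cleaned.append((disease.strip(), present))
    vocabulary = sorted(set().union(*(present for _, present in cleaned)))
    X = [[1 if s in present else 0 for s in vocabulary] for _, present in cleaned]
    y = [disease for disease, _ in cleaned]
    return vocabulary, X, y


# Made-up example rows with missing cells (None or empty string).
rows = [
    ("Fungal infection", ["itching", " skin_rash", None]),
    ("Allergy", ["continuous_sneezing", "shivering", ""]),
    ("Fungal infection", ["itching", None, None]),
]
vocabulary, X, y = encode_rows(rows)
```

Here missing values are simply treated as "symptom not reported", which is one of several reasonable ways to handle them.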
Each model is evaluated on accuracy, precision, recall, and F1-score. Random Forest outperforms the other three algorithms, reaching an accuracy of 98%.
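To make these metrics concrete, here is a small sketch that computes accuracy together with macro-averaged precision, recall, and F1-score from scratch. Macro averaging (an unweighted mean over classes) is an assumption; the project may average differently. The function name `macro_metrics` and the toy labels are illustrative only.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 for multi-class labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy example: one "A" patient is misclassified as "B".
scores = macro_metrics(["A", "A", "B", "B"], ["A", "B", "B", "B"])
```

In practice, `sklearn.metrics.accuracy_score` and `sklearn.metrics.precision_recall_fscore_support` provide the same quantities; the hand-written version is shown only to expose the arithmetic.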
Machine learning for symptom-based disease prediction is promising and could offer a cost-effective, efficient aid to diagnosis. The results of this project show that such models can predict diseases from symptoms effectively, and an interactive interface can help healthcare professionals make better-informed decisions and more accurate diagnoses.