This repository contains experiments and evaluations for a mental health classification model that assigns user text to specific mental health categories.
## Table of Contents

- Introduction
- Dataset Details
- Model Architecture
- Requirements
- Usage
- Results
- Future Work
- Acknowledgments
## Introduction

The goal of this project is to create a robust NLP model capable of classifying text into specific mental health categories, such as depression, anxiety, and suicidal thoughts. The focus is on leveraging pre-trained BERT models and improving performance through techniques like focal loss and dropout regularization.
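Focal loss scales cross-entropy by a factor of (1 − p_t)^γ, down-weighting examples the model already classifies confidently so training concentrates on hard, often minority-class, samples. A minimal framework-agnostic sketch of the idea (the notebook would implement the equivalent with TensorFlow ops; the `alpha` and `gamma` values below are common defaults, not this project's tuned settings):

```python
import numpy as np

def focal_loss(probs, labels, alpha=0.25, gamma=2.0, eps=1e-7):
    """Multi-class focal loss: cross-entropy scaled by (1 - p_t)^gamma,
    which shrinks the loss of easy examples toward zero.

    probs:  (N, C) predicted class probabilities
    labels: (N,)   integer class indices
    """
    p_t = np.clip(probs[np.arange(len(labels)), labels], eps, 1.0)
    return np.mean(-alpha * (1.0 - p_t) ** gamma * np.log(p_t))

# Unlike plain cross-entropy, a confidently correct prediction
# contributes far less loss than an uncertain one.
easy = focal_loss(np.array([[0.95, 0.03, 0.02]]), np.array([0]))
hard = focal_loss(np.array([[0.40, 0.35, 0.25]]), np.array([0]))
print(easy, hard)
```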
## Dataset Details

- Source: Kaggle dataset
- Preprocessing:
  - Text tokenization using BERT's tokenizer.
  - Standardization: lowercasing.
- Imbalance handling: class imbalance addressed using weighted loss functions.
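One common way to derive per-class weights for a weighted loss is inverse class frequency, the same "balanced" heuristic behind scikit-learn's `compute_class_weight`. A small sketch, using the category names from the introduction with made-up counts (not this dataset's real distribution):

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: total / (n_classes * count), so rare
    classes contribute as much to the loss as common ones."""
    counts = Counter(labels)
    total, n_classes = len(labels), len(counts)
    return {c: total / (n_classes * n) for c, n in counts.items()}

# Illustrative counts only.
labels = ["depression"] * 6 + ["anxiety"] * 3 + ["suicidal"] * 1
weights = class_weights(labels)
print(weights)  # the rare 'suicidal' class gets 10 / (3 * 1) ~ 3.33
```

These weights would then multiply each example's loss term (or be passed as `class_weight` to Keras's `model.fit`), so misclassifying a rare class costs more than misclassifying a common one.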
## Model Architecture

- BERT Base (L-12, H-768, A-12): 12 Transformer layers, hidden size 768, 12 attention heads.
- Pretrained weights: `bert-en-uncased-l-12-h-768-a-12`
## Requirements

- Clone the repository:

  ```bash
  git clone https://github.com/HealLink/ML-Model.git
  cd ML-Model
  ```

- Create and activate a Conda environment:

  ```bash
  conda create -n model-env python=3.11.10 -y
  conda activate model-env
  ```

- Install dependencies:

  ```bash
  pip install -r final_requirements.txt
  ```

## Usage

- Run `notebook_final.ipynb` inside the `notebooks` subdirectory.
## Results

The current best model (epoch 2) achieved:

- Train loss: 0.1228
- Val loss: 0.0826
- Train MCC: 0.7533
- Val MCC: 0.7542
- Train accuracy: 0.8077
- Val accuracy: 0.8009
- Confusion matrix (test set):
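MCC (Matthews correlation coefficient) is reported alongside accuracy because it stays informative under the class imbalance described above: it only approaches 1.0 when every class, rare ones included, is predicted well. A quick sketch of how both it and a confusion matrix are computed with scikit-learn (the labels below are invented for illustration, not taken from the project's test set):

```python
from sklearn.metrics import confusion_matrix, matthews_corrcoef

# Toy labels over three classes (0=anxiety, 1=depression, 2=suicidal).
y_true = [0, 0, 1, 1, 1, 2, 2, 0, 1, 2]
y_pred = [0, 0, 1, 1, 0, 2, 2, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred)   # rows = true class, cols = predicted
mcc = matthews_corrcoef(y_true, y_pred)
print(cm)
print(mcc)
```

Off-diagonal entries of the matrix show which categories get confused with each other, which matters more here than raw accuracy (e.g. suicidal texts misread as depression).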
## Acknowledgments

- Thanks to Allah Subhanahu Wa Ta'ala for all His grace and favor, which allowed this project to be completed properly.
- Thanks to the TensorFlow team for creating the TensorFlow framework.
- Thanks to Google for creating BERT, the base model.
- Thanks to Mr. Andrew Ng and Mr. Laurence on Coursera for teaching ML.
- Thanks to the Bangkit team for this learning opportunity.
- Thanks to Suchintika Sarkar for compiling and cleaning the dataset.