
This is a repo where I use machine learning and data analysis to better understand the Students Anxiety and Depression dataset.


nicolasvargaszz/Kaggle_depresion_dataset


🧠 Mental Health Analysis: Students Anxiety and Depression Dataset.

Scikit-learn TensorFlow

Welcome to my portfolio repository, where I apply machine learning techniques to analyze mental health datasets, specifically focusing on student anxiety and depression. My aim is to demonstrate how data-driven approaches can provide valuable insights into mental health issues.

Overview

In this project, I explore various machine learning models using the Students Anxiety and Depression dataset from Kaggle.

  • Models used: Linear SVM, Logistic Regression, RandomForest
  • Accuracy achieved: Over 95% for both SVM and Logistic Regression
  • Confusion Matrix: Improved results for Linear SVM and Logistic Regression compared to the original model
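A comparison of the three model families listed above could be sketched as follows. This is a minimal illustration, not the repository's actual training code: it uses synthetic data from `make_classification` as a stand-in for the Kaggle dataset, and the hyperparameters (`max_iter`, `n_estimators`, the 80/20 split) are assumptions chosen only to make the example run.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in data (assumption): the real project loads the Kaggle dataset instead.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Hyperparameters below are illustrative assumptions, not the project's settings.
models = {
    "Linear SVM": LinearSVC(max_iter=10_000),
    "Logistic Regression": LogisticRegression(max_iter=1_000),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }

for name, r in results.items():
    print(f"{name}: accuracy={r['accuracy']:.3f}")
    print(r["confusion_matrix"])
```

Fitting every model on the same split keeps the accuracy figures and confusion matrices directly comparable.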

🔗 Original Kaggle Project

🔍 Key Contributions:

  1. Added Linear SVM and Logistic Regression models.
  2. Enhanced the confusion matrix results.
  3. Applied additional data analysis techniques for better data understanding.

🔧 Tools and Libraries:

  • Python, Scikit-learn, Matplotlib, Seaborn

Confusion Matrix

Note: I did not perform the dataset cleaning myself, but I added new features and improved model training and evaluation.


🔬 Kaggle: Suicide and Depression Detection using TensorFlow

This project uses TensorFlow and Scikit-learn to detect suicidal and depressive comments from three subreddits.


📊 Data Visualization:

To understand the dataset, I performed the following visualizations:

  • Distribution of labels across subreddits.
  • Word clouds to show the most frequent words associated with each class.
  • Correlation heatmaps to detect relationships between variables.
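As a rough sketch of the first item, the label distribution per subreddit can be tabulated before plotting it. The subreddit names, label names, and records below are hypothetical placeholders, not values from the dataset.

```python
from collections import Counter, defaultdict

# Hypothetical (subreddit, label) records; names and values are illustrative only.
records = [
    ("subreddit_a", "suicide"),
    ("subreddit_a", "suicide"),
    ("subreddit_b", "depression"),
    ("subreddit_b", "depression"),
    ("subreddit_b", "suicide"),
    ("subreddit_c", "neutral"),
]


def label_distribution(rows):
    """Count how often each label appears within each subreddit."""
    dist = defaultdict(Counter)
    for subreddit, label in rows:
        dist[subreddit][label] += 1
    return {subreddit: dict(counts) for subreddit, counts in dist.items()}


distribution = label_distribution(records)
for subreddit, counts in sorted(distribution.items()):
    print(subreddit, counts)
```

The resulting nested dictionary can be passed to a plotting library such as Matplotlib or Seaborn to draw grouped bar charts.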

Add graphs and charts here with the code used to generate them.


🧹 Data Cleaning:

The dataset was preprocessed using the following steps:

  1. Tokenization and removal of stopwords.
  2. Normalization by converting all text to lowercase.
  3. Lemmatization to reduce words to their base form.
  4. Removal of outliers based on word count distributions.
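The steps above can be sketched with the standard library alone. This is a simplified illustration under several assumptions: the stopword list is a tiny hand-written sample, lowercasing is applied before tokenization for convenience, the outlier rule (a z-score threshold on token counts) is one possible choice, and lemmatization is left out because it usually relies on an external library such as NLTK or spaCy.

```python
import re
import statistics

# Tiny illustrative stopword list (assumption); real projects usually load a fuller one.
STOPWORDS = {"a", "an", "the", "and", "or", "is", "are", "i", "to", "of", "in", "it"}


def preprocess(text):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def remove_length_outliers(token_lists, z_max=2.0):
    """Keep documents whose token count is within z_max standard deviations of the mean."""
    lengths = [len(tokens) for tokens in token_lists]
    mean = statistics.mean(lengths)
    std = statistics.pstdev(lengths)
    if std == 0:
        return list(token_lists)  # all documents have the same length
    return [
        tokens
        for tokens, n in zip(token_lists, lengths)
        if abs(n - mean) / std <= z_max
    ]


print(preprocess("The cat IS in the Hat"))
```

The threshold `z_max` is a hypothetical parameter; a percentile cut-off on word counts would work just as well.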

🛠️ Model Construction:

Several deep learning architectures were tested, including:

  • LSTM (Long Short-Term Memory) Networks: Suitable for sequential text data.
  • Convolutional Neural Networks (CNNs): For text classification.
  • BERT-based Model: State-of-the-art performance for NLP tasks.

🏋️ Model Training:

For each model, I used the following setup:

  • Optimizer: Adam with a learning rate of 0.001
  • Loss function: Binary Cross-Entropy
  • Batch size: 64
  • Epochs: 10
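To make the setup above concrete, here is a small pure-Python sketch of what the binary cross-entropy loss computes and how batch size and epochs translate into the number of weight updates. The training-set size `n_train` is a hypothetical value, and the clipping constant `eps` is an assumption; frameworks such as TensorFlow provide their own implementation of this loss.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def steps_per_epoch(n_samples, batch_size=64):
    """Number of mini-batches needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


BATCH_SIZE = 64
EPOCHS = 10
n_train = 10_000  # hypothetical training-set size, not taken from the project

total_updates = steps_per_epoch(n_train, BATCH_SIZE) * EPOCHS
print("loss:", binary_cross_entropy([1, 0], [0.9, 0.1]))
print("optimizer updates:", total_updates)
```

Confident correct predictions contribute a loss near zero, while confident wrong predictions are penalized heavily, which is why this loss suits binary classification.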

📈 Model Evaluation:

The performance of each model was evaluated using:

  • Accuracy, Precision, Recall, F1-Score
  • Confusion Matrix for visualizing misclassifications.
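All four metrics can be derived from the cells of a binary confusion matrix. The sketch below shows the arithmetic; the counts passed in are hypothetical and do not come from the project's results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=45, fp=5, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting precision and recall alongside accuracy matters here because a missed positive case (a false negative) is far more costly than a false alarm in a mental health setting.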

Here's a summary of the results:

Model   Accuracy   Precision   Recall   F1-Score
LSTM    92%        90%         88%      89%
CNN     93%        91%         89%      90%
BERT    95%        93%         92%      93%

Add a confusion matrix image here or another evaluation graph.


Feel free to explore the code and make contributions!

📝 Related Projects:


Contact Information:

