Skip to content

This Project provides an in-depth analysis of the sentiment analysis that leverages BERT embeddings.

Notifications You must be signed in to change notification settings

iambitttu/Sentiment-Analysis-using-BERT-Embeddings

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 

Repository files navigation

Sentiment-Analysis-using-BERT-Embeddings

Project Overview

The project aimed to automate the categorization of customer reviews into three sentiment categories: positive, negative, and neutral. Below, we provide a step-by-step account of the entire project, including data collection, preprocessing, model selection, fine-tuning, evaluation, and deployment.

Step 1: Data Collection

Description: In this initial phase, we collected a comprehensive dataset containing student reviews.

Source: The dataset on sentiment analysis in the ed-tech domain was obtained from a publicly available dataset. It consisted of text reviews and corresponding sentiment labels.

Size: The dataset comprises thousands of student reviews.

Step 2: Data Preprocessing

Description: The collected data underwent rigorous preprocessing to ensure compatibility with the BERT model and to prepare it for further analysis.

  • Tokenization: We tokenized the raw text using BERT's tokenizer, which breaks text into individual tokens or subwords.

  • Encoding: Text sequences were encoded into a numerical format using BERT's vocabulary, which assigns a unique ID to each token. We also applied padding to ensure all sequences have the same length.

  • Data Split: The dataset was divided into a training set (80%) and a testing set (20%) to evaluate model performance.

Step 3: Model Selection

Description: To effectively classify sentiment in customer reviews, we opted for the powerful BERT model due to its contextual understanding of language.

Model: We used the 'bert-base-uncased' variant of the BERT model, which is a widely adopted pre-trained model for various NLP tasks.

Step 4: Model Fine-Tuning

Description: The BERT model was fine-tuned for sentiment classification on our specific dataset.

  • Classification Layer: We added a custom classification layer on top of the BERT model. This layer has three output neurons, corresponding to the three sentiment classes (positive, negative, and neutral).

  • Loss Function: For optimization, we employed the Cross-Entropy Loss function.

  • Training: The model was trained for 5 epochs with a batch size of 32 and a learning rate of 2e-5.

Step 5: Model Evaluation

Description: After fine-tuning, we assessed the model's performance using a battery of evaluation metrics.

Metrics:

  • Accuracy: The model's overall accuracy on the test set.
  • Precision, Recall, and F1-score: These metrics were computed for each sentiment class (positive, negative, neutral).
  • Confusion Matrix: To visualize the model's classification performance.

Step 6: Deployment

Description: The trained sentiment analysis model was deployed for real-time sentiment classification.

  • API Integration: We integrated the API into an existing customer feedback portal, enabling automatic sentiment analysis of incoming customer reviews.

Step 7: Conclusion

Summary: The sentiment analysis project successfully utilized BERT embeddings to automate the categorization of customer reviews into positive, negative, and neutral sentiments. This model demonstrates its effectiveness in streamlining sentiment analysis processes and improving customer feedback analysis.

Future Work: Future enhancements may include exploring domain-specific BERT variants, handling multilingual reviews, and continuously updating the model to adapt to evolving language patterns.

References

About

This Project provides an in-depth analysis of the sentiment analysis that leverages BERT embeddings.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published