
A web platform that uses ML to detect the language and sentiment polarity of a given text written in Arabic characters 📊!


IbtihelSidhom/Polarity.tn


Polarity.tn is a web platform that detects the language and sentiment polarity of a text written in Arabic characters. Using machine learning techniques, it distinguishes Arabic texts from Tunisian-dialect texts.

💬 Language identification

These steps give an overview of the language identification pipeline in our script:

  1. Text cleaning
  2. Building a language classifier with supervised learning, trained on our Arabic and Tunisian corpora
  3. Converting the documents to feature vectors using the BOW TF-IDF method with character n-grams: max_df=0.85, min_df=0.25, ngram_range=(1,4)
  4. Training and testing our MultinomialNB model (the BOW parameters were chosen after multiple tests, based on the accuracy and on the confusion matrix results (F1))
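The four steps above can be illustrated with a small, dependency-free sketch. It is not the project's script: it assumes the documents are already cleaned, treats max_df/min_df as document-frequency proportions, borrows the smoothed idf and L2 normalization defaults of scikit-learn's TfidfVectorizer, and uses a hand-written multinomial Naive Bayes with Laplace smoothing (alpha=1). The `evaluate` helper computes accuracy, the confusion matrix and per-class F1. The toy sentences and the `arabic`/`tunisian` labels are made up for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=1, n_max=4):
    """All character n-grams with n_min <= n <= n_max (ngram_range=(1, 4))."""
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return grams


def build_vocabulary(docs, min_df=0.25, max_df=0.85):
    """Keep n-grams whose document-frequency proportion lies in [min_df, max_df]."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(char_ngrams(doc)))  # count each n-gram once per document
    vocab = sorted(g for g, c in df.items() if min_df <= c / n_docs <= max_df)
    # Smoothed idf, like scikit-learn's default: ln((1 + n) / (1 + df)) + 1
    idf = {g: math.log((1 + n_docs) / (1 + df[g])) + 1 for g in vocab}
    return vocab, idf


def tfidf_vector(text, vocab, idf):
    """Raw n-gram counts weighted by idf, then L2-normalized."""
    counts = Counter(char_ngrams(text))
    vec = [counts[g] * idf[g] for g in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class MultinomialNB:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior, self.log_prob = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # Smoothed per-feature totals for this class
            totals = [sum(col) + self.alpha for col in zip(*rows)]
            denom = sum(totals)
            self.log_prob[c] = [math.log(t / denom) for t in totals]
        return self

    def predict(self, x):
        scores = {
            c: self.log_prior[c] + sum(v * lp for v, lp in zip(x, self.log_prob[c]))
            for c in self.classes
        }
        return max(scores, key=scores.get)


def evaluate(y_true, y_pred):
    """Accuracy, confusion matrix (true -> predicted counts) and per-class F1."""
    classes = sorted(set(y_true) | set(y_pred))
    matrix = {t: {p: 0 for p in classes} for t in classes}
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    accuracy = sum(matrix[c][c] for c in classes) / len(y_true)
    f1 = {}
    for c in classes:
        tp = matrix[c][c]
        fp = sum(matrix[t][c] for t in classes) - tp
        fn = sum(matrix[c].values()) - tp
        f1[c] = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, matrix, f1


if __name__ == "__main__":
    # Toy corpus (made up for illustration): Arabic vs. Tunisian dialect
    docs = ["ماذا تفعل الآن", "لا أعرف ماذا أقول", "شنوة تعمل توة", "ما نعرفش شنوة نقول"]
    labels = ["arabic", "arabic", "tunisian", "tunisian"]
    vocab, idf = build_vocabulary(docs)
    X = [tfidf_vector(d, vocab, idf) for d in docs]
    model = MultinomialNB().fit(X, labels)
    print(model.predict(tfidf_vector("شنوة عملت", vocab, idf)))
    print(evaluate(labels, [model.predict(x) for x in X])[0])
```

The parameter names in step 3 match scikit-learn's `TfidfVectorizer(analyzer="char", ngram_range=(1, 4), min_df=0.25, max_df=0.85)` feeding its `MultinomialNB`; the hand-written version above only makes each step visible.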

📈 Sentiment analysis

These steps give an overview of the sentiment analysis pipeline in our script:

  1. Text cleaning
  2. Normalization & tokenization
  3. Stop-word removal
  4. Stemming
  5. Document representation using BOW
  6. Learning the classification model: we tested Naive Bayes (NB), SVM and Logistic Regression (LR), and finally chose the NB classifier because it gave the best accuracy and confusion matrix results.
  7. Building the final model on the entire corpus.
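Steps 1 to 5 can be illustrated with a small standard-library sketch. Everything here is an assumption rather than the authors' code: the character ranges, the normalization map (alef variants to bare alef, alef maqsura to yeh, ta marbuta to heh), the tiny stop-word list and the light prefix/suffix stemmer are hypothetical stand-ins. A library stemmer such as NLTK's `ISRIStemmer` could replace `light_stem`. The resulting count vectors are what a classifier such as NB, SVM or LR would be trained on in step 6.

```python
import re

# Hypothetical resources for illustration; the project's own lists may differ.
DIACRITICS_AND_TATWEEL = re.compile(r"[\u064B-\u0652\u0640]")
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]+")
# أ إ آ -> ا, ى -> ي, ة -> ه
NORMALIZE = str.maketrans({
    "\u0623": "\u0627", "\u0625": "\u0627", "\u0622": "\u0627",
    "\u0649": "\u064A", "\u0629": "\u0647",
})
# Stored in normalized form (e.g. على -> علي, إلى -> الي)
STOP_WORDS = {"في", "من", "علي", "الي", "عن", "هذا", "هذه"}
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "و")
SUFFIXES = ("ات", "ون", "ين", "ها", "ه")


def clean(text):
    """Step 1: drop diacritics/tatweel, replace non-Arabic characters with spaces."""
    text = DIACRITICS_AND_TATWEEL.sub("", text)
    return NON_ARABIC.sub(" ", text)


def normalize_and_tokenize(text):
    """Step 2: unify letter variants, then split on whitespace."""
    return text.translate(NORMALIZE).split()


def light_stem(token, min_len=3):
    """Step 4 (hypothetical light stemmer): strip at most one prefix and one suffix."""
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= min_len:
            token = token[len(prefix):]
            break
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_len:
            token = token[: -len(suffix)]
            break
    return token


def preprocess(text):
    """Steps 1-4: clean, normalize, tokenize, remove stop words, stem."""
    tokens = normalize_and_tokenize(clean(text))
    return [light_stem(t) for t in tokens if t not in STOP_WORDS]


def bag_of_words(token_lists):
    """Step 5: count vectors over a sorted vocabulary built from all documents."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for tokens in token_lists:
        vec = [0] * len(vocab)
        for t in tokens:
            vec[index[t]] += 1
        vectors.append(vec)
    return vocab, vectors
```

For instance, `preprocess("وَالكتاب في المكتبة!")` drops the diacritic, the punctuation and the stop word `في`, and returns `["كتاب", "مكتب"]`.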

Developed by Ibtihel Sidhom, Molka Zaouali and Taysir Ben Hamed in December 2018 💻



⚙️ Configuration

Set up your Python environment

Run this command from the root directory of this repository:

$ pipenv install

This creates the virtual environment and installs the dependencies. To activate the environment, run:

$ pipenv shell


📖 User Manual

Language Identification script

To run the language identification script on the existing corpus files, you can execute this command:

$ python Generating-models/language-identification.py 

To test it locally, uncomment the last lines of the script and type your input text there. Comment out the dumping part to make the script run faster.
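The dumping part presumably saves the trained models so they can be reused later; the exact format is not stated here. A minimal sketch, assuming Python's `pickle`, with a hypothetical file name and a stand-in dictionary in place of a real trained model:

```python
import pickle
import tempfile
from pathlib import Path


def dump_model(model, path):
    """Serialize a trained model (or any picklable object) to disk."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a previously dumped model instead of retraining it."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Stand-in "model" for illustration; a real classifier is handled the same way.
    model = {"classes": ["arabic", "tunisian"], "ngram_range": (1, 4)}
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "language_model.pkl"  # hypothetical file name
        dump_model(model, path)
        print(load_model(path) == model)
```

Any picklable object, such as a trained classifier together with its vectorizer, is saved and restored the same way.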

Sentiment Analysis script

To run the sentiment analysis script on the existing corpus files, you can execute this command:

$ python Generating-models/sentiment-analysis.py 

To test it locally, uncomment the last lines of the script and type your input text there. Comment out the dumping part to make the script run faster.

Web application

To start the web application, you can execute this command:

$ python Web-application/app.py

Entering a text message...

...or uploading a file!

Reviewing the prediction results ✨

To grow our dataset, after you get the results for a text message, you are asked for feedback on the predictions through a short form.

Based on this evaluation, the data is stored in a file so that it can be added to the corpus in the future.
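A minimal sketch of how such feedback could be stored and later filtered before adding it to the corpus. The storage format is not described here, so the CSV file, the column names and the `save_feedback`/`confirmed_examples` helpers are all hypothetical. Only records whose predictions the user confirmed are kept as candidate training examples.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical column names for the feedback file
FIELDS = ["text", "predicted_language", "predicted_polarity",
          "language_correct", "polarity_correct"]


def save_feedback(path, text, language, polarity, language_ok, polarity_ok):
    """Append one feedback record; write the header when the file is new."""
    path = Path(path)
    is_new = not path.exists()
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "text": text,
            "predicted_language": language,
            "predicted_polarity": polarity,
            "language_correct": int(language_ok),
            "polarity_correct": int(polarity_ok),
        })


def confirmed_examples(path):
    """Yield (text, language, polarity) for records the user confirmed as correct."""
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row["language_correct"] == "1" and row["polarity_correct"] == "1":
                yield row["text"], row["predicted_language"], row["predicted_polarity"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "feedback.csv"  # hypothetical location
        save_feedback(path, "نحب برشا", "tunisian", "positive", True, True)
        save_feedback(path, "الكتاب جيد", "arabic", "negative", True, False)
        print(list(confirmed_examples(path)))
```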


The amazing background is by the awesome street artist El Seed.
