GitHub - IbtihelSidhom/Polarity.tn: A web platform that detects the language and sentiment polarity of a given text written in Arabic characters using ML 📊

Polarity.tn is a web platform that detects the language and sentiment polarity of a text written in Arabic characters. It differentiates between Arabic and Tunisian dialect texts using machine learning techniques.

💬 Language identification

These steps give an overview of the language identification pipeline of our script:

Text cleaning
Building a language classifier with supervised learning, trained on our Arabic and Tunisian corpora
Converting the documents to feature vectors using the BOW TF-IDF method with character n-grams: max_df=0.85, min_df=0.25, ngram_range=(1,4)
Training and testing our MultinomialNB model (the BOW parameters were chosen after multiple tests, based on accuracy and confusion matrix results (F1))
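
The steps above can be sketched with scikit-learn as follows. This is a minimal sketch and not the repository's script: the toy sentences, the labels "ar" and "tn", the analyzer="char" setting and the helper name build_language_classifier are assumptions made for illustration. Only the max_df, min_df and ngram_range values and the choice of MultinomialNB come from the description above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy stand-in corpora (hypothetical); the real corpus files live in the repository.
ARABIC = [
    "أنا ذاهب إلى المدرسة اليوم",
    "كيف حالك يا صديقي",
    "هذا الكتاب جميل جدا",
    "أريد أن أشرب الماء",
]
TUNISIAN = [
    "شنوة أحوالك يا صاحبي",
    "برشا باهي الفيلم هذا",
    "نحب نمشي للحانوت توا",
    "ما فهمتش علاش هكا",
]


def build_language_classifier():
    """TF-IDF over character 1- to 4-grams, followed by multinomial Naive Bayes."""
    vectorizer = TfidfVectorizer(
        analyzer="char",     # assumption: plain character n-grams
        ngram_range=(1, 4),
        max_df=0.85,         # drop n-grams found in more than 85% of documents
        min_df=0.25,         # drop n-grams found in fewer than 25% of documents
    )
    return make_pipeline(vectorizer, MultinomialNB())


texts = ARABIC + TUNISIAN
labels = ["ar"] * len(ARABIC) + ["tn"] * len(TUNISIAN)

model = build_language_classifier()
model.fit(texts, labels)

prediction = model.predict(["شنوة الجو اليوم"])[0]
print(prediction)  # prints "ar" or "tn"
```

On eight sentences the prediction carries no weight; the point is only how the document-frequency bounds and the n-gram range plug into the vectorizer before the classifier.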

📈 Sentiment analysis

These steps give an overview of the sentiment analysis pipeline of our script:

Text Cleaning
Normalization & tokenization
Remove stop words
Stemming
Document representation using BOW
Learning the classification model: we tested a Naive Bayes (NB) classifier, SVM and logistic regression (LR), and chose the NB classifier because it gave the best accuracy and confusion matrix of the three.
Building the final model using the entire corpus.
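
The pipeline above can be sketched as follows. Everything specific here is an assumption made for illustration: the toy corpus, the stop word list, the normalization rules, the light_stem function (a toy stand-in for a real Arabic stemmer), LinearSVC as the SVM, and ranking the candidates by 2-fold cross-validated accuracy only. The script itself may differ on each of these points.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")  # tashkeel marks and tatweel
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")     # keep Arabic letters and whitespace
STOP_WORDS = {"في", "من", "على", "هذا", "يا", "و"}  # hypothetical, far from complete


def normalize(text):
    """Unify alef variants and alef maqsura (one common convention)."""
    return re.sub("[أإآ]", "ا", text).replace("ى", "ي")


def light_stem(token):
    """Toy light stemmer: strips the definite article and a few suffixes."""
    if token.startswith("ال") and len(token) > 4:
        token = token[2:]
    for suffix in ("ات", "ون", "ين", "ها"):
        if token.endswith(suffix) and len(token) > 4:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Clean, normalize, tokenize, drop stop words, then stem."""
    text = DIACRITICS.sub("", text)
    text = NON_ARABIC.sub(" ", text)
    tokens = normalize(text).split()
    return [light_stem(t) for t in tokens if t not in STOP_WORDS]


# Toy stand-in corpus (hypothetical).
texts = [
    "الفيلم هذا باهي برشا",
    "خدمة ممتازة و ناس طيبين",
    "نحب المطعم هذا، ماكلة بنينة",
    "تجربة جميلة جدا",
    "الفيلم هذا خايب برشا",
    "خدمة سيئة و ناس مش طيبين",
    "ما نحبش المطعم هذا، ماكلة خايبة",
    "تجربة سيئة جدا",
]
labels = ["pos"] * 4 + ["neg"] * 4

CANDIDATES = {
    "NB": MultinomialNB(),
    "SVM": LinearSVC(),
    "LR": LogisticRegression(max_iter=1000),
}

# Bag-of-words representation plus each candidate classifier, scored by accuracy.
scores = {}
for name, classifier in CANDIDATES.items():
    pipeline = make_pipeline(CountVectorizer(analyzer=analyze), classifier)
    scores[name] = cross_val_score(pipeline, texts, labels, cv=2).mean()
print(scores)

# Final model: retrain the best-scoring candidate on the entire corpus.
best = max(scores, key=scores.get)
final_model = make_pipeline(CountVectorizer(analyzer=analyze), CANDIDATES[best])
final_model.fit(texts, labels)
```

With only eight sentences the ranking means nothing; on the real corpus the comparison would also look at the confusion matrix, as described above.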

Developed by Ibtihel Sidhom, Molka Zaouali and Taysir Ben Hamed in December 2018 💻

⚙️ Configuration

Set up your Python environment

Run this command under the root directory of this repository:

$ pipenv install

To activate the virtual environment, run the $ pipenv shell command.

📖 User Manual

Language Identification script

To run the language identification script on the existing corpus files, you can execute this command:

$ python Generating-models/language-identification.py

You can also test it locally by uncommenting the last lines of the script and typing your input text directly in the script. Comment out the dumping part to make the script run faster.
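
As a rough illustration of the local test and the dumping step, the sketch below trains a tiny stand-in model, optionally dumps it, and predicts on a hard-coded input. The script's own dumping code is not reproduced here: the use of pickle, the file name language-model.pkl, the DUMP_MODEL flag (an alternative to commenting lines out) and the two training sentences are all assumptions.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

DUMP_MODEL = True  # set to False to skip the dump while experimenting

# Tiny stand-in for the trained model (hypothetical data).
model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(1, 4)),
    MultinomialNB(),
)
model.fit(["كيف حالك يا صديقي", "شنوة أحوالك يا صاحبي"], ["ar", "tn"])

# Hypothetical output location; a temporary directory keeps the sketch side-effect free.
model_path = Path(tempfile.mkdtemp()) / "language-model.pkl"
if DUMP_MODEL:
    with model_path.open("wb") as handle:
        pickle.dump(model, handle)

# Local test: type your own input text here.
input_text = "شنوة أحوالك"
result = model.predict([input_text])[0]
print(result)

# A dumped model can be loaded back later without retraining.
if DUMP_MODEL:
    with model_path.open("rb") as handle:
        restored = pickle.load(handle)
    print(restored.predict([input_text])[0] == result)
```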

Sentiment Analysis script

To run the sentiment analysis script on the existing corpus files, you can execute this command:

$ python Generating-models/sentiment-analysis.py

You can also test it locally by uncommenting the last lines of the script and typing your input text directly in the script. Comment out the dumping part to make the script run faster.

Web application

To start the web application, you can execute this command:

$ python Web-application/app.py

Entering a text message...

or uploading a file!

Reviewing the prediction results ✨

To enlarge our dataset, after you get the results for a text message you are asked to give feedback on the predictions by filling in a short form.

This feedback is stored in a file so that it can be added to the corpus in the future.
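
One way such a feedback file could be maintained is sketched below with the standard csv module. The file name feedback.csv, the column names and the store_feedback helper are hypothetical; the web application's actual storage format is not described here.

```python
import csv
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical columns for one feedback record.
FIELDS = [
    "timestamp",
    "text",
    "predicted_language",
    "predicted_polarity",
    "language_correct",
    "polarity_correct",
]


def store_feedback(path, text, predicted_language, predicted_polarity,
                   language_correct, polarity_correct):
    """Append one feedback record to a CSV file, writing the header once."""
    path = Path(path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "text": text,
            "predicted_language": predicted_language,
            "predicted_polarity": predicted_polarity,
            "language_correct": language_correct,
            "polarity_correct": polarity_correct,
        })


# Usage: two users answer the form; the second one disagrees with the polarity.
feedback_file = Path(tempfile.mkdtemp()) / "feedback.csv"
store_feedback(feedback_file, "الفيلم هذا باهي برشا", "tn", "positive", True, True)
store_feedback(feedback_file, "الخدمة سيئة", "ar", "positive", True, False)

with feedback_file.open(newline="", encoding="utf-8") as handle:
    rows = list(csv.DictReader(handle))
print(len(rows))
```

Rows where the user marked a prediction as wrong are the ones that need relabeling before they are merged into the corpus.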

The amazing background is by the awesome street artist El Seed.
