
Cleaning Arabic text data and classifying it with classic machine learning methods


shimaa83/Arabic_text_classification


data set

https://www.kaggle.com/code/abedkhooli/ar-reviews-100k/data

Arabic text classification

Cleaning the Arabic data set:

  • remove punctuation with a custom function built on Python's re library
  • remove stop words in two different ways:
    - using the arabic-stopwords package (!pip install arabic-stopwords)
    - using an external stop-word file (https://github.com/shimaa83/arabic-stop-words)
  • remove diacritics using the camel-tools library (pip install camel-tools)
  • normalize the text using camel-tools
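
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: it uses only the standard library, with plain regular expressions standing in for the camel-tools diacritic removal and normalization helpers, and a tiny hypothetical stop-word set (`SAMPLE_STOP_WORDS`) standing in for the arabic-stopwords package or the external stop-word file. All function names are assumptions made for the example.

```python
import re
import string

# Latin punctuation plus a few common Arabic marks (comma, semicolon,
# question mark, angle quotes). The exact set is an assumption.
PUNCTUATION = string.punctuation + "،؛؟«»"
PUNCT_RE = re.compile("[" + re.escape(PUNCTUATION) + "]")

# Tashkeel (U+064B to U+0652), superscript alef (U+0670), tatweel (U+0640).
DIACRITICS_RE = re.compile("[\u064B-\u0652\u0670\u0640]")

# Hypothetical sample only; a real run would load a full stop-word list.
SAMPLE_STOP_WORDS = {"في", "من", "على", "هذا"}


def remove_punctuation(text):
    """Replace every punctuation character with a space."""
    return PUNCT_RE.sub(" ", text)


def remove_diacritics(text):
    """Strip short-vowel marks and tatweel (regex stand-in for camel-tools)."""
    return DIACRITICS_RE.sub("", text)


def normalize(text):
    """Unify common letter variants (regex stand-in for camel-tools)."""
    text = re.sub("[أإآ]", "ا", text)   # alef variants -> bare alef
    text = text.replace("ى", "ي")       # alef maksura -> yeh
    text = text.replace("ة", "ه")       # teh marbuta -> heh
    return text


def remove_stop_words(text, stop_words):
    """Drop whitespace-separated tokens that appear in stop_words."""
    return " ".join(tok for tok in text.split() if tok not in stop_words)


def clean(text, stop_words=SAMPLE_STOP_WORDS):
    """Apply the four cleaning steps in order."""
    text = remove_punctuation(text)
    text = remove_diacritics(text)
    text = normalize(text)
    # Normalize the stop words too, so they still match the normalized text.
    normalized_stops = {normalize(word) for word in stop_words}
    return remove_stop_words(text, normalized_stops)


print(clean("الفندق رائع، في المدينة!"))
```

Stop words are filtered last so that matching happens on text that has already been stripped and normalized; the order of the other steps is a design choice of this sketch, not something the README specifies.
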

Classification using classic machine learning

  • first convert the words to TF-IDF features with TfidfTransformer, then apply some classification algorithms
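
A minimal sketch of that classification step with scikit-learn is shown below. TfidfTransformer works on a matrix of token counts, so the sketch places a CountVectorizer in front of it. The README does not name the algorithms it applies, so MultinomialNB is an assumption chosen for illustration, and the four training reviews and the `pos`/`neg` labels are made-up toy data, not samples from the Kaggle data set.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy, made-up reviews for illustration only (assumed to be cleaned already).
train_texts = [
    "الفندق رائع والخدمة ممتازة",
    "تجربة جميلة ومكان نظيف",
    "الخدمة سيئة والغرفة قذرة",
    "تجربة مخيبة ومكان مزعج",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = Pipeline([
    ("counts", CountVectorizer()),  # text -> raw token counts
    ("tfidf", TfidfTransformer()),  # counts -> TF-IDF weights
    ("clf", MultinomialNB()),       # assumed classifier; any estimator fits here
])
model.fit(train_texts, train_labels)

# Classify a new, unseen review.
predictions = model.predict(["خدمة ممتازة ومكان نظيف"])
print(predictions)
```

Wrapping the steps in a Pipeline keeps the vectorizer vocabulary and TF-IDF weights fitted on the training data only, and makes it easy to swap the final estimator when comparing algorithms.
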
