Arabic Text Diacritization Project

Overview

This project aims to develop a system for Arabic text diacritization using natural language processing (NLP) techniques. Diacritization adds diacritical marks (e.g., the short vowels fatha, damma, and kasra, as well as shadda and sukun) to Arabic text, which aids pronunciation and comprehension, particularly for learners and in automated text processing tasks.
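
To make the idea of diacritical marks concrete, here is a minimal sketch, not taken from this repository, that treats the eight Unicode code points U+064B through U+0652 as the diacritics of interest and strips them from a string. The names DIACRITICS and strip_diacritics are hypothetical; the project's own scripts may handle a different or larger set of marks.

```python
# Arabic diacritical marks (tashkeel), U+064B through U+0652:
# fathatan, dammatan, kasratan, fatha, damma, kasra, shadda, sukun.
# Assumption: this is the set of marks of interest; the project may use others.
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_diacritics(text: str) -> str:
    """Return text with the diacritical marks above removed."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


# "kataba" written with a fatha after each letter (kaf, ta, ba).
diacritized = "\u0643\u064E\u062A\u064E\u0628\u064E"
plain = strip_diacritics(diacritized)

print(len(diacritized), len(plain))  # 6 code points with marks, 3 without
```

Diacritization is the reverse direction: given the plain string, predict which marks follow each letter.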

Features

Diacritize Arabic text input.
Support for various diacritical marks commonly used in Arabic.
Evaluate diacritization quality with metrics such as accuracy, precision, and recall.
Trainable model for improving diacritization performance.

Installation

Clone the repository:

```
git clone https://github.com/khaHesham/arabic-diacritization.git
```

Install dependencies:

```
cd arabic-diacritization
pip install -r requirements.txt
```

Usage

Prepare your Arabic text data.
Run the diacritization script:
```
python diacritize.py --input input.txt --output output.txt
```
Replace input.txt with the path to your input file and output.txt with the desired output file path.
Evaluate diacritization accuracy:
```
python evaluate.py --predicted predicted.txt --gold gold.txt
```
Replace predicted.txt with the path to the file produced by the diacritization script and gold.txt with the path to the gold-standard (reference) diacritized text file.
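
How evaluate.py defines its metrics is not described here. As a rough illustration only, character-level diacritic accuracy could be computed as in the sketch below. The helper names split_marks and diacritic_accuracy are hypothetical, not functions from this repository, and the sketch assumes the predicted and gold texts contain the same base characters in the same order.

```python
# Assumption: the marks being scored are U+064B through U+0652.
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def split_marks(text):
    """Return (base_char, marks) pairs, where marks holds the diacritics
    that follow base_char. A mark with no preceding base is dropped."""
    pairs = []
    for ch in text:
        if ch not in DIACRITICS:
            pairs.append((ch, ""))
        elif pairs:
            base, marks = pairs[-1]
            pairs[-1] = (base, marks + ch)
    return pairs


def diacritic_accuracy(predicted, gold):
    """Fraction of non-whitespace base characters whose marks match."""
    pred_pairs = split_marks(predicted)
    gold_pairs = split_marks(gold)
    if [b for b, _ in pred_pairs] != [b for b, _ in gold_pairs]:
        raise ValueError("predicted and gold must share the same base characters")
    scored = [
        (p, g)
        for (base, p), (_, g) in zip(pred_pairs, gold_pairs)
        if not base.isspace()
    ]
    if not scored:
        return 0.0
    # Sort the marks so that shadda+fatha and fatha+shadda count as equal.
    correct = sum(sorted(p) == sorted(g) for p, g in scored)
    return correct / len(scored)


gold = "\u0643\u064E\u062A\u064E\u0628\u064E"       # kaf+fatha, ta+fatha, ba+fatha
predicted = "\u0643\u064E\u062A\u064F\u0628\u064E"  # ta carries damma instead
print(diacritic_accuracy(predicted, gold))          # 2 of 3 letters match
```

Precision and recall per mark would need a further breakdown by diacritic type, which this sketch does not attempt.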

Training

If you wish to train your own diacritization model:

Prepare a training dataset with diacritized Arabic text.
Train the model:
```
python train.py --train_data train.txt --dev_data dev.txt --model_dir model/
```
Replace train.txt with the path to your training data, dev.txt with the path to your development data, and model/ with the desired directory for saving the trained model.
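
The command above expects separate training and development files. If you start from a single diacritized corpus, one possible way to produce the two sets is a seeded random split of its lines, sketched below. The function name split_train_dev, the 10% development fraction, and the one-sentence-per-line layout are assumptions made for illustration; check what format train.py actually expects before relying on this.

```python
import random


def split_train_dev(lines, dev_fraction=0.1, seed=0):
    """Shuffle non-empty lines with a fixed seed and split off a dev set.

    Returns (train_lines, dev_lines). Assumes one sentence per line.
    """
    lines = [ln for ln in lines if ln.strip()]
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(lines)
    n_dev = int(len(lines) * dev_fraction)
    if len(lines) >= 2:
        n_dev = max(1, n_dev)  # keep at least one dev line when possible
    return lines[n_dev:], lines[:n_dev]


# In-memory example; real use would read the corpus from a file and write
# the two lists to train.txt and dev.txt.
corpus = [f"sentence {i}" for i in range(20)]
train_lines, dev_lines = split_train_dev(corpus, dev_fraction=0.1, seed=0)
print(len(train_lines), len(dev_lines))  # 18 2
```

Splitting by line keeps each sentence intact in one set, and the fixed seed means rerunning the split yields the same files.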

Contributors

Abdelaziz Salah
Abdelrahman Noaman
Khaled Hesham
Kirollos Samy

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

For any inquiries or feedback, please contact AEyeTeam.
