This project explores the outputs that pre-existing natural language processing models produce for an Arabic sentence.
We use the models to augment a sentence and gauge each output using cosine similarity. We used 13 models from HuggingFace to do the data augmentation. This repo contains the code for the web interface that hosts all of these methods. The project is written in Python and uses the streamlit package
to host it on the web.
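As a rough illustration of the scoring step, the sketch below computes cosine similarity between two vectors in plain Python. It assumes the original and augmented sentences have already been turned into equal-length numeric vectors; how this project builds those vectors is not described here, so the numbers in the example are made-up toy values and `cosine_similarity` is an illustrative name, not a function from this repo.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction; report 0.0 instead of dividing by zero.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for sentence embeddings (assumed values).
original = [0.2, 0.7, 0.1]
augmented = [0.25, 0.65, 0.05]
print(round(cosine_similarity(original, augmented), 3))
```

A score close to 1 means the augmented sentence points in nearly the same direction as the original in embedding space, while a score near 0 means the two are unrelated.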
Data Augmentation Techniques / Machine Learning Models Used:
- AraBERT
- QARiB
- XLM-RoBERTa
- AraBART
- CAMeLBERT-Mix NER
- Arabic BERT (Large)
- ARBERT
- MARBERTv2
- AraELECTRA
- AraGPT2
- W2V (AraVec)
- Text-to-Text Augmentation
- Back Translation
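For the BERT-style models in the list above, one common way to augment a sentence is to mask one word at a time and let the model propose a replacement. The sketch below shows that idea in a model-agnostic form and is not taken from this repo. `augment_with_mask` and `stub_fill_mask` are hypothetical names, and the `fill_mask` argument is assumed to be any callable that returns candidates as dictionaries with a `token_str` key, which is the shape a HuggingFace fill-mask pipeline returns. The mask token differs between models, so `[MASK]` is only a default. Generative models, word vectors, and back translation work differently and are not covered by this sketch.

```python
from typing import Callable, Dict, List

# Assumed interface: takes a sentence containing one mask token and returns
# candidate dicts that carry at least a "token_str" entry.
FillMask = Callable[[str], List[Dict[str, object]]]


def augment_with_mask(
    sentence: str,
    fill_mask: FillMask,
    mask_token: str = "[MASK]",
    top_k: int = 1,
) -> List[str]:
    """Return variants of `sentence`, each with one word replaced."""
    words = sentence.split()
    variants: List[str] = []
    for i, word in enumerate(words):
        # Hide the i-th word and ask the model for replacements.
        masked = words[:i] + [mask_token] + words[i + 1:]
        kept = 0
        for candidate in fill_mask(" ".join(masked)):
            token = str(candidate["token_str"]).strip()
            # Skip empty predictions and predictions equal to the original word.
            if not token or token == word:
                continue
            variants.append(" ".join(words[:i] + [token] + words[i + 1:]))
            kept += 1
            if kept == top_k:
                break
    return variants


def stub_fill_mask(masked_sentence: str) -> List[Dict[str, object]]:
    """Stand-in for a real model: always proposes the same two words."""
    return [
        {"token_str": "رائع", "score": 0.6},
        {"token_str": "حار", "score": 0.3},
    ]


# Each variant is the input sentence with exactly one word swapped.
variants = augment_with_mask("الطقس جميل اليوم", stub_fill_mask)
```

With a real model, the stub would be replaced by a fill-mask pipeline loaded for the chosen checkpoint, and each variant could then be compared with the original sentence using cosine similarity.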
-
Clone the GitHub repo to your local machine and follow the steps below.
# Install pipenv (to run environments)
# If you already have pip installed...
pip install pipenv
# If you have Fedora 28
sudo dnf install pipenv
# If you have Homebrew (MacOS) [This is discouraged]
brew install pipenv
-
Start a new pipenv environment
pipenv shell
-
Install the required packages using pip
pip install -r requirements.txt
-
Download the W2V model using the script provided in the /scripts folder.
cd scripts
./aravec_download.sh
-
Once all the packages are installed, you should be able to run the app locally with Streamlit.
# main.py is the file where the main code resides
streamlit run main.py
-
Build the docker app
sudo docker build -t dataaug-webapp:latest .
-
Run the web app using Docker. The app listens on port 8501 inside the container, which is forwarded to port 8080 on the server
sudo docker run -p 8080:8501 dataaug-webapp
-
Run the web app in the background
sudo docker run -p 8080:8501 -d dataaug-webapp
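# Build the image and run the web app in the background in one step
sudo docker build -t dataaug-webapp:latest . && sudo docker run -p 8080:8501 -d dataaug-webapp
# Remove all containers that have exited
sudo docker rm $(sudo docker ps --filter status=exited -q)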
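The `-p 8080:8501` flag in the commands above is easy to read backwards: the first number is the port on the host and the second is the port inside the container (8501 is Streamlit's default). The small sketch below spells that order out. `parse_port_mapping` is a hypothetical helper written for illustration; it is not part of Docker or of this repo, and it only handles the simple `HOST:CONTAINER` form, not mappings with an IP address or protocol suffix.

```python
from typing import Tuple


def parse_port_mapping(mapping: str) -> Tuple[int, int]:
    """Split a 'HOST:CONTAINER' string as passed to `docker run -p`."""
    host, sep, container = mapping.partition(":")
    if not sep or not host.isdigit() or not container.isdigit():
        raise ValueError(f"expected HOST:CONTAINER, got {mapping!r}")
    return int(host), int(container)


host_port, container_port = parse_port_mapping("8080:8501")
# The browser connects to the host port; Docker forwards to the container port.
print(f"http://localhost:{host_port} forwards to port {container_port} in the container")
```

With this mapping, the app is opened in the browser on port 8080 of the server, not on 8501.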
The /scripts folder now includes a script with two commands that make building and running Docker containers easier.
First, change the directory to /scripts
cd scripts
Then run either of the two commands based on what you want to do:
# To stop and remove a container (find the name of the container first using 'sudo docker ps' command) [default image name is 'dataaug-webapp']
./dockercommands.sh stop
# To build and run a new container (default image name is 'dataaug-webapp')
./dockercommands.sh build
If you want to make any changes to the repo, follow these steps:
- Fork it to your own GitHub profile
- Make the changes
- Create a pull request
- I will then review the pull request and accept it.