Before running the project, set up the environment:
poetry shell
poetry update
Inside the evaluation-pipetine folder, add a datasets folder and a results folder.
To load SQuAD-sr, add the squad-sr-lat.json file to the datasets folder.
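The folder setup above can also be done from Python. The following is a minimal sketch, not part of the project: the function name prepare_evaluation_dirs is hypothetical, and it assumes the folder names datasets and results and the file name squad-sr-lat.json exactly as given above.

```python
from pathlib import Path


def prepare_evaluation_dirs(pipeline_dir):
    """Create datasets/ and results/ inside pipeline_dir.

    Returns True if squad-sr-lat.json is already in datasets/, else False.
    (Hypothetical helper; the names come from the instructions above.)
    """
    root = Path(pipeline_dir)
    datasets = root / "datasets"
    results = root / "results"

    # exist_ok=True makes the call safe to repeat on an existing setup.
    datasets.mkdir(parents=True, exist_ok=True)
    results.mkdir(parents=True, exist_ok=True)

    # The SQuAD-sr file has to be copied in by hand; this only checks for it.
    return (datasets / "squad-sr-lat.json").is_file()
```

Calling prepare_evaluation_dirs("evaluation-pipetine") from the project root would create both folders and report whether the SQuAD-sr file still needs to be added.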
First, run the make-evaluation-datasets.ipynb notebook. This creates all the files needed for evaluation.
Then run:
cd evaluation-pipetine/
python evaluation-pipieline.py
To create the training dataset, run the following commands:
cd training_dataset
python .\main_training.py
python .\batch_loading.py
The .parquet files will be saved in the datasets folder.
The folder translation_pipeline is used for translating ms_marco and natural_questions from English to Serbian. The translated queries and contexts from these datasets are used for evaluation. Run the following commands:
cd translation_pipeline
python .\sending_batch.py
python .\processing_batch.py
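The two-step sequence above can be sketched as a small Python helper that runs the scripts in order and stops if one fails. This is an illustration only: run_in_order is a hypothetical name, and the sketch assumes processing_batch.py can be started as soon as sending_batch.py exits. The instructions do not say whether a wait is needed between the two steps.

```python
import subprocess
import sys
from pathlib import Path


def run_in_order(folder, scripts):
    """Run each script inside folder with the current Python interpreter.

    Stops at the first script that exits with a non-zero status.
    (Hypothetical helper, not part of the project.)
    """
    for script in scripts:
        # check=True raises subprocess.CalledProcessError on failure,
        # so the remaining scripts are not started.
        subprocess.run([sys.executable, script], cwd=Path(folder), check=True)


# Example call (not executed here):
# run_in_order("translation_pipeline", ["sending_batch.py", "processing_batch.py"])
```

Using sys.executable keeps the scripts in the same interpreter as the caller, which matters when the commands are run inside the poetry environment.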
The folder translation_sts is used for translating one sentence pair from the sts dataset for the distillation evaluator. Run the following commands:
cd translation_sts
python .\sending_batch.py
python .\processing_batch.py