SRBedding

Project setup

Before running the project, set up the environment:
poetry shell
poetry update

Evaluation Jupyter notebook

Inside the evaluation_pipeline folder, add a datasets folder and a results folder.
To load SQuAD-sr, add squad-sr-lat.json to the datasets folder.
First run make-evaluation-datasets.ipynb. This creates all the files needed for evaluation.
Then run:
cd evaluation_pipeline/
python evaluation-pipeline.py
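
The folder preparation described above can also be scripted. The following is a minimal sketch, not part of the repository: the helper name `prepare_evaluation_dirs` is hypothetical, and the default folder name `evaluation_pipeline` is an assumption based on the repository layout. It only creates the two folders and reports whether squad-sr-lat.json has been placed by hand; it does not download or generate any data.

```python
from pathlib import Path


def prepare_evaluation_dirs(root: str = "evaluation_pipeline") -> bool:
    """Create datasets/ and results/ under root and check for SQuAD-sr.

    Hypothetical helper for illustration; not part of the repository.
    Returns True if squad-sr-lat.json is already in the datasets folder.
    """
    base = Path(root)
    datasets = base / "datasets"
    results = base / "results"

    # exist_ok=True makes the call safe to repeat on an existing checkout.
    datasets.mkdir(parents=True, exist_ok=True)
    results.mkdir(parents=True, exist_ok=True)

    # The file itself still has to be copied into datasets/ manually.
    return (datasets / "squad-sr-lat.json").is_file()
```

Calling `prepare_evaluation_dirs()` from the repository root before opening the notebook gives an early warning (a `False` return value) if the SQuAD-sr file is still missing.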

Training dataset creation

Run the following commands for creating the training dataset:

cd training_dataset
python .\main_training.py
python .\batch_loading.py
The .parquet files will be saved in the datasets folder.
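
To confirm that the scripts produced output, the datasets folder can be inspected for .parquet files. This is a minimal sketch using only the standard library; `list_parquet_files` is a hypothetical helper, and the exact location of the datasets folder is an assumption, so pass the path that applies to your checkout.

```python
from pathlib import Path


def list_parquet_files(folder: str = "datasets") -> list[str]:
    """Return the sorted names of .parquet files directly inside folder.

    Hypothetical helper for illustration; not part of the repository.
    An empty list means the folder is missing or holds no .parquet files.
    """
    path = Path(folder)
    if not path.is_dir():
        return []
    # glob("*.parquet") matches only the top level, not subfolders.
    return sorted(p.name for p in path.glob("*.parquet"))
```

An empty result after running both scripts suggests that the files were written somewhere else or that one of the steps failed.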

Translating dataset

The folder translation_pipeline is used for translating ms_marco and natural_questions from English to Serbian. Translated queries and contexts from these datasets will be used for evaluation. Run the following commands:

cd translation_pipeline
python .\sending_batch.py
python .\processing_batch.py
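
The two steps must run in this order, and the second should not start if the first one fails. A minimal sketch of that sequencing is shown below; `run_in_order` is a hypothetical helper, not part of the repository. It only checks each script's exit code and makes no assumption about how long the submitted batch takes to complete on the remote side.

```python
import subprocess
import sys


def run_in_order(scripts: list[str], cwd: str) -> None:
    """Run each script with the current interpreter, stopping at the first failure.

    Hypothetical helper for illustration; not part of the repository.
    """
    for script in scripts:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later scripts are skipped after a failed step.
        subprocess.run([sys.executable, script], cwd=cwd, check=True)
```

For this folder the call would be `run_in_order(["sending_batch.py", "processing_batch.py"], cwd="translation_pipeline")`. Using `sys.executable` keeps the scripts inside the same Poetry environment as the caller.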

The folder translation_sts is used for translating one sentence pair from the sts dataset for the distillation evaluator. Run the following commands:

cd translation_sts
python .\sending_batch.py
python .\processing_batch.py

smartcat-labs/SRBedding
