
UD_PerDT_DependencyModel

A UDPipe2 model trained on UD_PerDT, the Universal Persian Dependency Treebank. The corpus is the result of an automatic conversion of the Persian Dependency Treebank (Dadegan) to its Universal Dependencies version. It contains nearly 30,000 sentences annotated with lemmas, POS tags, and dependency relations.

Steps to run the model

  1. Set up an environment for UDPipe:
    conda create --name udpipe_env
    
    or
    
    pip3 install virtualenv
    virtualenv udpipe_env
    
    then install the requirements:
    pip install -r requirements.txt
    
  2. Set up an environment in the wembedding_service folder:
    virtualenv venv
    pip install -r requirements.txt
    
  3. Run the script that produces the BERT embeddings:
    bash scripts/compute_embeddings.sh test_data
    
    or run compute_wembeddings.py separately for each of your CoNLL-U files:
    python3 wembedding_service/compute_wembeddings.py --format conllu path/to/input_file/test_data.conllu path/to/output_file/test_data.conllu.npz
    
    Note that the .npz output file must be in the same folder as its input .conllu file.
    If you don't have access to the "bert-base-multilingual-uncased" model in code, you can download its key files (config.json, tf_model.h5, tokenizer.json, tokenizer_config.json, vocab.txt) from here, put them in a folder on your local system, and then run the previous script with these new parameters:
    python3 wembedding_service/compute_wembeddings.py --model custom_model --model_path path/to/local_folder path/to/input_file/test_data.conllu path/to/output_file/test_data.conllu.npz
    
  4. Download the trained model from here
  5. Run the model in prediction mode (a small driver script chaining steps 3 and 5 is sketched after this list):
    python3 udpipe2.py uni_PerDT_model --predict --predict_input path/to/input --predict_output path/to/output
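Steps 3 and 5 can be chained in a small driver script. Below is a minimal sketch, assuming the repository layout above; the data directory, model path, and output naming are hypothetical and should be adjusted to your setup:

    import pathlib
    import subprocess

    DATA_DIR = pathlib.Path("test_data/test")   # hypothetical input folder
    MODEL = "uni_PerDT_model"                   # path to the downloaded model

    for conllu in sorted(DATA_DIR.glob("*.conllu")):
        # Step 3: the .npz embeddings must sit next to the input .conllu file.
        npz = conllu.parent / (conllu.name + ".npz")
        subprocess.run(
            ["python3", "wembedding_service/compute_wembeddings.py",
             "--format", "conllu", str(conllu), str(npz)],
            check=True,
        )
        # Step 5: run the downloaded model in prediction mode.
        predicted = conllu.parent / (conllu.stem + ".predicted.conllu")
        subprocess.run(
            ["python3", "udpipe2.py", MODEL,
             "--predict", "--predict_input", str(conllu),
             "--predict_output", str(predicted)],
            check=True,
        )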
    

Input Format

The input file should be prepared in CoNLL-U format. You can fill in just the tokens and leave the other fields blank (_), and the trained model will fill them in for you (refer to the sample file in test_data/test/test_data.conllu).
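For reference, a tokens-only CoNLL-U file has the ID and FORM columns filled and an underscore in the remaining eight columns. A minimal sketch that writes one such sentence (the sentence and output path are hypothetical):

    # Write a tokens-only CoNLL-U sentence; the model predicts the rest.
    tokens = ["این", "یک", "جمله", "است", "."]  # hypothetical Persian sentence

    with open("my_input.conllu", "w", encoding="utf-8") as f:
        f.write("# sent_id = 1\n")
        f.write("# text = " + " ".join(tokens) + "\n")
        for i, form in enumerate(tokens, start=1):
            # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
            f.write("\t".join([str(i), form] + ["_"] * 8) + "\n")
        f.write("\n")  # a blank line terminates the sentence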

If your file is in raw text format (.txt), first install the hazm library; then you can use the following script to convert it to CoNLL-U:

python3 convert_rawTxt_to_conllu.py --input_file path/to/input_txt_file --output_file path/to/save/output_conllu_file
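Conceptually, the conversion normalizes the text, splits it into sentences and tokens with hazm, and emits tokens-only CoNLL-U. A minimal sketch of that idea (an illustration, not the repository's convert_rawTxt_to_conllu.py itself):

    from hazm import Normalizer, sent_tokenize, word_tokenize

    def raw_text_to_conllu(txt_path, conllu_path):
        """Convert raw Persian text to a tokens-only CoNLL-U file."""
        normalizer = Normalizer()
        with open(txt_path, encoding="utf-8") as src, \
             open(conllu_path, "w", encoding="utf-8") as dst:
            text = normalizer.normalize(src.read())
            for sent_id, sentence in enumerate(sent_tokenize(text), start=1):
                dst.write(f"# sent_id = {sent_id}\n")
                dst.write(f"# text = {sentence}\n")
                for i, form in enumerate(word_tokenize(sentence), start=1):
                    dst.write("\t".join([str(i), form] + ["_"] * 8) + "\n")
                dst.write("\n")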

The results of the model on the test set of the UD_PerDT corpus:

Metric    | Precision | Recall | F1 Score | AlignedAcc
----------+-----------+--------+----------+-----------
Tokens    |    100.00 | 100.00 |   100.00 |
Sentences |    100.00 | 100.00 |   100.00 |
Words     |    100.00 | 100.00 |   100.00 |
UPOS      |     97.55 |  97.55 |    97.55 |      97.55
XPOS      |     97.30 |  97.30 |    97.30 |      97.30
UFeats    |     97.61 |  97.61 |    97.61 |      97.61
AllTags   |     95.28 |  95.28 |    95.28 |      95.28
Lemmas    |     98.98 |  98.98 |    98.98 |      98.98
UAS       |     93.62 |  93.62 |    93.62 |      93.62
LAS       |     90.96 |  90.96 |    90.96 |      90.96
CLAS      |     88.97 |  88.73 |    88.85 |      88.73
MLAS      |     85.00 |  84.77 |    84.89 |      84.77
BLEX      |     87.75 |  87.51 |    87.63 |      87.51
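These are the metrics of the CoNLL 2018 shared task evaluation. To reproduce them on your own data, you can compare a gold file against the model's prediction with the standard conll18_ud_eval.py script (shipped with UDPipe 2; its presence in this checkout is an assumption, and the file paths are placeholders):

    python3 conll18_ud_eval.py -v path/to/gold.conllu path/to/predicted.conllu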

Reference

Safari, Pegah, Mohammad Sadegh Rasooli, Amirsaeid Moloodi, and Alireza Nourian. "The Persian dependency treebank made universal." In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pp. 7078-7087. 2022.


Contact Info:

pegh.safari@gmail.com
