BSNLP-SlavNER-shared-task-2021-with-Spacy-v3

This repository consists of the code for proccessing the data and training NER for Russian language at BSNLP SlavNER 2021 using Spacy

Install dependencies

pip install -r requirements.txt

External resources / Prerequisites

Look at this article if you're experiencing troubles with gpu to fine-tune Bert
Add pre-trained vectors
Add pretraining corpus

Instructions

Clone this repository
Download the data from BSNLP Shared Task page and put it into data/bsnlp2021_train_r1
Run python save_data.py $folder1 $folder2 $folder3 split_train split_dev folder4 train_file dev_file test_file where folder1, folder2, folder3 are the names of the folders for training and development sets, split_train and split_dev are percentage of data used in training and dev sets, folder4 is the name of the folder for test_set, train_file, dev_file, test_file are the file names of the resulting spacy binary data sets. Now you have your training data for Spacy v.3
(optional) Run save_pretraining.py $folder_names where folder_names are the different folder names you want to use for the pretraining corpus. Check it out here how pretraining might help you to obtain better results.
Run python -m spacy train config_ner_ruVec_pretrain.cfg --output ./tok2vec_output to train the model. Specify paths to the training and development sets inside the config file before training. Also you may use a pretraining corpus and choose different pretrained vectors to potentially obtain better results. By default there are vectors from ru_core_news_lg Russian model. You may find more info on training command and config editting here
Run python -m spacy train config_spacy_trans.cfg --output ./multilingual_output to train the model. Specify paths to the training and development sets inside the config file before training. You can change hyperparameters there and choose different models from https://huggingface.co/models. By default the model is "bert-base-multilingual-uncased", which was used for the training.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

BSNLP-SlavNER-shared-task-2021-with-Spacy-v3

Install dependencies

External resources / Prerequisites

Instructions

About

Releases

Packages

Contributors 2

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
data/bsnlp2021_train_r1		data/bsnlp2021_train_r1
Lidl_Analytics_–_Erweiterung_des_Süßwarensortiments_(Eigenmarke).ipynb		Lidl_Analytics_–_Erweiterung_des_Süßwarensortiments_(Eigenmarke).ipynb
README.md		README.md
config_ner_ruVec_pretrain.cfg		config_ner_ruVec_pretrain.cfg
config_spacy_trans.cfg		config_spacy_trans.cfg
requirements.txt		requirements.txt
save_data.py		save_data.py
save_pretraining.py		save_pretraining.py
utils.py		utils.py

stewieboomhauer/BSNLP-SlavNER-shared-task-2021-with-Spacy-v3-main

Folders and files

Latest commit

History

Repository files navigation

BSNLP-SlavNER-shared-task-2021-with-Spacy-v3

Install dependencies

External resources / Prerequisites

Instructions

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages