Training with BERT Transformers #12115

alvaromarlo · 2023-01-17T18:34:29Z

alvaromarlo
Jan 17, 2023

Hi,

I am developing a spaCy model and I want to use BERT transformers. Starting from the bottom, I have an annotations file from Prodigy that we are converting to spaCy data with the data-to-spacy command.

Once we have the train and dev data, we are following this quick tutorial (https://www.youtube.com/watch?v=Y_N_AO39rRg&t=1s) in order to train the model using the Google Colab's GPU. Also, we are using a custom tokenizer.

The config file we are using is this one:

[paths]
train = "{train_path}"
dev = "{dev_path}"
vectors = null
init_tok2vec = null

[system]
gpu_allocator = "pytorch"
seed = 0

[nlp]
lang = "es"
pipeline = ["transformer","ner"]
batch_size = 16
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}

[components]

[components.ner]
factory = "ner"
incorrect_spans_key = null
moves = null
scorer = {"@scorers":"spacy.ner_scorer.v1"}
update_with_oracle_cut_size = 100

[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = false
nO = null

[components.ner.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
pooling = {"@layers":"reduce_mean.v1"}
upstream = "*"

[components.transformer]
factory = "transformer"
max_batch_items = 4096
set_extra_annotations = {"@annotation_setters":"spacy-transformers.null_annotation_setter.v1"}

[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v3"
name = "dccuchile/bert-base-spanish-wwm-cased"
mixed_precision = false

[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
window = 128
stride = 96

[components.transformer.model.grad_scaler_config]

[components.transformer.model.tokenizer_config]
use_fast = true

[components.transformer.model.transformer_config]

[corpora]

[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null

[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null

[training]
accumulate_gradient = 3
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
patience = 1600
max_epochs = 0
max_steps = 20000
eval_frequency = 200
frozen_components = []
annotating_components = []
before_to_disk = null

[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
size = 2000
buffer = 256
get_length = null

[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = false

[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001

[training.optimizer.learn_rate]
@schedules = "warmup_linear.v1"
warmup_steps = 250
total_steps = 20000
initial_rate = 0.00005

[training.score_weights]
ents_f = 1.0
ents_p = 0.0
ents_r = 0.0
ents_per_type = null

[pretraining]

[initialize]
vectors = ${paths.vectors}
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null
after_init = null

[initialize.components]

[initialize.tokenizer]

[initialize.before_init]
@callbacks = "customize_tokenizer"

The problem appears when we do the train command

!python -m spacy train assets/corpus/config.cfg --code app/customize_tokenizer.py --output models/boe-b-section-4 --gpu-id 0

We receive these message at the first iteration:

Aborting and saving the final best model. Encountered exception: OutOfMemoryError('CUDA out of memory. Tried to allocate 42.00 MiB (GPU 0; 14.76 GiB total capacity; 12.43 GiB already allocated; 43.75 MiB free; 13.69 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF')

We tried splitting the data in sentences but the problem persists at the first iteration.

There's something we are doing wrong? Is the config file correct? Can we print extra information at the training to understand what's happening?

Massive thanks

adrianeboyd · 2023-01-18T09:08:57Z

adrianeboyd
Jan 18, 2023

Nothing in the config jumps out at me as an error, and I see that you've lowered the default batch size for the eval step. If you have long training texts, they may still be too long for the training batch size in [training.batcher]. How long are your training texts?

Does the OOM error appear before the first epoch 0 eval line appears in the output or after? If it's before, it's crash on the eval step; if it's after, it's crashing during the train/update step, since they have different settings and batch sizes that affect the memory usage.

The transformer takes up a good chunk of GPU memory, so one test that can be helpful to check that the rest of your config is okay is to test that it runs with a smaller model like distilbert-base-uncased. Then you can at least be confident that config is fine and the only problem is the text and batch sizes.

One way we manage this for training models like en_core_web_trf is with the [corpora.train.max_length] setting. If max_length > 0, then longer docs are split into docs of individual sentences for training by the Corpus when it loads the data from the .spacy file (https://spacy.io/api/top-level#corpus). This does require you to have sentence annotation in the .spacy file, which you might not have automatically in your prodigy export. If you have already tried training from sentence-length docs, this may not help.

0 replies

RaulKite · 2023-04-23T04:51:23Z

RaulKite
Apr 23, 2023

@alvaromarlo Did you success with the creation of a new bert model including ner?

I'm thinking about research how to create a new Spanish model including ner for Spanish and some guidance would be nice 😊

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Training with BERT Transformers #12115

{{title}}

Replies: 2 comments

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

Select a reply

Training with BERT Transformers #12115

alvaromarlo Jan 17, 2023

Replies: 2 comments

adrianeboyd Jan 18, 2023

RaulKite Apr 23, 2023

alvaromarlo
Jan 17, 2023

adrianeboyd
Jan 18, 2023

RaulKite
Apr 23, 2023