Reduce inference time by 30% when using Flair embeddings #1074
Conversation
@pommedeterresautee sorry for the delay - I'm planning to look into #1078 and review this PR today!
Hello @pommedeterresautee thanks for the PR! Prediction speed is much improved when setting storage mode to 'cpu'. However, when setting storage mode to 'gpu', I now get OOM errors. For instance:

```python
from flair.datasets import WNUT_17
from flair.embeddings import WordEmbeddings, StackedEmbeddings, FlairEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# 1. get the corpus
corpus = WNUT_17(in_memory=True)

# 2. what tag do we want to predict?
tag_type = 'ner'

# 3. make the tag dictionary from the corpus
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

# 4. initialize embeddings
embeddings = StackedEmbeddings([
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),
    WordEmbeddings('glove'),
])

# 5. initialize sequence tagger
tagger: SequenceTagger = SequenceTagger(hidden_size=256,
                                        embeddings=embeddings,
                                        tag_dictionary=tag_dictionary,
                                        tag_type=tag_type,
                                        )

# 6. initialize trainer
trainer: ModelTrainer = ModelTrainer(tagger, corpus)

# 7. start training
trainer.train(
    'resources/taggers/local-test',
    max_epochs=5,
    train_with_dev=True,
    embeddings_storage_mode='gpu',
)
```

This previously trained well with GPU storage mode, but now throws memory errors. It may be that without the `clone()` the full hidden-state tensors are never released from GPU memory.
@pommedeterresautee quick update: I think the problem lies in this line: https://github.com/zalandoresearch/flair/blob/e0437d577be28cf7d099df588ad3a997e542df60/flair/embeddings.py#L1856

where we select, from all hidden states of all characters in the sentence, only the states at one position to become the word embedding. Essentially, what we need to do here is copy these states over to a new tensor so that we can discard all the rest of the huge tensor.

With your PR, we don't do the `clone()`, so the selected embedding stays a view into the huge tensor. With storage mode 'none' this has no effect, since the embeddings are discarded anyway. But if the storage mode is 'gpu', the views keep the full hidden-state tensors alive in GPU memory, which explains the OOM errors.
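A minimal sketch of the view-vs-copy behavior described above, with illustrative names and shapes (not flair's actual code):

```python
import torch

# Stand-in for the LM output: hidden states for every character position,
# shape (num_chars, batch_size, hidden_size) - a large tensor.
all_hidden_states = torch.randn(2000, 32, 2048)

# Plain indexing returns a *view* that shares storage with the big tensor,
# so keeping this "embedding" alive keeps the whole buffer alive too.
embedding_view = all_hidden_states[42, 0, :]
print(embedding_view.storage().data_ptr() == all_hidden_states.storage().data_ptr())  # True

# clone() copies the selected states into fresh storage, so the big
# tensor can be freed once it goes out of scope.
embedding_copy = all_hidden_states[42, 0, :].clone()
print(embedding_copy.storage().data_ptr() == all_hidden_states.storage().data_ptr())  # False
```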
Came to the same conclusion as you. I agree that this has no effect on 'none', but it increases memory consumption on CPU. Things that don't work: `narrow` (it keeps the same under-the-hood data layout) and `split` (it splits into a list of tensors that still share storage). Complicated...
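For what it's worth, a quick illustrative check of why `narrow` and `split` don't help: both return views that still share the original storage:

```python
import torch

big = torch.randn(100, 16)

# narrow() returns a view into the same underlying storage.
narrowed = big.narrow(0, 10, 1)
print(narrowed.storage().data_ptr() == big.storage().data_ptr())  # True

# split() returns a tuple of views, each still backed by the same storage.
chunks = big.split(10, dim=0)
print(chunks[0].storage().data_ptr() == big.storage().data_ptr())  # True
```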
One potential solution would be to define a global variable. We currently only have two: `flair.device` and `flair.cache_root`. We could add an `embedding_storage_mode` global that gets set during training/prediction, so that the embedding classes know whether a clone is needed. Drawback is that this adds another global variable, but since it is only for non-essential optimization instructions, maybe it would be ok in this case. What do you think? I could add this in a separate PR.
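A rough sketch of how such a flag could be used (the `embedding_storage_mode` global and the helper function are hypothetical here, not the merged implementation):

```python
import torch
import flair

def select_word_embedding(all_hidden_states_in_lm, offset, i):
    # Pick the hidden states at one position (a view into the big tensor).
    embedding = all_hidden_states_in_lm[offset, i, :]
    # Only pay for the copy when it matters: during training, or when
    # embeddings are kept on the GPU, where a lingering view would pin
    # the whole hidden-state tensor in memory.
    if torch.is_grad_enabled() or getattr(flair, "embedding_storage_mode", None) == "gpu":
        embedding = embedding.clone()
    return embedding
```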
Thanks @alanakbik.
I think this is the beginning of an understanding; the view theory may be the right one: they share the same underlying storage. A little experiment:

```python
embedding = all_hidden_states_in_lm[offset, i, :]
print(embedding.is_set_to(all_hidden_states_in_lm))
print(embedding.data_ptr(), all_hidden_states_in_lm.data_ptr())
print(embedding.storage().data_ptr(), all_hidden_states_in_lm.storage().data_ptr())
print(embedding.storage()._weak_ref(), all_hidden_states_in_lm.storage()._weak_ref())
print(embedding.storage_offset(), all_hidden_states_in_lm.storage_offset())
```

It prints that the `_weak_ref` values are the same and the `.storage().data_ptr()` values are the same, but the `data_ptr()` values are not.
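The differing `data_ptr()` values are expected for a view: the pointer targets the view's first element, offset from the shared storage base. Continuing the snippet above (illustrative):

```python
# data_ptr() = storage().data_ptr() + storage_offset() * element_size(),
# so a view's data_ptr() differs while the storage pointer matches.
assert embedding.data_ptr() == (
    embedding.storage().data_ptr()
    + embedding.storage_offset() * embedding.element_size()
)
```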
Hi @pommedeterresautee yes that sounds good - I'll merge shortly! Thanks for sharing your findings! I guess because they share the storage, the big hidden-state tensor cannot be freed as long as the embedding view is alive.
👍 |
It appeared in the profiler (screenshots on #1070) that `Tensor.clone()` was one of the slowest single operations during inference. Looking at the code, it is always called, even when gradients are disabled.

My understanding is that when gradients are disabled, embeddings can't change, so there is no need to clone the tensor to protect it from being updated. That's what this commit does.

I have run tests on inference, but not on LM training; I don't think it changes anything there, but let me know if I missed something.
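In spirit, the change amounts to something like the following (a sketch under the assumptions above, not the exact diff; the helper name is illustrative):

```python
import torch

def extract_embedding(all_hidden_states_in_lm, offset, i):
    # Select the hidden states at one position (a view into the big tensor).
    embedding = all_hidden_states_in_lm[offset, i, :]
    if torch.is_grad_enabled():
        # During training, take an independent copy so the stored
        # embedding cannot be mutated through the shared buffer.
        embedding = embedding.clone()
    return embedding
```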
Timings are on a 2080 Ti with storage set to `None`, on my French dataset. FYI, I was previously measuring with storage set to `"cpu"`, which is why the time decreased a lot compared to PR #1068 (43 secs after optimization). Details on #1070.

This PR also adds a warning when the 'cpu' storage option is used with a GPU (see the sketch below), plus a documentation update.
Fixes #1070.
All evaluations were still using embedding storage set to 'cpu'; I moved them to 'none'.
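For the added warning, something along these lines (wording and placement are illustrative, not the exact code):

```python
import logging
import torch

log = logging.getLogger("flair")

def check_storage_mode(embeddings_storage_mode: str) -> None:
    # Illustrative check: storing embeddings on CPU while computing on a
    # GPU forces a device transfer for every batch.
    if embeddings_storage_mode == "cpu" and torch.cuda.is_available():
        log.warning(
            "Storage mode 'cpu' with a GPU adds costly device transfers; "
            "consider 'none' or 'gpu' if memory allows."
        )
```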