Regarding the storage of embeddings, my understanding is the following (a rough sketch of this logic follows the list below):
gpu: dynamic embeddings (Flair LM) are deleted after each batch; static embeddings are kept on the GPU
cpu: dynamic embeddings (Flair LM) are deleted after each batch; static embeddings are moved to RAM and, if required later, moved back to the GPU
none: dynamic and static embeddings are deleted after each batch
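To make the comparison concrete, here is a rough, self-contained sketch of what I understand the three modes to do after each batch. This is illustrative only, not Flair's actual implementation; the helper name and the dict-of-tensors representation are assumptions:

```python
import torch


def store_embeddings(batch_embeddings: dict, storage_mode: str) -> dict:
    """Illustrative sketch of the three storage modes (not Flair's real code).

    batch_embeddings maps embedding names to tensors produced for one batch.
    """
    if storage_mode == "none":
        # drop everything: dynamic and static embeddings are recomputed when needed
        return {}
    if storage_mode == "cpu":
        # keep embeddings but park them in RAM; they have to be copied back
        # to the GPU before they can be reused
        return {name: t.detach().cpu() for name, t in batch_embeddings.items()}
    # "gpu": keep embeddings where they are, i.e. on the GPU
    return batch_embeddings
```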
Expected inference time (where > means "takes longer than"):
none > cpu > gpu
However, when I actually took measurements, I observed the following (a benchmark sketch is shown below):
cpu (43s) > none (35s) > gpu (34s)
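For reference, a minimal benchmark along these lines could look like the sketch below. The setup is hypothetical: the exact name of the storage-mode parameter of predict() may differ between Flair versions, and the sentence list is just dummy data.

```python
import time

from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load("ner")  # any sequence tagger works

for mode in ("none", "cpu", "gpu"):
    # fresh sentences per run so previously stored embeddings cannot skew timings
    sentences = [Sentence("George Washington went to Washington .") for _ in range(1000)]
    start = time.time()
    tagger.predict(sentences, embedding_storage_mode=mode)  # parameter name assumed
    print(f"{mode}: {time.time() - start:.1f}s")
```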
So I applied a profiler to the very same code and got the following graphs:
[Profiler graphs for the none, cpu, and gpu storage modes]
Note: the reported times are higher than the real times because of profiler overhead.
What appears is that moving tensors back to the CPU is costly (on my configuration, for my dataset), which is why, for me, none is the better option during inference.
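For anyone who wants to reproduce such a breakdown, Python's built-in cProfile is one option (not necessarily what produced the graphs above). The snippet below is only a sketch: the storage-mode parameter name is assumed, and Tensor.cpu / Tensor.to are the calls to look for in the output.

```python
import cProfile
import pstats

from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load("ner")
sentences = [Sentence("George Washington went to Washington .") for _ in range(200)]

profiler = cProfile.Profile()
profiler.enable()
tagger.predict(sentences, embedding_storage_mode="cpu")  # parameter name assumed
profiler.disable()

# sort by cumulative time so host/device transfers stand out if they dominate
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)
```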
I expect that my situation is not isolated and that most Flair users will see the same results.
So I am wondering whether a warning should be raised when the cpu option is used during inference on a GPU.
My questions are:
can you check whether you find the same pattern on your machines?
do you want me to push a PR for that? (plus a short explanation in the documentation, the Python docstrings, and the tutorial)
pommedeterresautee changed the title from "CPU storing is slower than None storing" to "CPU storing option is slower than None storing during inference on GPU" on Sep 5, 2019
Hi @pommedeterresautee thanks for sharing this analysis. What profiler are you using?
Yes, moving tensors to/from the GPU is costly, which is why we set 'none' as the default storage mode for the predict() method. The only reason to change this to something else would be if we want to use not only the predictions but also the embeddings after prediction.
A PR adding a warning would be great - we should probably also point out in the docs that during inference you almost always want to use storage mode 'none'.
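For instance, something along the lines of the sketch below would do. The helper name, where it gets called, and the exact wording are all placeholders for the proposed PR:

```python
import logging

import torch

log = logging.getLogger("flair")


def warn_on_cpu_storage(storage_mode: str) -> None:
    """Hypothetical check, e.g. called at the start of predict()."""
    if storage_mode == "cpu" and torch.cuda.is_available():
        log.warning(
            "Embedding storage mode 'cpu' moves tensors between GPU and RAM "
            "and is usually slower than 'none' during inference. Use 'none' "
            "unless you need the embeddings after prediction."
        )
```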