Regarding the storage of embeddings, my understanding is the following (a rough sketch of this logic follows the list below):
gpu: dynamic embeddings (Flair LM) are deleted after each batch; static embeddings are kept on the GPU
cpu: dynamic embeddings (Flair LM) are deleted after each batch; static embeddings are moved to RAM and, if required later, moved back to the GPU
none: dynamic and static embeddings are deleted after each batch
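To make the comparison concrete, here is a rough, self-contained sketch of what I understand the three modes to do after each batch. This is illustrative only, not Flair's actual implementation; the helper name and the dict-of-tensors representation are assumptions:

```python
import torch


def store_embeddings(batch_embeddings: dict, storage_mode: str) -> dict:
    """Illustrative sketch of the three storage modes (not Flair's real code).

    batch_embeddings maps embedding names to tensors produced for one batch.
    """
    if storage_mode == "none":
        # drop everything: dynamic and static embeddings are recomputed when needed
        return {}
    if storage_mode == "cpu":
        # keep embeddings but park them in RAM; they have to be copied back
        # to the GPU before they can be reused
        return {name: t.detach().cpu() for name, t in batch_embeddings.items()}
    # "gpu": keep embeddings where they are, i.e. on the GPU
    return batch_embeddings
```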
Expected inference time (where > means "takes longer than"):
none > cpu > gpu
However, when I actually took measurements, I observed the following (a benchmark sketch is shown below):
cpu (43s) > none (35s) > gpu (34s)
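For reference, a minimal benchmark along these lines could look like the sketch below. The setup is hypothetical: the exact name of the storage-mode parameter of predict() may differ between Flair versions, and the sentence list is just dummy data.

```python
import time

from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load("ner")  # any sequence tagger works

for mode in ("none", "cpu", "gpu"):
    # fresh sentences per run so previously stored embeddings cannot skew timings
    sentences = [Sentence("George Washington went to Washington .") for _ in range(1000)]
    start = time.time()
    tagger.predict(sentences, embedding_storage_mode=mode)  # parameter name assumed
    print(f"{mode}: {time.time() - start:.1f}s")
```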
So I applied a profiler to the very same code and got the following graphs:
[Profiler graphs for the none, cpu, and gpu storage modes]
Note: the reported times are higher than the real times because of profiler overhead.
What appears is that moving tensors back to the CPU is costly (on my configuration, for my dataset), which is why, for me, none is the better option during inference.
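For anyone who wants to reproduce such a breakdown, Python's built-in cProfile is one option (not necessarily what produced the graphs above). The snippet below is only a sketch: the storage-mode parameter name is assumed, and Tensor.cpu / Tensor.to are the calls to look for in the output.

```python
import cProfile
import pstats

from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load("ner")
sentences = [Sentence("George Washington went to Washington .") for _ in range(200)]

profiler = cProfile.Profile()
profiler.enable()
tagger.predict(sentences, embedding_storage_mode="cpu")  # parameter name assumed
profiler.disable()

# sort by cumulative time so host/device transfers stand out if they dominate
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)
```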
I expect that my situation is not isolated and that most Flair users will see the same results.
So I am wondering whether a warning should be raised when the cpu option is used during inference on a GPU.
My questions are:
can you check whether you find the same pattern on your machines?
do you want me to push a PR for that? (plus a short explanation in the documentation, the Python docstrings, and the tutorial)
pommedeterresautee changed the title from "CPU storing is slower than None storing" to "CPU storing option is slower than None storing during inference on GPU" on Sep 5, 2019
Hi @pommedeterresautee thanks for sharing this analysis. What profiler are you using?
Yes, moving tensors to/from the GPU is costly, which is why we set 'none' as the default storage mode for the predict() method. The only reason to change this to something else would be if we want to use not only the predictions but also the embeddings after prediction.
A PR adding a warning would be great - we should probably also point out in the docs that during inference you almost always want to use storage mode 'none'.
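For instance, something along the lines of the sketch below would do. The helper name, where it gets called, and the exact wording are all placeholders for the proposed PR:

```python
import logging

import torch

log = logging.getLogger("flair")


def warn_on_cpu_storage(storage_mode: str) -> None:
    """Hypothetical check, e.g. called at the start of predict()."""
    if storage_mode == "cpu" and torch.cuda.is_available():
        log.warning(
            "Embedding storage mode 'cpu' moves tensors between GPU and RAM "
            "and is usually slower than 'none' during inference. Use 'none' "
            "unless you need the embeddings after prediction."
        )
```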