
[Longformer] fix longformer slow-down #5811

Merged

Conversation

patrickvonplaten
Contributor

@patrickvonplaten patrickvonplaten commented Jul 16, 2020

A drastic slow-down of Longformer after PR #5219 was detected by @HHousen (thanks a lot!) here: #4406 (comment).

After digging a bit into the code, it turns out that the line
`is_global_attn = any(is_index_global_attn.flatten())` is the reason for the drastic slow-down.
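For reference, the fix presumably amounts to replacing the Python built-in with the tensor-native reduction, roughly along these lines (a sketch, not the exact diff):

```python
# Before: Python's built-in any() iterates over the tensor element by element.
is_global_attn = any(is_index_global_attn.flatten())

# After: tensor.any() runs as a single reduction inside PyTorch.
is_global_attn = is_index_global_attn.flatten().any()
```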

Running the following benchmark on master:

python examples/benchmarking/run_benchmark.py --models allenai/longformer-base-4096 --no_memory --sequence_length 512 1024

yields the following result:

====================       INFERENCE - SPEED - RESULT       ====================
--------------------------------------------------------------------------------
          Model Name             Batch Size     Seq Length     Time in s   
--------------------------------------------------------------------------------
 allenai/longformer-base-4096        8              512            0.647     
 allenai/longformer-base-4096        8              1024           1.284     
--------------------------------------------------------------------------------

====================        ENVIRONMENT INFORMATION         ====================
- transformers_version: 3.0.2
- framework: PyTorch
- use_torchscript: False
- framework_version: 1.5.0
- python_version: 3.7.7
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-07-16
- time: 14:07:02.719531
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32089
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 0
- use_tpu: False

Now on this branch the results are as follows:

====================       INFERENCE - SPEED - RESULT       ====================
--------------------------------------------------------------------------------
          Model Name             Batch Size     Seq Length     Time in s   
--------------------------------------------------------------------------------
 allenai/longformer-base-4096        8              512            0.141     
 allenai/longformer-base-4096        8              1024           0.271     
--------------------------------------------------------------------------------

====================        ENVIRONMENT INFORMATION         ====================
- transformers_version: 3.0.2
- framework: PyTorch
- use_torchscript: False
- framework_version: 1.5.0
- python_version: 3.7.7
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-07-16
- time: 14:07:57.623378
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32089
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 0
- use_tpu: False

So this single line is responsible for a slow-down of roughly a factor of 4.

Moral of the story: never use Python's built-in any() on a PyTorch tensor => always use tensor.any(). I wonder if this is actually a known problem of PyTorch/Python. It might be a good idea to check our code for more statements like this.
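To make the difference concrete, here is a small stand-alone micro-benchmark (illustrative only, not part of this PR); the exact numbers will depend on hardware:

```python
import time

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.zeros(8 * 1024, dtype=torch.bool, device=device)

start = time.time()
_ = any(x)  # built-in any(): iterates the tensor element by element in Python
print(f"built-in any(): {time.time() - start:.4f}s")

start = time.time()
_ = bool(x.any())  # tensor.any(): a single reduction kernel (bool() forces the sync)
print(f"tensor.any():   {time.time() - start:.4f}s")
```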

Another lesson for me is that one should always run the benchmarks before and after a big refactoring such as #5219.
Running the benchmark script for a model is very simple and usually takes only a couple of seconds. Ideally, we should have performance regression tests to automatically detect such slow-downs (a rough sketch follows below).
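Such a regression test could look roughly like this (a hedged sketch; the threshold, marker, and setup are placeholders, not an existing test in the repository):

```python
import time

import pytest
import torch

from transformers import LongformerModel


@pytest.mark.slow
def test_longformer_forward_speed():
    # Placeholder threshold; in practice this would be calibrated per CI machine.
    max_seconds = 5.0

    model = LongformerModel.from_pretrained("allenai/longformer-base-4096").eval()
    input_ids = torch.ones(1, 1024, dtype=torch.long)

    with torch.no_grad():
        model(input_ids)  # warm-up pass
        start = time.time()
        model(input_ids)
        elapsed = time.time() - start

    assert elapsed < max_seconds, f"Longformer forward pass took {elapsed:.2f}s"
```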

Pinging @thomwolf @mfuntowicz @sshleifer @LysandreJik @ibeltagy - I think this is quite interesting.

@patrickvonplaten patrickvonplaten changed the title [Longformer] fix longformer slow down [Longformer] fix longformer slow-down Jul 16, 2020
@codecov

codecov bot commented Jul 16, 2020

Codecov Report

Merging #5811 into master will decrease coverage by 0.80%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #5811      +/-   ##
==========================================
- Coverage   78.19%   77.38%   -0.81%     
==========================================
  Files         146      146              
  Lines       26047    26047              
==========================================
- Hits        20367    20157     -210     
- Misses       5680     5890     +210     
| Impacted Files | Coverage Δ |
|---|---|
| src/transformers/modeling_longformer.py | 89.21% <100.00%> (ø) |
| src/transformers/modeling_tf_mobilebert.py | 23.38% <0.00%> (-73.39%) ⬇️ |
| src/transformers/generation_tf_utils.py | 86.21% <0.00%> (ø) |
| src/transformers/generation_utils.py | 97.11% <0.00%> (+0.28%) ⬆️ |
| src/transformers/modeling_openai.py | 82.31% <0.00%> (+1.28%) ⬆️ |
| src/transformers/modeling_tf_openai.py | 95.18% <0.00%> (+74.91%) ⬆️ |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 89a78be...25c6419. Read the comment docs.

@patrickvonplaten patrickvonplaten merged commit 057411c into huggingface:master Jul 16, 2020
@sshleifer
Contributor

Great find. +1 to tests!
