inference time cost gap for FlagEmbedding 1.2 and 1.3 #1233

Open
Atlantic8 opened this issue Nov 15, 2024 · 7 comments

Comments

@Atlantic8

Atlantic8 commented Nov 15, 2024

I use the same reranker model bge-reranker-v2-m3 and the same Python script:

```python
from FlagEmbedding import FlagReranker

model = FlagReranker(model_path, use_fp16=True)

model.compute_score(qp_pairs, normalize=True)
```

The environments differ only in the FlagEmbedding version. However, the inference time with FlagEmbedding 1.3 is almost twice as long as with FlagEmbedding 1.2. Unfortunately, I have to use FlagEmbedding 1.3 because I need to finetune the model with query_instruction_for_rerank, passage_max_length and sep_token.

Can anyone help with this problem?

@hanhainebula
Collaborator

Hello, @Atlantic8! Could you provide more details, such as the devices used for inference and the number of sentence pairs? Then we will look into the cause of this problem. Thank you.

@Atlantic8
Author

> Hello, @Atlantic8! Could you provide more details, such as the devices used for inference and the number of sentence pairs? Then we will look into the cause of this problem. Thank you.

The device is an NVIDIA V100 32G. I used 20 <query, doc> pairs, where the query is long (around 1000 tokens) and the doc is short (around 20 tokens).
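For reference, the workload shape can be reproduced roughly like this (a synthetic sketch with made-up pair contents, not the real data):

```python
import time
from FlagEmbedding import FlagReranker

# Synthetic stand-ins for the real workload: 20 pairs, each with a
# ~1000-token query and a ~20-token passage.
long_query = " ".join(["token"] * 1000)
short_doc = " ".join(["word"] * 20)
qp_pairs = [(long_query, short_doc) for _ in range(20)]

model = FlagReranker("BAAI/bge-reranker-v2-m3", use_fp16=True)

start = time.time()
scores = model.compute_score(qp_pairs, normalize=True)
print(f"{len(qp_pairs)} pairs scored in {time.time() - start:.3f}s")
```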

@hanhainebula
Collaborator

Hello, @Atlantic8. This is normal, since initializing multiple devices (refer to here) needs some time. Considering that you are only scoring 20 sentence pairs here, you can add the parameter devices="cuda:0" to use a single GPU and avoid the time spent initializing multiple devices. The modified code:

```python
from FlagEmbedding import FlagReranker

model = FlagReranker(model_path, use_fp16=True, devices="cuda:0")

model.compute_score(qp_pairs, normalize=True)
```

@Atlantic8
Author

Atlantic8 commented Nov 22, 2024

I tried your solution; unfortunately, it's not working.
I built a service, and the model is initialized only once. When I downgraded FlagEmbedding to 1.2, the time cost decreased by nearly 50%, so I think it must be something related to the FlagEmbedding version.
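The serving pattern is roughly like this (a minimal sketch, not the actual service code):

```python
from FlagEmbedding import FlagReranker

# The model is created once at service startup, not per request.
# (devices="cuda:0" is the 1.3-style single-GPU setting suggested above.)
model = FlagReranker("BAAI/bge-reranker-v2-m3", use_fp16=True, devices="cuda:0")

def rerank(qp_pairs):
    # Called per request: only compute_score runs here, so any one-time
    # initialization cost should not affect steady-state latency.
    return model.compute_score(qp_pairs, normalize=True)
```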

@hanhainebula
Collaborator

When using FlagEmbedding 1.2, how many devices did you use? If the number of devices is also 1, then this gap is not quite what we would expect 🤔.

@Atlantic8
Author

Atlantic8 commented Nov 22, 2024

Only 1.
The only difference is the FlagEmbedding version; all other variables are the same.

@hanhainebula
Collaborator

Hello, @Atlantic8. Here is an example for testing the inference time:

```python
import os
import time
import datasets
from FlagEmbedding import FlagReranker


def test_inference_time(reranker: FlagReranker, sentences: list, number: int = 20):
    if len(sentences) > number:
        sentences = sentences[:number]
    elif len(sentences) < number:
        sentences = sentences * (number // len(sentences) + 1)
        sentences = sentences[:number]
    start_time = time.time()
    scores = reranker.compute_score(sentences, batch_size=16, max_length=1024, normalize=True)
    end_time = time.time()
    print("=====================================")
    print("Number of pairs: ", number)
    print("Time cost: ", end_time - start_time)


def main():
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"
    
    reranker = FlagReranker(
        'BAAI/bge-reranker-v2-m3',
        use_fp16=True
    )

    cache_dir = "~/.cache"

    queries = datasets.load_dataset('Shitao/MLDR', "hi", cache_dir=cache_dir, trust_remote_code=True)["test"].select(range(20))
    corpus = datasets.load_dataset('Shitao/MLDR', "corpus-hi", cache_dir=cache_dir, trust_remote_code=True)["corpus"].select(range(20))

    sentences = [(q["query"], d["text"]) for q, d in zip(queries, corpus)]
    
    print("Warm up")
    reranker.compute_score([("hello world", "hello world")])
    
    test_inference_time(reranker, sentences, number=20)
    test_inference_time(reranker, sentences, number=1000)
    test_inference_time(reranker, sentences, number=10000)


if __name__ == '__main__':
    main()
```

For FlagEmbedding 1.2.10, the test result is:

```
=====================================
Number of pairs:  20
Time cost:  0.30451512336730957
=====================================
Number of pairs:  1000
Time cost:  12.109597444534302
=====================================
Number of pairs:  10000
Time cost:  121.79582500457764
```

For FlagEmbedding 1.3.2, the test result is:

```
=====================================
Number of pairs:  20
Time cost:  0.47469186782836914
=====================================
Number of pairs:  1000
Time cost:  12.363729476928711
=====================================
Number of pairs:  10000
Time cost:  120.30488777160645
```

The "Warm up" part moves the model to the target device so that the comparison is fair. For FlagEmbedding 1.2.10, this operation is performed in the __init__ function (refer to here). For FlagEmbedding 1.3.2, it is performed in the compute_score function (refer to here).

From the above results, we can observe that there is no significant gap between FlagEmbedding 1.2 and 1.3. I hope this result helps you.
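To see the one-time device-transfer cost described above directly, the first and second calls can be timed separately (a small sketch, assuming FlagEmbedding 1.3.x and a single GPU):

```python
import time
from FlagEmbedding import FlagReranker

model = FlagReranker("BAAI/bge-reranker-v2-m3", use_fp16=True, devices="cuda:0")
pair = [("hello world", "hello world")]

# On 1.3.x the model is moved to the device inside the first compute_score
# call, so the first call is expected to be noticeably slower than later ones.
for i in range(2):
    start = time.time()
    model.compute_score(pair, normalize=True)
    print(f"call {i + 1}: {time.time() - start:.3f}s")
```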
