inference time cost gap for FlagEmbedding 1.2 and 1.3 #1233
Comments
Hello, @Atlantic8! Could you provide more details, such as the devices used for inference and the number of sentence pairs? Then we will check the reason for this problem. Thank you.
The device is an Nvidia V100 32G. I used 20 <query, doc> pairs, where the query is long (around 1000 tokens) and the doc is short (around 20 tokens).
Hello, @Atlantic8. This is normal, since initializing multiple devices (refer to here) needs some time. Considering that the number of sentence pairs you are running inference on here is only 20, you can add the `devices` parameter:

```python
from FlagEmbedding import FlagReranker

model = FlagReranker(model_path, use_fp16=True, devices="cuda:0")
model.compute_score(qp_pairs, normalize=True)
```
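If it helps to see where the time goes, here is a minimal sketch (assuming `model_path` and `qp_pairs` are defined as in the snippet above) that times initialization and scoring separately, so device/model setup cost is not mixed into the measured inference time:

```python
import time

from FlagEmbedding import FlagReranker

# model_path and qp_pairs are assumed to be defined as in the snippet above.
t0 = time.time()
model = FlagReranker(model_path, use_fp16=True, devices="cuda:0")  # pin a single GPU
t1 = time.time()

scores = model.compute_score(qp_pairs, normalize=True)
t2 = time.time()

print("init time: ", t1 - t0)
print("score time:", t2 - t1)
```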
I tried your solution; unfortunately, it's not working.
When using FlagEmbedding 1.2, how many devices did you use? If the number of devices was also 1, then this gap is not quite as expected 🤔.
Only 1.
Hello, @Atlantic8. Here is an example for testing the inference time:

```python
import os
import time

import datasets
from FlagEmbedding import FlagReranker


def test_inference_time(reranker: FlagReranker, sentences: list, number: int = 20):
    if len(sentences) > number:
        sentences = sentences[:number]
    elif len(sentences) < number:
        sentences = sentences * (number // len(sentences) + 1)
        sentences = sentences[:number]
    start_time = time.time()
    scores = reranker.compute_score(sentences, batch_size=16, max_length=1024, normalize=True)
    end_time = time.time()
    print("=====================================")
    print("Number of pairs: ", number)
    print("Time cost: ", end_time - start_time)


def main():
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"
    reranker = FlagReranker(
        'BAAI/bge-reranker-v2-m3',
        use_fp16=True
    )
    cache_dir = "~/.cache"
    queries = datasets.load_dataset('Shitao/MLDR', "hi", cache_dir=cache_dir, trust_remote_code=True)["test"].select(range(20))
    corpus = datasets.load_dataset('Shitao/MLDR', "corpus-hi", cache_dir=cache_dir, trust_remote_code=True)["corpus"].select(range(20))
    sentences = [(q["query"], d["text"]) for q, d in zip(queries, corpus)]
    print("Warm up")
    reranker.compute_score([("hello world", "hello world")])
    test_inference_time(reranker, sentences, number=20)
    test_inference_time(reranker, sentences, number=1000)
    test_inference_time(reranker, sentences, number=10000)


if __name__ == '__main__':
    main()
```

For FlagEmbedding 1.2.10, the test result is:

```
=====================================
Number of pairs:  20
Time cost:  0.30451512336730957
=====================================
Number of pairs:  1000
Time cost:  12.109597444534302
=====================================
Number of pairs:  10000
Time cost:  121.79582500457764
```

For FlagEmbedding 1.3.2, the test result is:

```
=====================================
Number of pairs:  20
Time cost:  0.47469186782836914
=====================================
Number of pairs:  1000
Time cost:  12.363729476928711
=====================================
Number of pairs:  10000
Time cost:  120.30488777160645
```

From the above results, we can observe that there is no big gap between FlagEmbedding 1.2 and 1.3. Hope this result can help you.
I use the same reranker model bge-reranker-v2-m3 and the same Python script:

```python
from FlagEmbedding import FlagReranker

model = FlagReranker(model_path, use_fp16=True)
model.compute_score(qp_pairs, normalize=True)
```

The environments differ only in the version of FlagEmbedding. However, the inference time cost with FlagEmbedding 1.3 is almost twice as long as with FlagEmbedding 1.2. Unfortunately, I have to use FlagEmbedding 1.3, because I need to finetune the model with query_instruction_for_rerank, passage_max_length, and sep_token.

Can anyone help with this problem?
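For reference, a minimal sketch (assuming `model_path` and `qp_pairs` are defined as in the snippet above) that combines the single-device pinning and warm-up call suggested earlier in this thread, so that one-time setup cost is excluded from the measured time:

```python
import time

from FlagEmbedding import FlagReranker

# Pin a single GPU, as suggested earlier in the thread.
model = FlagReranker(model_path, use_fp16=True, devices="cuda:0")

# Warm up so first-call overhead is not counted in the measurement.
model.compute_score([("hello world", "hello world")])

start = time.time()
scores = model.compute_score(qp_pairs, normalize=True)
print("Time cost:", time.time() - start)
```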