[Question]: About multiple users chat slow #8972
Comments
🤖 Hello @dinhan92, great to see you again and thanks for reaching out. I'm currently looking into your issue and will get back to you with a thorough response shortly.
🤖 Hello @dinhan92, good to see you again! I hope you've been doing well since our last interaction. Let's dive right into your issue.

Based on the information you've provided, it seems like the response time of your LlamaIndex application increases when you test it on multiple computers. This could be due to a variety of factors, including network latency, the performance of the computers you're testing on, or the configuration of your LlamaIndex application.

Switching your system to Linux may not necessarily solve this issue, as the performance of LlamaIndex is not solely dependent on the operating system. It's also influenced by other factors such as the Python version, the versions of the required libraries, and the server configuration.

In a similar issue titled "Reaction is very slow, is there any params to reduce reaction time?", the suggested solution was to use a newer version of the model, gpt-3.5, which is reported to be faster in response. You might want to consider this if you're not already using this version. Also, ensure that your system meets the system requirements for running LlamaIndex optimally. This includes using the correct Python version (>=3.8.1 and <3.12) and the correct versions of the required libraries.

If you're still experiencing slow response times after checking these factors, it would be helpful if you could provide more information about your setup: for example, the specific versions of Python and the libraries you're using, the specifications of the computers you're testing on, and any error messages you're seeing. This will help in diagnosing the issue.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
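If it helps to collect the version details the bot asks for, a small snippet like the following prints them (assuming the package is importable as `llama_index` and exposes `__version__`, as releases from this era do):

```python
import sys
import llama_index

# Report the interpreter and library versions when filing performance issues.
print("Python:", sys.version)
print("llama_index:", llama_index.__version__)
```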
I use Windows Server 2019 and Python 3.11.5. I have tried two approaches: one uses Flask with wfastcgi, the other uses Quart running as a Python service under nssm, with a C# API calling the service. Both respond pretty slowly with 2 computers.

```python
# Imports assumed for the legacy llama_index (0.9-era) package layout.
# The Quart variant is shown; app, llm, service_context, name_pdfs, and ChatHistory
# are defined elsewhere in the application.
from json import dumps

from quart import request
from llama_index import StorageContext, load_index_from_storage
from llama_index.agent import OpenAIAgent
from llama_index.llms import ChatMessage, MessageRole
from llama_index.tools import QueryEngineTool, ToolMetadata


@app.route('/api/chatbot', methods=['GET', 'POST'])
async def chatbot(**kwargs) -> str:
    try:
        data = await request.data
        input_text = data.decode("utf-8")

        report_indices = {}
        query_engine = {}
        query_engine_tools = []

        # rebuild storage context and load one index + query engine tool per PDF
        for name_pdf in name_pdfs:
            name_pdf_get = name_pdf.Name
            storage_context = StorageContext.from_defaults(persist_dir=f"./storage_{name_pdf_get}")
            report_indices[name_pdf_get] = load_index_from_storage(
                storage_context=storage_context,
                # embed_model=embed_model,
                service_context=service_context,
            )
            query_engine[name_pdf_get] = report_indices[name_pdf_get].as_query_engine()
            query_engine_tools.append(
                QueryEngineTool(
                    query_engine=query_engine[name_pdf_get],
                    metadata=ToolMetadata(name=name_pdf_get, description=name_pdf.Description),
                )
            )

        # build the agent, primed with a Vietnamese system prompt (roughly: "You are TNH99's
        # virtual assistant, BSNH AI; your main topic is agriculture; answer in Vietnamese")
        # and a sample chat history
        agent = OpenAIAgent.from_tools(
            tools=query_engine_tools,
            llm=llm,
            chat_history=[
                ChatMessage(role=MessageRole.USER, content="Chào BSNH AI, chúng ta sẽ bàn về nông nghiệp ngày hôm nay."),
                ChatMessage(role=MessageRole.ASSISTANT, content="Được bạn ơi, nghe có vẻ hấp dẫn."),
                ChatMessage(role=MessageRole.USER, content="Bạn có khỏe không?"),
                ChatMessage(role=MessageRole.ASSISTANT, content="Tôi rất khỏe, cám ơn."),
                ChatMessage(role=MessageRole.USER, content="Sức khỏe của bạn như thế nào?"),
                ChatMessage(role=MessageRole.ASSISTANT, content="Tôi cực kỳ khỏe luôn ấy, sức có thể vỡ núi."),
                ChatMessage(role=MessageRole.USER, content="Cây khoai lang là cây gì?"),
                ChatMessage(role=MessageRole.ASSISTANT, content="Cây khoai lang là một loại cây lương thực."),
            ],
            prefix_messages=[
                ChatMessage(role=MessageRole.SYSTEM, content="Bạn là một trợ lý ảo của TNH99, có tên là BSNH AI, được phát triển bởi TNH99, chủ đề chính của bạn là nông nghiệp. Nếu bạn không trả lời được hãy bảo người dùng liên hệ với TNH99. Vui lòng trả lời bằng tiếng Việt."),
                ChatMessage(role=MessageRole.USER, content="Chào BSNH AI, chúng ta sẽ bàn về nông nghiệp ngày hôm nay."),
                ChatMessage(role=MessageRole.ASSISTANT, content="Được bạn ơi, nghe có vẻ hấp dẫn."),
                ChatMessage(role=MessageRole.USER, content="Bạn có khỏe không?"),
                ChatMessage(role=MessageRole.ASSISTANT, content="Tôi rất khỏe, cám ơn."),
                ChatMessage(role=MessageRole.USER, content="Sức khỏe của bạn như thế nào?"),
                ChatMessage(role=MessageRole.ASSISTANT, content="Tôi cực kỳ khỏe luôn ấy, sức có thể vỡ núi."),
                ChatMessage(role=MessageRole.USER, content="Cây khoai lang là cây gì?"),
                ChatMessage(role=MessageRole.ASSISTANT, content="Cây khoai lang là một loại cây lương thực."),
            ],
            verbose=True,
        )

        agent_stream = await agent.achat(input_text)
        savedAnswer = dumps(ChatHistory(Question=input_text, Answer=agent_stream.response, IsDone=True, IdUser="").__dict__)
        return app.response_class(savedAnswer)
    except Exception as e:
        print(e)
        exception = "Xin lỗi, BSNH AI đang bận."  # "Sorry, BSNH AI is busy."
        return app.response_class(exception)
```
@dinhan92 a lot of this depends on how you designed your app. I see you are loading the index on every request. This is not ideal; you should look into using a global query engine maybe?
@logan-markewich what do you mean by using a global query engine?
Hi @dinhan92, each time your endpoint is hit, there is overhead when the indices are loaded. A better way would be to load the indices outside of the controller (globally, instead of within the endpoint function) and then construct the query engines and the query engine tools array there as well. You can go further and even construct the OpenAI agent outside the controller. Try to minimise the amount of work being done within your controller. A good rule of thumb is: if it is something that will not change with every API call, then do it outside of your controller. Of course there is a memory cost to this, but that's the tradeoff and, if you ask me, it's well worth it, since the memory will be used either way when the endpoint is hit.
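A minimal sketch of that restructuring, under the assumptions of the snippet above (a Quart app, `name_pdfs`, `service_context`, `llm`, and `ChatHistory` available from the existing application setup, and the 0.9-era `llama_index` import paths); the chat history and prefix messages are omitted here for brevity:

```python
from json import dumps

from quart import Quart, request
from llama_index import StorageContext, load_index_from_storage
from llama_index.agent import OpenAIAgent
from llama_index.tools import QueryEngineTool, ToolMetadata

app = Quart(__name__)

# Load the persisted indices and build the tools once, at startup.
# name_pdfs, service_context, llm, and ChatHistory come from your existing setup.
query_engine_tools = []
for name_pdf in name_pdfs:
    name = name_pdf.Name
    storage_context = StorageContext.from_defaults(persist_dir=f"./storage_{name}")
    index = load_index_from_storage(storage_context=storage_context, service_context=service_context)
    query_engine_tools.append(
        QueryEngineTool(
            query_engine=index.as_query_engine(),
            metadata=ToolMetadata(name=name, description=name_pdf.Description),
        )
    )

# Build the agent once as well; only the chat call itself runs per request.
agent = OpenAIAgent.from_tools(tools=query_engine_tools, llm=llm, verbose=True)


@app.route('/api/chatbot', methods=['POST'])
async def chatbot() -> str:
    data = await request.data
    input_text = data.decode("utf-8")
    response = await agent.achat(input_text)
    saved_answer = dumps(
        ChatHistory(Question=input_text, Answer=response.response, IsDone=True, IdUser="").__dict__
    )
    return app.response_class(saved_answer)
```

One caveat with this layout: a single module-level agent also shares its chat memory across all callers, so for true multi-user chat you would still want a per-user agent (or per-user memory), while keeping the index loading and tool construction global, since those are the expensive parts that never change between calls.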
@gich2009 is correct!
Question
I have tried using LlamaIndex without streaming, with Flask, published on Windows. I tested from 2 computers using Postman. I found that when I test from only 1 computer, it responds in about 1 minute; with 2 computers, it takes 2 minutes.
So, do I have to switch the system to Linux to fix this, or is there something I missed in the LlamaIndex docs?