[Question]: About multiple users chat slow #8972

Closed
dinhan92 opened this issue Nov 17, 2023 · 7 comments
Labels
question Further information is requested

Comments

@dinhan92

Question Validation

  • I have searched both the documentation and Discord for an answer.

Question

I have tried using LlamaIndex (non-streaming) with Flask, published on Windows. I tested with two computers using Postman. I found that when I test from only one computer, it responds in about 1 minute, but with two computers it takes about 2 minutes.

So, do I have to switch the system to Linux to fix this, or is there something I missed in the LlamaIndex docs?

dinhan92 added the question label on Nov 17, 2023

dosubot bot commented Nov 17, 2023

🤖

Hello @dinhan92,

Great to see you again and thanks for reaching out. I'm currently looking into your issue and will get back to you with a thorough response shortly.


dosubot bot commented Nov 17, 2023

🤖

Hello @dinhan92,

Good to see you again! I hope you've been doing well since our last interaction. Let's dive right into your issue.

Based on the information you've provided, it seems like the response time of your LlamaIndex application increases when you test it on multiple computers. This could be due to a variety of factors, including network latency, the performance of the computers you're testing on, or the configuration of your LlamaIndex application.

Switching your system to Linux may not necessarily solve this issue, as the performance of LlamaIndex is not solely dependent on the operating system. It's also influenced by other factors such as the Python version, the versions of the required libraries, and the server configuration.

In a similar issue titled Reaction is very slow, is there any params to reduce reaction time?, the suggested solution was to use a newer model, gpt-3.5, which is reported to respond faster. You might want to consider this if you're not already using it.
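
As a rough sketch of what that can look like with the pre-0.10 llama_index API used later in this thread (the import paths and the ServiceContext wiring here are assumptions based on that API, not a confirmed fix):

from llama_index import ServiceContext
from llama_index.llms import OpenAI

# Use gpt-3.5-turbo as the LLM; pass this service_context to
# load_index_from_storage / as_query_engine so queries go through it.
llm = OpenAI(model="gpt-3.5-turbo", temperature=0)
service_context = ServiceContext.from_defaults(llm=llm)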

Also, ensure that your system meets the system requirements for running LlamaIndex optimally. This includes using the correct Python version (>=3.8.1 and <3.12) and the correct versions of the required libraries.

If you're still experiencing slow response times after checking these factors, it would be helpful if you could provide more information about your setup. For example, the specific versions of Python and the libraries you're using, the specifications of the computers you're testing on, and any error messages you're seeing. This will help in diagnosing the issue.



This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@dinhan92
Author

dinhan92 commented Nov 17, 2023

I use Windows Server 2019 and Python 3.11.5. I have two approaches: the first uses Flask with wfastcgi, and the second uses Quart run as an NSSM Python service, with a C# API calling that service. Both respond quite slowly with two computers.
Here is my Python code:

@app.route('/api/chatbot', methods=['GET', 'POST'])
async def chatbot(**kwargs) -> str:
    try:
        data = await request.data
        input_text = data.decode("utf-8")

        report_indices = {}
        query_engine = {}
        query_engine_tools = []
        # Rebuild the storage context and reload every index from disk on each request
        for name_pdf in name_pdfs:
            name_pdf_get = name_pdf.Name
            storage_context = StorageContext.from_defaults(persist_dir=f"./storage_{name_pdf_get}")
            report_indices[name_pdf_get] = load_index_from_storage(storage_context = storage_context,
                                        # embed_model=embed_model, 
                                        service_context = service_context)
            query_engine[name_pdf_get] = report_indices[name_pdf_get].as_query_engine()
            query_engine_tools.append(QueryEngineTool(query_engine=query_engine[name_pdf_get], metadata = ToolMetadata(name=name_pdf_get, description=name_pdf.Description)))

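        # Build an OpenAI agent over all of the per-PDF query engine tools (re-created on every request)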
        agent = OpenAIAgent.from_tools(
            tools=query_engine_tools,
            llm=llm,
            chat_history=[
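                # Seed chat history (Vietnamese): greetings plus "What is the sweet potato plant?" / "The sweet potato is a food crop."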
                ChatMessage(role=MessageRole.USER, content="Chào BSNH AI, chúng ta sẽ bàn về nông nghiệp ngày hôm nay."),
                ChatMessage(role=MessageRole.ASSISTANT, content="Được bạn ơi, nghe có vẻ hấp dẫn."),
                ChatMessage(role=MessageRole.USER, content="Bạn có khỏe không?"),
                ChatMessage(role=MessageRole.ASSISTANT, content="Tôi rất khỏe, cám ơn."),
                ChatMessage(role=MessageRole.USER, content="Sức khỏe của bạn như thế nào?"),
                ChatMessage(role=MessageRole.ASSISTANT, content="Tôi cực kỳ khỏe luôn ấy, sức có thể vỡ núi."),
                ChatMessage(role=MessageRole.USER, content="Cây khoai lang là cây gì?"),
                ChatMessage(role=MessageRole.ASSISTANT, content="Cây khoai lang là một loại cây lương thực."),
            ],
            prefix_messages=[
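                # System prompt (Vietnamese): "You are a virtual assistant of TNH99, named BSNH AI, developed by TNH99; your main topic is agriculture. If you cannot answer, tell the user to contact TNH99. Please answer in Vietnamese."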
                ChatMessage(role=MessageRole.SYSTEM, content="Bạn là một trợ lý ảo của TNH99, có tên là BSNH AI, được phát triển bởi TNH99, chủ đề chính của bạn là nông nghiệp. Nếu bạn không trả lời được hãy bảo người dùng liên hệ với TNH99. Vui lòng trả lời bằng tiếng Việt."),
                ChatMessage(role=MessageRole.USER, content="Chào BSNH AI, chúng ta sẽ bàn về nông nghiệp ngày hôm nay."),
                ChatMessage(role=MessageRole.ASSISTANT, content="Được bạn ơi, nghe có vẻ hấp dẫn."),
                ChatMessage(role=MessageRole.USER, content="Bạn có khỏe không?"),
                ChatMessage(role=MessageRole.ASSISTANT, content="Tôi rất khỏe, cám ơn."),
                ChatMessage(role=MessageRole.USER, content="Sức khỏe của bạn như thế nào?"),
                ChatMessage(role=MessageRole.ASSISTANT, content="Tôi cực kỳ khỏe luôn ấy, sức có thể vỡ núi."),
                ChatMessage(role=MessageRole.USER, content="Cây khoai lang là cây gì?"),
                ChatMessage(role=MessageRole.ASSISTANT, content="Cây khoai lang là một loại cây lương thực."),
            ],
            verbose=True
        )

        agent_stream = await agent.achat(input_text)

        savedAnswer = dumps(ChatHistory(Question=input_text, Answer=agent_stream.response, IsDone=True, IdUser="").__dict__)

        return app.response_class(savedAnswer)
    except Exception as e:
        print(e)
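        # Fallback reply (Vietnamese): "Sorry, BSNH AI is busy."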
        exception = "Xin lỗi, BSNH AI đang bận."
        return app.response_class(exception)

@logan-markewich
Collaborator

@dinhan92 a lot of this depends on how you designed your app.

I see you are loading the indexes on every request. This is not ideal; maybe look into using a global query engine?

@dinhan92
Author

@logan-markewich what do you mean by using a global query engine?

@gich2009

Hi @dinhan92, each time your endpoint is hit there is overhead from loading the indices. A better approach is to load the indices outside of the controller (globally, instead of within the endpoint function) and construct the query engines and the query engine tools array there as well. You can go further and construct the OpenAIAgent outside the controller too. Try to minimise the amount of work done inside your controller. A good rule of thumb: if something will not change with every API call, do it outside of your controller. A sketch of that refactor follows this comment.

Of course there is a memory cost to this, but that's the tradeoff, and if you ask me it's well worth it, since the memory would be used anyway whenever the endpoint is hit.
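
For illustration, a minimal sketch of that refactor against the code posted above. It reuses names from that snippet (name_pdfs, service_context, llm, ChatHistory, the Quart app) and omits the Vietnamese chat_history/prefix_messages for brevity; treat it as a starting point rather than a drop-in replacement:

# Done once at import time: load every persisted index and build the
# query engine tools and the agent, instead of repeating this per request.
query_engine_tools = []
for name_pdf in name_pdfs:
    storage_context = StorageContext.from_defaults(persist_dir=f"./storage_{name_pdf.Name}")
    index = load_index_from_storage(storage_context=storage_context, service_context=service_context)
    query_engine_tools.append(
        QueryEngineTool(
            query_engine=index.as_query_engine(),
            metadata=ToolMetadata(name=name_pdf.Name, description=name_pdf.Description),
        )
    )

agent = OpenAIAgent.from_tools(tools=query_engine_tools, llm=llm, verbose=True)

@app.route('/api/chatbot', methods=['GET', 'POST'])
async def chatbot() -> str:
    # The controller now only decodes the request and calls the shared agent.
    input_text = (await request.data).decode("utf-8")
    try:
        result = await agent.achat(input_text)
        answer = dumps(ChatHistory(Question=input_text, Answer=result.response, IsDone=True, IdUser="").__dict__)
        return app.response_class(answer)
    except Exception as e:
        print(e)
        return app.response_class("Xin lỗi, BSNH AI đang bận.")  # "Sorry, BSNH AI is busy."

One caveat: a single module-level OpenAIAgent also shares its chat memory across all callers, so with several concurrent users you may want to keep the tools global but create an agent per user or session; loading the indices once is still the main win.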

@logan-markewich
Collaborator

@gich2009 is correct!
