Replies: 1 comment
I found some similar discussions and issues related to performance and latency in LlamaIndex workflows that might be helpful for your situation:
Given your setup, here are some specific suggestions to consider:
These strategies should help you address the latency issues you're experiencing with your current workflow setup.
Hi,
I am working on implementing an internal virtual assistant and have a few questions I am hoping you might be able to help me with.
My use case is answering questions over internal documents, combining Q&A with the ability to ask follow-up questions.
For this I am building on a custom CondensePlusContext chat engine, where I change the logic slightly so I can pass nodes directly to the chat method rather than always going through retrieval.
I was thinking of using workflows so that I stay flexible and can add more steps and route between them, but the deeper I dig, the more I am rethinking whether workflows actually make sense here.
My major issue right now is latency - going through the workflow takes ~15 seconds, which is a lot.
Below is sample code without all the steps, just a simplified case that still takes ~12 seconds, deployed with llama-deploy. I am not sure whether I am doing something wrong in how I use it, or whether workflows are simply not the right approach.
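In plain Python, the "pass nodes directly, skip retrieval" idea can be sketched like this (all names here are hypothetical stand-ins for illustration, not the actual LlamaIndex API):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    text: str

def retrieve(query: str) -> list[Node]:
    # Hypothetical stand-in for the index retriever.
    return [Node(text=f"retrieved context for: {query}")]

def chat(query: str, nodes: Optional[list[Node]] = None) -> str:
    # If the caller already has nodes (e.g. carried over from a previous
    # turn), skip retrieval entirely; otherwise fall back to the retriever.
    context_nodes = nodes if nodes is not None else retrieve(query)
    context = "\n".join(n.text for n in context_nodes)
    return f"answer based on: {context}"

# First turn: no nodes supplied, so retrieval runs.
first = chat("What is our leave policy?")
# Follow-up turn: reuse nodes from the first turn, no second retrieval.
followup = chat("How do I request it?", nodes=[Node(text="leave policy text")])
```

Skipping the retrieval round-trip on follow-up turns is one of the few latency wins that is free, since the context is already in hand.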
Workflow.py
`
#Deploy Core
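Before deciding against workflows as a whole, it is worth timing each step to see where the ~15 seconds actually go. A framework-free sketch of that instrumentation (plain asyncio; the step names and sleep durations are made up for illustration):

```python
import asyncio
import time

async def condense_question(query: str) -> str:
    await asyncio.sleep(0.05)  # stand-in for the condense LLM call
    return query

async def retrieve(query: str) -> list[str]:
    await asyncio.sleep(0.02)  # stand-in for the vector-store lookup
    return [f"context for: {query}"]

async def synthesize(query: str, nodes: list[str]) -> str:
    await asyncio.sleep(0.05)  # stand-in for the final LLM call
    return f"answer to: {query}"

async def run_workflow(query: str) -> dict[str, float]:
    # Time each step individually so the slow one stands out.
    timings: dict[str, float] = {}

    t0 = time.perf_counter()
    q = await condense_question(query)
    timings["condense"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    nodes = await retrieve(q)
    timings["retrieve"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    await synthesize(q, nodes)
    timings["synthesize"] = time.perf_counter() - t0
    return timings

timings = asyncio.run(run_workflow("sample question"))
for step, secs in timings.items():
    print(f"{step}: {secs:.3f}s")
```

If each timed step is fast but the end-to-end call is still slow, the overhead is likely in the deployment layer (e.g. message passing in llama-deploy) rather than in the workflow logic itself.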