High Latency in the NER v3 (Chain of Thought) #352
Replies: 4 comments 3 replies
-
Hi @innocent-charles, the higher latency is probably due to the LLM generating more output tokens because of the CoT mechanism. There is no way around this, I'm afraid - but you can still use v2 of the NER recipe if you find it works well enough. Most more advanced prompting techniques come with the disadvantage of requiring more output tokens and thus incurring more latency.
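If you want to fall back to the lower-latency recipe, the task can be swapped in the pipeline's config. A minimal sketch (the label names here are placeholders, and the exact recipe arguments should be checked against the spacy-llm docs for your version):

```ini
[components.llm.task]
@llm_tasks = "spacy.NER.v2"
# placeholder labels - replace with your own label set
labels = ["PERSON", "ORG", "LOC"]
```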
Failed how?
-
Yes, thank you @rmitsch, I get it. It failed because of a timeout, so I just increased the maximum number of retries...
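For reference, a hedged sketch of where retry and timeout settings live in a spacy-llm model config. The parameter names (`max_tries`, `interval`, `max_request_time`) are from my reading of the REST model options and may differ between spacy-llm versions, so treat this as an assumption to verify:

```ini
[components.llm.model]
@llm_models = "spacy.GPT-4.v2"
# assumption: REST-backed models accept these retry/timeout knobs
max_tries = 10
interval = 1.0
max_request_time = 60
```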
-
The problem I faced with version 2 is this:
I suspect it may be because of repeated entities - when spacy-llm encounters an entity value that has already been extracted, that value won't show up again, even if the same value could be referred to as another entity.
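To illustrate the suspected behavior, here is a minimal, self-contained sketch of a substring-matching step like the one spacy-llm uses to map LLM output back onto the text. This is not spacy-llm's actual implementation - the function name and the `single_match` flag are my own illustration (though I believe NER.v2 exposes a similarly named parameter) - but it shows how a first-hit-only match would drop repeated mentions:

```python
def find_entity_spans(text, values, single_match=True):
    """Locate each entity value in text as (start, end, value) character
    spans. With single_match=True only the first occurrence of a value
    is kept, so repeated mentions of the same string are dropped."""
    spans = []
    for value in values:
        start = 0
        while True:
            idx = text.find(value, start)
            if idx == -1:
                break
            spans.append((idx, idx + len(value), value))
            if single_match:
                # stop after the first hit: later mentions are lost
                break
            start = idx + len(value)
    return sorted(spans)
```

With `single_match=True`, `"Paris met Paris"` yields only the first `Paris` span; with `single_match=False`, both mentions are returned.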
-
Yes, exactly - the CoT mechanism in NER.v3 has introduced higher latency, which in turn has affected usability. So I would ask, or suggest, whether it is possible to just take the responses from the LLM as they are returned. I think the problem might be in the spaCy framework itself, especially in how .ents works - I stand to be corrected if I am wrong.
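If the goal is to inspect the raw LLM responses before spaCy maps them onto .ents, spacy-llm can (if I read the docs correctly - this is an assumption to verify for your version) store the raw prompt and response on the doc when `save_io` is enabled, under `doc.user_data["llm_io"]`:

```ini
[components.llm]
factory = "llm"
# assumption: save_io stores the raw prompt/response in doc.user_data["llm_io"]
save_io = true
```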
-
I have tried NER v3, which includes a chain-of-thought mechanism behind it. But when I compare it to the other versions (v2, v1), v3 has higher latency: the response takes a lot of time to arrive and sometimes fails.
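To put numbers on the latency difference between recipe versions, a small stdlib-only timing harness like the one below can be run against each pipeline (the callable you pass in, e.g. `lambda: nlp(text)`, is a stand-in for your actual pipeline call):

```python
import time

def time_call(fn, repeats=3):
    """Return the best-of-N wall-clock time in seconds for fn().
    Best-of-N reduces noise from one-off network or scheduling delays."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best
```

Running this once per recipe version (v1, v2, v3) on the same input text gives a like-for-like latency comparison.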