@snexus Kudos on this awesome project!

I was wondering if support for batched prompts is on your roadmap? There are solutions that make this possible for several language models, so are you planning on including these optimisations in your source?

TIA
saswat0 changed the title from "Scope for batched preedictions" to "Scope for batched predictions" on Dec 1, 2023
One use case I can think of: if deployed to production, the server could queue incoming requests (prompts) and run the RAG pipeline once per batch rather than once per request. Per-request latency would be slightly higher, but GPU utilisation would increase several-fold.
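To make the idea concrete, here's a minimal sketch of such a micro-batching queue. Everything in it is hypothetical, not part of this repo: `generate_batch` stands in for whatever batched call the model backend exposes, and the size/timeout parameters are illustrative. Requests accumulate in an asyncio queue, and a worker flushes them to the model as one batched call once the batch fills up or a short wait window expires:

```python
import asyncio
import time

# Hypothetical batched backend; stands in for whatever the LLM wrapper
# exposes. Several inference libraries accept a list of prompts and run
# them through the model together rather than one at a time.
def generate_batch(prompts: list[str]) -> list[str]:
    time.sleep(0.5)  # pretend the GPU does one batched forward pass
    return [f"answer to: {p}" for p in prompts]

class MicroBatcher:
    """Queue incoming prompts and flush them as one batch.

    A batch is released when it reaches `max_batch_size` or after
    `max_wait_s` seconds, whichever comes first.
    """

    def __init__(self, max_batch_size: int = 8, max_wait_s: float = 0.05):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, prompt: str) -> str:
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((prompt, fut))
        return await fut  # resolved by the worker once the batch completes

    async def worker(self) -> None:
        while True:
            # Block until at least one request arrives, then gather more
            # until the batch is full or the wait window expires.
            batch = [await self.queue.get()]
            deadline = asyncio.get_running_loop().time() + self.max_wait_s
            while len(batch) < self.max_batch_size:
                timeout = deadline - asyncio.get_running_loop().time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            prompts = [p for p, _ in batch]
            # One batched call instead of len(batch) sequential ones.
            results = await asyncio.to_thread(generate_batch, prompts)
            for (_, fut), result in zip(batch, results):
                fut.set_result(result)

async def main() -> None:
    batcher = MicroBatcher()
    asyncio.create_task(batcher.worker())
    # Five concurrent callers; the worker serves them in one or two batches.
    answers = await asyncio.gather(*(batcher.submit(f"q{i}") for i in range(5)))
    print(answers)

asyncio.run(main())
```

The trade-off is exactly the one described above: each caller may wait up to `max_wait_s` longer, but the GPU sees a single forward pass over N prompts instead of N passes over one.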