Adjust the maximum input length / chunk inputs #11
Comments
Thank you for the kind words.
Nonetheless, I can add a MAXLENGTH env parameter.
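A minimal sketch of what such an environment override could look like, assuming the demo wraps a vLLM-style `LLM` class (the `MAXLENGTH` name, the model ID, and the `max_model_len` wiring here are illustrative, not the project's actual code):

```python
import os

from vllm import LLM  # assumed backend; the demo may wrap its own class

# Hypothetical: read the maximum input length from the environment,
# falling back to the current 2048-token limit.
MAXLENGTH = int(os.environ.get("MAXLENGTH", "2048"))

llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # placeholder model ID
    max_model_len=MAXLENGTH,                     # caps prompt + generated tokens
)
```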
Hey David, thanks for taking the time to respond. It may sound a little crazy at first, but my most common use case for RAG is code repos, which usually range from 12-32k tokens. I was keen on trying this out with them and comparing it to having the repo loaded completely into context. Thought it'd be an interesting experiment! I'm running it through Docker. |
Got it. I'll use this issue to add the new parameter. Obviously it will depend on having an LLM that accepts a context that size and enough GPU memory. |
Yeah, for sure. For most other AI/LLM projects I use either Ollama or ExLlamaV2 (via TabbyAPI) and quantise the k/v cache to q8_0, so I regularly run 8-22B models with 32-64K context sizes (1x 3090 + 2x A4000), which is incredibly useful! |
This has been added in. If you want to try it before the next Docker build, you can copy over the updated rag.py and rebuild the Docker image. |
How would you change the LLM to Llama, for example? I'm a bit confused, as it seems like the application has a built-in LLM. Under rag.py it does self.llm = LLM(...) and passes mistral7b here. Where is it getting this from?
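If that call is vLLM's `LLM` class (an assumption based on the `LLM(...)` signature), the argument is just a Hugging Face model ID that gets downloaded at startup, so switching to a Llama model could look roughly like this (the `LLM_MODEL` environment variable and the model IDs are hypothetical):

```python
import os

from vllm import LLM  # assuming the demo builds on vLLM


class RAG:
    def __init__(self) -> None:
        # Hypothetical LLM_MODEL override: any Hugging Face model ID works here,
        # e.g. "meta-llama/Meta-Llama-3-8B-Instruct" instead of the Mistral default.
        model_name = os.environ.get("LLM_MODEL", "mistralai/Mistral-7B-Instruct-v0.2")
        self.llm = LLM(model=model_name)
```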
I've found that when adding input data of any reasonable length, the demo errors out, and in the console I see that the input is limited to just 2048 tokens.
Is it possible to set the maximum input length at runtime, or perhaps chunk the input data into pieces of <= 2048 tokens?
Neat demo!
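To illustrate the chunking option raised above, here is a minimal token-based splitting sketch (the tokenizer choice and the 2048-token limit are assumptions, not how the demo actually processes input):

```python
from transformers import AutoTokenizer

# Placeholder tokenizer; substitute the tokenizer of whichever model the demo uses.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")


def chunk_text(text: str, max_tokens: int = 2048) -> list[str]:
    """Split text into consecutive pieces of at most max_tokens tokens."""
    token_ids = tokenizer.encode(text, add_special_tokens=False)
    return [
        tokenizer.decode(token_ids[i : i + max_tokens])
        for i in range(0, len(token_ids), max_tokens)
    ]


# Example: each chunk can then be embedded and indexed separately.
chunks = chunk_text("your long repository text here " * 2000)
```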