Adjust the maximum input length / chunk inputs #11

Closed
sammcj opened this issue Aug 12, 2024 · 6 comments


sammcj commented Aug 12, 2024

I've found that when adding input data of any reasonable length, the demo errors out, and in the console I can see the input is limited to just 2048 tokens:

ValueError: Input length of input_ids is 22711, but `max_length` is set to 2048. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`.

Is it possible to set the maximum input length at runtime, or perhaps chunk the input data into pieces of <= 2048 tokens?

Neat demo!
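For reference, a minimal sketch of what chunking to a token budget could look like, assuming a Hugging Face transformers tokenizer (the model id and file name below are only examples, not necessarily what the demo uses):

```python
from transformers import AutoTokenizer

def chunk(text, tokenizer, maxlength=2048):
    """Split text into pieces of at most maxlength tokens."""
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    return [tokenizer.decode(ids[i:i + maxlength]) for i in range(0, len(ids), maxlength)]

# Model id is illustrative only
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
chunks = chunk(open("repo.txt").read(), tokenizer)
```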

@davidmezzetti (Member)

Thank you for the kind words.

22711 seems like quite a long context. Are you running this through Docker or did you install directly?

Nonetheless, I can add a MAXLENGTH env parameter.
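A rough sketch of how a MAXLENGTH environment variable could feed into the txtai LLM call (the env handling and model id here are assumptions for illustration, not the actual rag.py change):

```python
import os

from txtai.pipeline import LLM

# Read MAXLENGTH from the environment, falling back to the current 2048 default
maxlength = int(os.environ.get("MAXLENGTH", 2048))

# Model id is illustrative only
llm = LLM("TheBloke/Mistral-7B-OpenOrca-AWQ")
response = llm("Answer using the context below...", maxlength=maxlength)
```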

sammcj (Author) commented Aug 12, 2024

Hey David,

Thanks for taking the time to respond.

It may sound a little crazy at first, but my most common use case for RAG is with code repos, which usually range from 12-32k tokens. I was keen to try this out with them and compare it to having the repo completely loaded into context. Thought it'd be an interesting experiment!

I'm running it through Docker.

@davidmezzetti (Member)

Got it. I'll use this issue to add the new parameter. Obviously it will depend on having an LLM that accepts a context of that size and enough GPU memory.

sammcj (Author) commented Aug 12, 2024

Yeah, for sure. For most other AI/LLM projects I use either Ollama or ExLlamaV2 (via TabbyAPI) and quantise the k/v cache to q8_0, so I regularly run 8-22B models with 32-64K context sizes (1x 3090 + 2x A4000), which is incredibly useful!

@davidmezzetti (Member)

This has been added in.

If you want to try it before the next Docker build, you can copy over the updated rag.py and rebuild the Docker image.

@davidmezzetti davidmezzetti added this to the v0.4.0 milestone Aug 15, 2024
@davidmezzetti davidmezzetti self-assigned this Aug 15, 2024

Kareem21 commented Oct 3, 2024

How would you change the LLM to Llama, for example? I'm a bit confused, as it seems like the application has a built-in LLM. Under rag.py it does

self.llm = LLM(...) and passes a Mistral-7B model there.

Where is it getting this from?
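For context, txtai's LLM pipeline loads whatever Hugging Face model id (or local path) it is given, so swapping in a Llama model should mostly be a matter of changing that string. A minimal sketch (the model id below is only an example and is gated on the Hub):

```python
from txtai.pipeline import LLM

# Any Hugging Face model id (or local path) can be passed in place of the
# Mistral default; this Llama id is illustrative and requires Hub access.
llm = LLM("meta-llama/Meta-Llama-3.1-8B-Instruct")
print(llm("Why is the sky blue?", maxlength=512))
```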
