Unable to query in Query Docs mode when using sagemaker #1367
Comments
Oof, that's a rough one to debug. The error is somewhere in this block in LlamaIndex; I don't really know where.
I'm not 100% sure how privateGPT implements sagemaker LLMs, but it might be related to that? My guess is it's something to do with how the LLM is streaming.
Thanks @logan-markewich, I'll try to dig further in that direction. As I'm not an expert, this could take a while, so if anyone has a solution or wants to try, be my guest :) Also, I've tried with the latest 0.2.0 release and I see the same behavior.
Based on this message, I've dug a little bit more and it seems that the prompt message is not passed to the endpoint. Then, because the prompt is empty, the endpoint sends back an uncaught exception in the message (but the endpoint clearly sends back a 200 API response, so this is not really an exception here). It seems it's this line that returns an empty prompt. I've tried to add some prints and I got this output: I'm not used to the … I'll try to run more tests; don't hesitate if you have questions.
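For anyone who wants to reproduce this kind of check, here is a minimal sketch of the print-based debugging, assuming a boto3-based call similar to what privateGPT's custom SagemakerLLM does (the payload shape is an illustrative assumption, not the exact wire format):

import json

import boto3

# Minimal debugging sketch: log the prompt right before it is sent to the
# SageMaker endpoint, to confirm whether it arrives empty.
runtime = boto3.client("sagemaker-runtime")

def invoke_with_logging(endpoint_name: str, prompt: str) -> str:
    print(f"prompt length: {len(prompt)}")  # 0 here means the question was lost upstream
    print(f"prompt: {prompt!r}")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"inputs": prompt}),  # payload shape is an assumption
    )
    # The endpoint can return HTTP 200 even when generation failed on the
    # model side, so the response body itself has to be inspected as well.
    return response["Body"].read().decode("utf-8")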
I think I have a solution to my problem. Thanks to the help of @logan-markewich (in this Discord thread) I've dug quite a bit through the code, and it appears to be an issue with the memory of the chat. The "cause" was the retrieval of quite a long context, which would end up with the LLM receiving just the context and not my question. I fixed it with two different solutions (and kept only the last one):
class LLMSettings(BaseModel):
    mode: Literal["local", "openai", "sagemaker", "mock"]
    max_new_tokens: int = Field(
        256,
        description="The maximum number of tokens that the LLM is authorized to generate in one completion.",
    )
    ## This is new
    context_window_size: int = Field(
        3900,
        description=(
            "Size of the context window.\n"
            "llama2 has a context window of 4096 tokens, but we set it lower to allow for some wiggle room.\n"
            "You may need to increase this context window, see https://github.com/imartinez/privateGPT/issues/1367."
        ),
    )

case "local":
    from llama_index.llms import LlamaCPP

    prompt_style = get_prompt_style(settings.local.prompt_style)
    self.llm = LlamaCPP(
        model_path=str(models_path / settings.local.llm_hf_model_file),
        temperature=0.1,
        max_new_tokens=settings.llm.max_new_tokens,
        context_window=settings.llm.context_window_size,  # This line is changed
        generate_kwargs={},
        # All to GPU
        model_kwargs={"n_gpu_layers": -1},
        # transform inputs into Llama2 format
        messages_to_prompt=prompt_style.messages_to_prompt,
        completion_to_prompt=prompt_style.completion_to_prompt,
        verbose=True,
    )
case "sagemaker":
from private_gpt.components.llm.custom.sagemaker import SagemakerLLM
self.llm = SagemakerLLM(
endpoint_name=settings.sagemaker.llm_endpoint_name,
max_new_tokens=settings.llm.max_new_tokens,
context_window=settings.llm.context_window_size, # This line is added
) If you think it's useful, I could make some PR |
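To make the failure mode concrete, here is a back-of-the-envelope sketch of the token budget involved; the window sizes reuse the defaults above, while the retrieval and question counts are invented for illustration:

# Illustrative arithmetic only: the retrieval/question token counts are made up.
context_window = 3900      # what the LLM can attend to (see settings above)
max_new_tokens = 256       # reserved for the generated answer

prompt_budget = context_window - max_new_tokens  # 3644 tokens left for the prompt

retrieved_context = 3600   # a long retrieval result, like the one I hit
question = 120             # the actual user question

overflow = retrieved_context + question - prompt_budget
print(overflow)  # 76 -> something has to be trimmed, and it was the question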
I'm working on a PR that makes context_window and max_new_tokens customizable in settings.yaml. Using the right context_window is the way to go, imo. Thanks for the support and the documentation!
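For illustration, a possible settings.yaml fragment reusing the field names from the LLMSettings snippet above (the exact keys that end up in the PR may differ):

llm:
  mode: sagemaker
  max_new_tokens: 256
  # llama2 supports up to 4096 tokens; keep some wiggle room below that
  context_window_size: 3900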
Closed by #1437
Context
Hi everyone,
I sent a message through Discord, but I thought it would be easier to manage the issue here.
What I'm trying to achieve is to run privateGPT in a production-grade environment. To do so, I've tried to run something like:
For this, I have the following setups:
As we can see, whenever I try to use sagemaker, it seems to fail to use the RAG architecture.
How to reproduce
I've tried to have the simplest setup to reproduce; if you want me to test anything else, do not hesitate to ask me.
PGPT_PROFILES=sagemaker make run
Actual behavior
Expected behavior
To be able to use the Query Docs mode even when using sagemaker.