ChatHuggingFace + HuggingFaceEndpoint does not properly implement max_new_tokens
#23586
Comments
Okay, you are comparing two different things. The Huggingface Inference Client returns the following object, which has an attribute of […]. The […]

So, you are implicitly comparing the […]. P.S. I double-checked the LangChain code and ensured that […]
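For readers following along: the raw client and the LangChain wrapper take differently named limits, which is presumably the comparison being made here. A minimal sketch of a direct huggingface_hub call (the model ID and prompt are placeholders, not taken from this thread) looks roughly like:

```python
from huggingface_hub import InferenceClient

client = InferenceClient(model="meta-llama/Meta-Llama-3-70B-Instruct")

# Plain text-generation call: max_new_tokens caps the completion length.
text = client.text_generation("Tell me a long story.", max_new_tokens=10)
print(text)

# Chat-completion call: the equivalent knob is named max_tokens, and the reply
# lives on the returned object under .choices[0].message.content.
chat = client.chat_completion(
    messages=[{"role": "user", "content": "Tell me a long story."}],
    max_tokens=10,
)
print(chat.choices[0].message.content)
```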
I think you are misunderstanding the example code; the […]
I'm having the same problem... Did you find a solution?
No, I have not; this issue renders the entire Huggingface x Langchain integration unusable for me. I have been attempting to work around it by using an OpenAI-compatible web server through either LlamaCpp or Ollama.
I think this is a […]. You have to call:

output_msg = chat_model.bind(max_tokens=8192, temperature=0.0).invoke(chat_sequence)
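For anyone copying this, a self-contained sketch of the workaround might look like the following (the repo ID and messages are only placeholders):

```python
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    repo_id="meta-llama/Meta-Llama-3-70B-Instruct",
    max_new_tokens=8192,  # not honoured by ChatHuggingFace in affected versions
)
chat_model = ChatHuggingFace(llm=llm)

chat_sequence = [
    ("system", "You are a helpful assistant."),
    ("human", "Write a long story about a dragon."),
]

# Workaround: pass max_tokens (and other generation kwargs) at call time via
# .bind(); these end up in the chat-completion request.
output_msg = chat_model.bind(max_tokens=8192, temperature=0.0).invoke(chat_sequence)
print(output_msg.content)
```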
I'm also having the same problem, and the solution from @michael-newsrx doesn't work for me since the llm is invoked in […]. It works fine using […]. Everything works great with […]. Do we have an estimated timeline for when this bug will be fixed? CC: @baskaryan, @hwchase17, @efriis, @eyurtsev, @ccurme, @nfcampos
Appreciate the workaround!
I've created a bug-fix proposal in this PR that solves this issue: propagate HuggingFaceEndpoint config to ChatHuggingFace #27719
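This is not the diff from #27719, but a rough sketch of the kind of propagation being proposed there: settings already configured on the wrapped HuggingFaceEndpoint get translated into the parameter names the chat-completion call expects. The helper and mapping below are purely illustrative:

```python
# Illustrative helper, not the actual PR: map endpoint-level settings onto
# chat-completion kwargs so the wrapper honours what was set on the endpoint.
_PARAM_MAP = {
    "max_new_tokens": "max_tokens",  # TGI-style name -> chat-completion name
    "temperature": "temperature",
    "top_p": "top_p",
}

def merged_chat_params(llm, call_kwargs: dict) -> dict:
    params = dict(call_kwargs)
    for endpoint_name, chat_name in _PARAM_MAP.items():
        value = getattr(llm, endpoint_name, None)
        # Explicit call-time kwargs win; endpoint config only fills the gaps.
        if value is not None and chat_name not in params:
            params[chat_name] = value
    return params
```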
Hi, @BobMerkus. I'm Dosu, and I'm helping the LangChain team manage their backlog. I'm marking this issue as stale.

Issue Summary:
Next Steps:
Thank you for your understanding and contribution!
Checked other resources
Example Code
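A minimal sketch of a reproduction consistent with the description below; the model name, the 10/4096 token limits, and the assertion message are taken from the report, while the prompt and the length measure are illustrative:

```python
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

PROMPT = "Write a detailed essay about the history of the Netherlands."

def response_length(max_new_tokens: int) -> int:
    # Same model for both runs; only the max_new_tokens setting changes.
    llm = HuggingFaceEndpoint(
        repo_id="meta-llama/Meta-Llama-3-70B-Instruct",
        max_new_tokens=max_new_tokens,
    )
    chat_model = ChatHuggingFace(llm=llm)
    return len(chat_model.invoke(PROMPT).content)

short_len = response_length(10)
long_len = response_length(4096)
# With the bug, both responses come back at roughly the same (~100 token) length.
assert short_len < long_len, (
    f"Response 10 should be shorter than response 4096, "
    f"10 max_new_tokens: {short_len}, 4096 max_new_tokens: {long_len}"
)
```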
This is the output from the script:
Error Message and Stack Trace (if applicable)
AssertionError: Response 10 should be shorter than response 4096, 10 max_new_tokens: 101, 4096 max_new_tokens: 101

Description
There seems to be an issue when using langchain_huggingface.llms.huggingface_endpoint.HuggingFaceEndpoint together with the langchain_huggingface.chat_models.huggingface.ChatHuggingFace implementation. When just using the HuggingFaceEndpoint, the parameter max_new_tokens is properly applied, while it is not when wrapping the endpoint inside ChatHuggingFace(llm=...). The latter implementation always returns a response of 100 tokens, and I have been unable to get it to work properly after searching the docs and source code. I have created a reproducible example using meta-llama/Meta-Llama-3-70B-Instruct (as this model is also supported for serverless).

System Info
System Information
Package Information
Packages not installed (Not Necessarily a Problem)
The following packages were not found: