Inconsistent streaming behaviour with ChatGoogleGenerativeAI via callback handlers #3831
Comments
🤖 Hey there, @StreetLamb! Nice to see you back in the langchainjs universe. Your knack for details never ceases to amaze me! 🚀

Thank you for bringing this to our attention. The behavior you're experiencing is due to the fact that the Google Generative AI API, which is used by the ChatGoogleGenerativeAI model, does not support streaming responses in a chunked format. In the current implementation, the response is produced by a single call and returned as a whole:

```ts
const res = await this.caller.callWithOptions(
  { signal: options?.signal },
  async () => {
    let output;
    try {
      output = await this.client.generateContent({
        contents: prompt,
      });
      // eslint-disable-next-line @typescript-eslint/no-explicit-any
    } catch (e: any) {
      // TODO: Improve error handling
      if (e.message?.includes("400 Bad Request")) {
        e.status = 400;
      }
      throw e;
    }
    return output;
  }
);
```

You can find this in the libs/langchain-google-genai/src/chat_models.ts file. On the other hand, models such as ChatOpenAI and ChatAnthropic return their responses in chunks, which is why you see a different behaviour there.

There was a similar issue in the past, "ChatGoogleGenerativeAI: not call callbacks (_runManager?.handleLLMNewToken)", which was resolved by a pull request: "google-genai[patch]: Hookup callbacks to stream & generate methods". However, this fix was specific to hooking up callbacks to the stream and generate methods, and it did not change the fact that the Google Generative AI API does not support streaming responses in a chunked format.

I hope this clarifies the situation. If you have any further questions, feel free to ask.

Sources
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
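For context, here is a minimal sketch of what the call above looks like at the level of the underlying SDK. This assumes the @google/generative-ai package (model name and prompt are illustrative); the point is that generateContent resolves a single time with the complete response, so there are no incremental tokens to forward to handleLLMNewToken on that path:

```ts
import { GoogleGenerativeAI } from "@google/generative-ai";

async function main() {
  // Assumes GOOGLE_API_KEY is set in the environment; "gemini-pro" is illustrative.
  const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY ?? "");
  const model = genAI.getGenerativeModel({ model: "gemini-pro" });

  // generateContent resolves once with the full response,
  // so the caller only ever sees the whole answer at once.
  const result = await model.generateContent("Tell me a joke.");
  console.log(result.response.text());
}

main().catch(console.error);
```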
@dosu-bot

```ts
const stream = await model.stream('Tell me a joke.');

for await (const chunk of stream) {
  console.log('Chunk received:', chunk);
}

/**
 * { token: 'Why did the scarecrow win an award?\n\nBecause he was outstanding in his' }
 * Chunk received: AIMessageChunk {
 *   lc_serializable: true,
 *   lc_kwargs: {
 *     content: 'Why did the scarecrow win an award?\n\nBecause he was outstanding in his',
 *     name: 'model',
 *     additional_kwargs: {}
 *   },
 *   lc_namespace: [ 'langchain_core', 'messages' ],
 *   content: 'Why did the scarecrow win an award?\n\nBecause he was outstanding in his',
 *   name: 'model',
 *   additional_kwargs: {}
 * }
 * { token: ' field!' }
 * Chunk received: AIMessageChunk {
 *   lc_serializable: true,
 *   lc_kwargs: { content: ' field!', name: 'model', additional_kwargs: {} },
 *   lc_namespace: [ 'langchain_core', 'messages' ],
 *   content: ' field!',
 *   name: 'model',
 *   additional_kwargs: {}
 * }
 */
```
🤖 Based on the information you've provided and the context from the LangChainJS repository, it seems that the ChatGoogleGenerativeAI model does not stream its responses in chunks through the callback handlers. In the provided context, the response is returned as a whole rather than token by token.

To resolve this, you would need to modify the streaming logic so that the response body is read and yielded incrementally, for example:

```ts
import { HumanMessage } from "@langchain/core/messages";

async *stream(messages: HumanMessage[], options?: any) {
  const response = await fetch('https://api.google.com/generativeai', {
    method: 'POST',
    body: JSON.stringify(messages),
  });
  const reader = response.body.getReader();
  let { value: chunk, done: readerDone } = await reader.read();
  while (!readerDone) {
    yield chunk;
    const result = await reader.read();
    chunk = result.value;
    readerDone = result.done;
  }
}
```

Please note that this is a simplified example and the actual implementation would depend on the specifics of the Google Generative AI API and the LangChainJS framework.

There are also similar issues that have been solved in the LangChainJS repository. For instance, the issue "ChatGoogleGenerativeAI: not call callbacks (_runManager?.handleLLMNewToken)" was closed by a pull request, "google-genai[patch]: Hookup callbacks to stream & generate methods", on December 18, 2023. This issue might provide some insights into your problem.

Unfortunately, I wasn't able to find further information on how this is handled within the repository.

Sources
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
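For reference, the way chat-model integrations in LangChain JS typically surface tokens to callback handlers is from inside a _streamResponseChunks generator, which calls runManager?.handleLLMNewToken for every chunk it yields (this is what the "Hookup callbacks" pull request mentioned above was about). A rough sketch of that pattern follows; the chunk source is a hypothetical placeholder, not the actual google-genai implementation:

```ts
import { AIMessageChunk, BaseMessage } from "@langchain/core/messages";
import { ChatGenerationChunk } from "@langchain/core/outputs";
import { CallbackManagerForLLMRun } from "@langchain/core/callbacks/manager";

// Hypothetical placeholder for whatever async iterable of text pieces
// the underlying client exposes.
declare function fetchTextChunks(messages: BaseMessage[]): AsyncIterable<string>;

async function* streamResponseChunks(
  messages: BaseMessage[],
  runManager?: CallbackManagerForLLMRun
): AsyncGenerator<ChatGenerationChunk> {
  for await (const text of fetchTextChunks(messages)) {
    // Yield the chunk for .stream() consumers...
    yield new ChatGenerationChunk({
      text,
      message: new AIMessageChunk({ content: text }),
    });
    // ...and notify callback handlers so handleLLMNewToken fires per token.
    await runManager?.handleLLMNewToken(text);
  }
}
```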
The same issue still appears with the Python module (#22802). Can someone please look into that?
Issue Description
I am encountering an issue with the ChatGoogleGenerativeAI model where responses are not streamed as expected when using callback handlers. The expected behaviour, which I observe with ChatOpenAI and ChatAnthropic, is to receive responses in a chunked format. However, with ChatGoogleGenerativeAI, responses are returned as a whole, which deviates from the expected behaviour.

Using ChatOpenAI for comparison:

Using ChatGoogleGenerativeAI with the same handler:
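To make the comparison concrete, here is a minimal sketch of the kind of setup being described, assuming a plain handler object whose handleLLMNewToken logs each token (package names, model defaults, and prompts are illustrative):

```ts
import { ChatOpenAI } from "@langchain/openai";
import { ChatGoogleGenerativeAI } from "@langchain/google-genai";

// Handler that logs every token the model reports while generating.
// Assumes OPENAI_API_KEY and GOOGLE_API_KEY are set in the environment.
const loggingHandler = {
  handleLLMNewToken(token: string) {
    console.log({ token });
  },
};

async function compare() {
  const openAIModel = new ChatOpenAI({ streaming: true, callbacks: [loggingHandler] });
  const googleModel = new ChatGoogleGenerativeAI({ callbacks: [loggingHandler] });

  // With ChatOpenAI the handler fires once per token; with
  // ChatGoogleGenerativeAI it was reported to fire with the whole response.
  await openAIModel.invoke("Tell me a joke.");
  await googleModel.invoke("Tell me a joke.");
}

compare().catch(console.error);
```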
Expected Behavior
The handleLLMNewToken callback should receive responses in a chunked format for all models, including ChatGoogleGenerativeAI.