
Inconsistent streaming behaviour with ChatGoogleGenerativeAI via callback handlers #3831

Closed
StreetLamb opened this issue Dec 29, 2023 · 4 comments · Fixed by #3834
Labels
auto:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

StreetLamb (Contributor) commented Dec 29, 2023

Issue Description

I am encountering an issue with the ChatGoogleGenerativeAI model where responses are not streamed as expected when using callback handlers. The expected behaviour, which I observe with ChatOpenAI and ChatAnthropic, is for handleLLMNewToken to receive the response token by token. With ChatGoogleGenerativeAI, however, the entire response is delivered in a single callback.

Using ChatOpenAI for comparison:

const handler = BaseCallbackHandler.fromMethods({
  handleLLMNewToken(token) {
    console.log({ token });
  },
});

const openAIModel = new ChatOpenAI({
  streaming: true,
  callbacks: [handler],
});

openAIModel.invoke('Tell me a joke?').then((res) => console.log(res));
/**
 * { token: '' }
{ token: 'Sure' }
{ token: ',' }
{ token: ' here' }
{ token: "'s" }
{ token: ' a' }
{ token: ' l' }
{ token: 'igh' }
{ token: 'the' }
{ token: 'art' }
{ token: 'ed' }
{ token: ' joke' }
{ token: ' for' }
{ token: ' you' }
{ token: ':\n\n' }
{ token: 'Why' }
{ token: ' don' }
{ token: "'t" }
{ token: ' scientists' }
{ token: ' trust' }
{ token: ' atoms' }
{ token: '?\n\n' }
{ token: 'Because' }
{ token: ' they' }
{ token: ' make' }
{ token: ' up' }
{ token: ' everything' }
{ token: '!' }
{ token: '' }
AIMessageChunk {
  lc_serializable: true,
  lc_kwargs: {
    content: "Sure, here's a lighthearted joke for you:\n" +
      '\n' +
      "Why don't scientists trust atoms?\n" +
      '\n' +
      'Because they make up everything!',
    additional_kwargs: {}
  },
  lc_namespace: [ 'langchain_core', 'messages' ],
  content: "Sure, here's a lighthearted joke for you:\n" +
    '\n' +
    "Why don't scientists trust atoms?\n" +
    '\n' +
    'Because they make up everything!',
  name: undefined,
  additional_kwargs: {}
}
 */

Using ChatGoogleGenerativeAI with the same handler:

const model = new ChatGoogleGenerativeAI({
  modelName: 'gemini-pro',
  callbacks: [handler],
});

model.invoke('Tell me a joke?').then((res) => console.log(res));
/**
 * {
  token: 'Why did the golfer wear two pairs of pants?\n\nIn case he got a hole in one.'
}
AIMessage {
  lc_serializable: true,
  lc_kwargs: {
    content: 'Why did the golfer wear two pairs of pants?\n' +
      '\n' +
      'In case he got a hole in one.',
    name: 'model',
    additional_kwargs: { finishReason: 'STOP', index: 0, safetyRatings: [Array] }
  },
  lc_namespace: [ 'langchain_core', 'messages' ],
  content: 'Why did the golfer wear two pairs of pants?\n\nIn case he got a hole in one.',
  name: 'model',
  additional_kwargs: {
    finishReason: 'STOP',
    index: 0,
    safetyRatings: [ [Object], [Object], [Object], [Object] ]
  }
}
 */

Expected Behavior

The handleLLMNewToken method should stream responses in a chunked format for all models, including ChatGoogleGenerativeAI.
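The same expectation applies when callbacks are attached per call rather than on the constructor; a small usage sketch of the call-options form (the geminiModel name here is illustrative, and the call-time callbacks option is standard LangChain config):

// Call-time callbacks instead of constructor callbacks; handleLLMNewToken
// should fire once per chunk here as well.
const geminiModel = new ChatGoogleGenerativeAI({ modelName: 'gemini-pro' });
await geminiModel.invoke('Tell me a joke?', { callbacks: [handler] });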


dosubot bot commented Dec 29, 2023

🤖

Hey there, @StreetLamb! Nice to see you back in the langchainjs universe. Your knack for details never ceases to amaze me! 🚀

Thank you for bringing this to our attention. The behavior you're experiencing is due to the fact that the Google Generative AI API, which is used by the ChatGoogleGenerativeAI model, does not support streaming responses in a chunked format. This is why the handleLLMNewToken method does not stream responses in a chunked format for the ChatGoogleGenerativeAI model.

In the ChatGoogleGenerativeAI class, the generateContent method of the GenerativeModel class is used to generate responses. This method returns a single response, not a stream of responses. Here is the relevant code:

const res = await this.caller.callWithOptions(
  { signal: options?.signal },
  async () => {
    let output;
    try {
      output = await this.client.generateContent({
        contents: prompt,
      });
      // eslint-disable-next-line @typescript-eslint/no-explicit-any
    } catch (e: any) {
      // TODO: Improve error handling
      if (e.message?.includes("400 Bad Request")) {
        e.status = 400;
      }
      throw e;
    }
    return output;
  }
);

You can find this in the libs/langchain-google-genai/src/chat_models.ts file.
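For completeness, the underlying @google/generative-ai SDK also exposes a streaming counterpart, generateContentStream, which yields partial responses as they arrive. A minimal sketch of that SDK call, reusing the same client and prompt as the snippet above (shapes assumed from the SDK's documented API, not taken from the integration):

// Streaming variant of the call above; `result.stream` is async-iterable and
// each partial response exposes its incremental text via .text().
const result = await this.client.generateContentStream({
  contents: prompt,
});

for await (const chunk of result.stream) {
  console.log(chunk.text());
}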

On the other hand, the ChatOpenAI and ChatAnthropic models might be using APIs that support streaming responses in a chunked format, hence they can use the handleLLMNewToken method to handle each chunk of the response separately.

There was a similar issue in the past, ChatGoogleGenerativeAI: not call callbacks (_runManager?.handleLLMNewToken), which was resolved by a pull request: google-genai[patch]: Hookup callbacks to stream & generate methods. However, this fix was specific to hooking up callbacks to the stream and generate methods, and it did not change the fact that the Google Generative AI API does not support streaming responses in a chunked format.

I hope this clarifies the situation. If you have any further questions, feel free to ask.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@dosubot dosubot bot added the auto:bug label Dec 29, 2023
StreetLamb (Contributor, Author) commented:

@dosu-bot The ChatGoogleGenerativeAI model does support streaming via .stream():

const stream = await model.stream('Tell me a joke.');
for await (const chunk of stream) {
  console.log('Chunk received:', chunk);
}
/**
 * {
  token: 'Why did the scarecrow win an award?\n\nBecause he was outstanding in his'
}
Chunk received: AIMessageChunk {
  lc_serializable: true,
  lc_kwargs: {
    content: 'Why did the scarecrow win an award?\n\nBecause he was outstanding in his',
    name: 'model',
    additional_kwargs: {}
  },
  lc_namespace: [ 'langchain_core', 'messages' ],
  content: 'Why did the scarecrow win an award?\n\nBecause he was outstanding in his',
  name: 'model',
  additional_kwargs: {}
}
{ token: ' field!' }
Chunk received: AIMessageChunk {
  lc_serializable: true,
  lc_kwargs: { content: ' field!', name: 'model', additional_kwargs: {} },
  lc_namespace: [ 'langchain_core', 'messages' ],
  content: ' field!',
  name: 'model',
  additional_kwargs: {}
}
 */
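In the meantime, the chunks from .stream() can be re-assembled into the final message while still observing each chunk as it arrives; a small sketch using AIMessageChunk's standard concat helper (the finalChunk accumulator is illustrative):

// Accumulate the streamed chunks into one message; concat() merges content
// and additional_kwargs chunk by chunk.
let finalChunk;
for await (const chunk of await model.stream('Tell me a joke.')) {
  finalChunk = finalChunk === undefined ? chunk : finalChunk.concat(chunk);
}
console.log(finalChunk?.content);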


dosubot bot commented Dec 29, 2023

🤖

Based on the information you've provided and the context from the LangChainJS repository, it seems that the handleLLMNewToken method is not streaming responses in a chunked format for the ChatGoogleGenerativeAI model because the stream method in the ChatGoogleGenerativeAI model does not seem to be implemented to handle chunked responses.

In the provided context, the stream method is called with a callback that appends each new token to a string. However, the response from the stream method is iterated over and each item's content is appended to a response string. This suggests that the stream method is expected to yield responses in a chunked format, but it's not clear from the provided context whether the stream method is actually implemented to do this.

To resolve this, you would need to modify the stream method in the ChatGoogleGenerativeAI model to yield responses in a chunked format. This could be done by making an asynchronous request to the Google Generative AI API and yielding each chunk of the response as it is received. Here's a simplified example of how this could be implemented:

async *stream(messages: HumanMessage[], options?: any) {
  const response = await fetch('https://api.google.com/generativeai', {
    method: 'POST',
    body: JSON.stringify(messages),
  });

  const reader = response.body.getReader();
  let { value: chunk, done: readerDone } = await reader.read();
  while (!readerDone) {
    yield chunk;
    const result = await reader.read();
    chunk = result.value;
    readerDone = result.done;
  }
}

Please note that this is a simplified example and the actual implementation would depend on the specifics of the Google Generative AI API and the LangChainJS framework.
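A sketch closer to how LangChain.js chat models typically surface streaming: override _streamResponseChunks, iterate the SDK's generateContentStream, and notify the run manager per chunk. The signature and types below follow common @langchain/core patterns, convertMessagesToPrompt is a stand-in for the integration's own message mapping, and none of this is claimed to be the exact change that landed in #3834:

import { AIMessageChunk, type BaseMessage } from '@langchain/core/messages';
import { ChatGenerationChunk } from '@langchain/core/outputs';
import type { CallbackManagerForLLMRun } from '@langchain/core/callbacks/manager';

// Hypothetical override inside the ChatGoogleGenerativeAI class.
async *_streamResponseChunks(
  messages: BaseMessage[],
  options: this['ParsedCallOptions'],
  runManager?: CallbackManagerForLLMRun
): AsyncGenerator<ChatGenerationChunk> {
  // convertMessagesToPrompt stands in for whatever mapping the integration
  // uses to turn LangChain messages into Gemini `contents`.
  const prompt = convertMessagesToPrompt(messages);

  // generateContentStream yields partial responses as they arrive.
  const result = await this.client.generateContentStream({ contents: prompt });

  for await (const response of result.stream) {
    const text = response.text();
    yield new ChatGenerationChunk({
      text,
      message: new AIMessageChunk({ content: text }),
    });
    // This is what makes handleLLMNewToken fire once per chunk.
    await runManager?.handleLLMNewToken(text);
  }
}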

There are also similar issues that have been solved in the LangChainJS repository. For instance, the issue ChatGoogleGenerativeAI: not call callbacks (_runManager?.handleLLMNewToken) was closed by a pull request: google-genai[patch]: Hookup callbacks to stream & generate methods on December 18, 2023. This issue might provide some insights into your problem.

Unfortunately, I wasn't able to find how the handleLLMNewToken method is implemented in the ChatGoogleGenerativeAI model in the LangChainJS framework. If you could provide more information about this, it would be helpful in diagnosing the issue.



Sameera2001Perera commented Jun 17, 2024

A similar issue still appears with the Python module, see #22802. Can someone look into it, please?
