
Inconsistent streaming behaviour with ChatGoogleGenerativeAI via callback handlers #3831

Closed
StreetLamb opened this issue Dec 29, 2023 · 4 comments · Fixed by #3834
Labels
auto:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

StreetLamb (Contributor) commented Dec 29, 2023

Issue Description

I am encountering an issue with the ChatGoogleGenerativeAI model where responses are not streamed as expected when using callback handlers. The expected behaviour, which I observe with ChatOpenAI and ChatAnthropic, is for handleLLMNewToken to receive the response token by token. With ChatGoogleGenerativeAI, however, the entire response is delivered in a single callback.

Using ChatOpenAI for comparison:

const handler = BaseCallbackHandler.fromMethods({
  handleLLMNewToken(token) {
    console.log({ token });
  },
});

const openAIModel = new ChatOpenAI({
  streaming: true,
  callbacks: [handler],
});

openAIModel.invoke('Tell me a joke?').then((res) => console.log(res));
/**
 * { token: '' }
{ token: 'Sure' }
{ token: ',' }
{ token: ' here' }
{ token: "'s" }
{ token: ' a' }
{ token: ' l' }
{ token: 'igh' }
{ token: 'the' }
{ token: 'art' }
{ token: 'ed' }
{ token: ' joke' }
{ token: ' for' }
{ token: ' you' }
{ token: ':\n\n' }
{ token: 'Why' }
{ token: ' don' }
{ token: "'t" }
{ token: ' scientists' }
{ token: ' trust' }
{ token: ' atoms' }
{ token: '?\n\n' }
{ token: 'Because' }
{ token: ' they' }
{ token: ' make' }
{ token: ' up' }
{ token: ' everything' }
{ token: '!' }
{ token: '' }
AIMessageChunk {
  lc_serializable: true,
  lc_kwargs: {
    content: "Sure, here's a lighthearted joke for you:\n" +
      '\n' +
      "Why don't scientists trust atoms?\n" +
      '\n' +
      'Because they make up everything!',
    additional_kwargs: {}
  },
  lc_namespace: [ 'langchain_core', 'messages' ],
  content: "Sure, here's a lighthearted joke for you:\n" +
    '\n' +
    "Why don't scientists trust atoms?\n" +
    '\n' +
    'Because they make up everything!',
  name: undefined,
  additional_kwargs: {}
}
 */

Using ChatGoogleGenerativeAI with the same handler:

const model = new ChatGoogleGenerativeAI({
  modelName: 'gemini-pro',
  callbacks: [handler],
});

model.invoke('Tell me a joke?').then((res) => console.log(res));
/**
 * {
  token: 'Why did the golfer wear two pairs of pants?\n\nIn case he got a hole in one.'
}
AIMessage {
  lc_serializable: true,
  lc_kwargs: {
    content: 'Why did the golfer wear two pairs of pants?\n' +
      '\n' +
      'In case he got a hole in one.',
    name: 'model',
    additional_kwargs: { finishReason: 'STOP', index: 0, safetyRatings: [Array] }
  },
  lc_namespace: [ 'langchain_core', 'messages' ],
  content: 'Why did the golfer wear two pairs of pants?\n\nIn case he got a hole in one.',
  name: 'model',
  additional_kwargs: {
    finishReason: 'STOP',
    index: 0,
    safetyRatings: [ [Object], [Object], [Object], [Object] ]
  }
}
 */

Expected Behavior

The handleLLMNewToken method should stream responses in a chunked format for all models, including ChatGoogleGenerativeAI.
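The same expectation applies when callbacks are attached per call rather than on the constructor; a small usage sketch of the call-options form (the geminiModel name here is illustrative, and the call-time callbacks option is standard LangChain config):

// Call-time callbacks instead of constructor callbacks; handleLLMNewToken
// should fire once per chunk here as well.
const geminiModel = new ChatGoogleGenerativeAI({ modelName: 'gemini-pro' });
await geminiModel.invoke('Tell me a joke?', { callbacks: [handler] });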


dosubot bot commented Dec 29, 2023

🤖

Hey there, @StreetLamb! Nice to see you back in the langchainjs universe. Your knack for details never ceases to amaze me! 🚀

Thank you for bringing this to our attention. The behavior you're experiencing is due to the fact that the Google Generative AI API, which is used by the ChatGoogleGenerativeAI model, does not support streaming responses in a chunked format. This is why the handleLLMNewToken method does not stream responses in a chunked format for the ChatGoogleGenerativeAI model.

In the ChatGoogleGenerativeAI class, the generateContent method of the GenerativeModel class is used to generate responses. This method returns a single response, not a stream of responses. Here is the relevant code:

const res = await this.caller.callWithOptions(
  { signal: options?.signal },
  async () => {
    let output;
    try {
      output = await this.client.generateContent({
        contents: prompt,
      });
      // eslint-disable-next-line @typescript-eslint/no-explicit-any
    } catch (e: any) {
      // TODO: Improve error handling
      if (e.message?.includes("400 Bad Request")) {
        e.status = 400;
      }
      throw e;
    }
    return output;
  }
);

You can find this in the libs/langchain-google-genai/src/chat_models.ts file.
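For completeness, the underlying @google/generative-ai SDK also exposes a streaming counterpart, generateContentStream, which yields partial responses as they arrive. A minimal sketch of that SDK call, reusing the same client and prompt as the snippet above (shapes assumed from the SDK's documented API, not taken from the integration):

// Streaming variant of the call above; `result.stream` is async-iterable and
// each partial response exposes its incremental text via .text().
const result = await this.client.generateContentStream({
  contents: prompt,
});

for await (const chunk of result.stream) {
  console.log(chunk.text());
}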

On the other hand, the ChatOpenAI and ChatAnthropic models might be using APIs that support streaming responses in a chunked format, hence they can use the handleLLMNewToken method to handle each chunk of the response separately.

There was a similar issue in the past, ChatGoogleGenerativeAI: not call callbacks (_runManager?.handleLLMNewToken), which was resolved by a pull request: google-genai[patch]: Hookup callbacks to stream & generate methods. However, this fix was specific to hooking up callbacks to the stream and generate methods, and it did not change the fact that the Google Generative AI API does not support streaming responses in a chunked format.

I hope this clarifies the situation. If you have any further questions, feel free to ask.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@dosubot dosubot bot added the auto:bug label Dec 29, 2023
StreetLamb (Contributor, Author) commented:

@dosu-bot The ChatGoogleGenerativeAI model does support streaming via .stream():

const stream = await model.stream('Tell me a joke.');
for await (const chunk of stream) {
  console.log('Chunk received:', chunk);
}
/**
 * {
  token: 'Why did the scarecrow win an award?\n\nBecause he was outstanding in his'
}
Chunk received: AIMessageChunk {
  lc_serializable: true,
  lc_kwargs: {
    content: 'Why did the scarecrow win an award?\n\nBecause he was outstanding in his',
    name: 'model',
    additional_kwargs: {}
  },
  lc_namespace: [ 'langchain_core', 'messages' ],
  content: 'Why did the scarecrow win an award?\n\nBecause he was outstanding in his',
  name: 'model',
  additional_kwargs: {}
}
{ token: ' field!' }
Chunk received: AIMessageChunk {
  lc_serializable: true,
  lc_kwargs: { content: ' field!', name: 'model', additional_kwargs: {} },
  lc_namespace: [ 'langchain_core', 'messages' ],
  content: ' field!',
  name: 'model',
  additional_kwargs: {}
}
 */
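In the meantime, the chunks from .stream() can be re-assembled into the final message while still observing each chunk as it arrives; a small sketch using AIMessageChunk's standard concat helper (the finalChunk accumulator is illustrative):

// Accumulate the streamed chunks into one message; concat() merges content
// and additional_kwargs chunk by chunk.
let finalChunk;
for await (const chunk of await model.stream('Tell me a joke.')) {
  finalChunk = finalChunk === undefined ? chunk : finalChunk.concat(chunk);
}
console.log(finalChunk?.content);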


dosubot bot commented Dec 29, 2023

🤖

Based on the information you've provided and the context from the LangChainJS repository, it seems that the handleLLMNewToken method is not streaming responses in a chunked format for the ChatGoogleGenerativeAI model because the stream method in the ChatGoogleGenerativeAI model does not seem to be implemented to handle chunked responses.

In the provided context, the stream method is called with a callback that appends each new token to a string. However, the response from the stream method is iterated over and each item's content is appended to a response string. This suggests that the stream method is expected to yield responses in a chunked format, but it's not clear from the provided context whether the stream method is actually implemented to do this.

To resolve this, you would need to modify the stream method in the ChatGoogleGenerativeAI model to yield responses in a chunked format. This could be done by making an asynchronous request to the Google Generative AI API and yielding each chunk of the response as it is received. Here's a simplified example of how this could be implemented:

async *stream(messages: HumanMessage[], options?: any) {
  const response = await fetch('https://api.google.com/generativeai', {
    method: 'POST',
    body: JSON.stringify(messages),
  });

  const reader = response.body.getReader();
  let { value: chunk, done: readerDone } = await reader.read();
  while (!readerDone) {
    yield chunk;
    const result = await reader.read();
    chunk = result.value;
    readerDone = result.done;
  }
}

Please note that this is a simplified example and the actual implementation would depend on the specifics of the Google Generative AI API and the LangChainJS framework.
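A sketch closer to how LangChain.js chat models typically surface streaming: override _streamResponseChunks, iterate the SDK's generateContentStream, and notify the run manager per chunk. The signature and types below follow common @langchain/core patterns, convertMessagesToPrompt is a stand-in for the integration's own message mapping, and none of this is claimed to be the exact change that landed in #3834:

import { AIMessageChunk, type BaseMessage } from '@langchain/core/messages';
import { ChatGenerationChunk } from '@langchain/core/outputs';
import type { CallbackManagerForLLMRun } from '@langchain/core/callbacks/manager';

// Hypothetical override inside the ChatGoogleGenerativeAI class.
async *_streamResponseChunks(
  messages: BaseMessage[],
  options: this['ParsedCallOptions'],
  runManager?: CallbackManagerForLLMRun
): AsyncGenerator<ChatGenerationChunk> {
  // convertMessagesToPrompt stands in for whatever mapping the integration
  // uses to turn LangChain messages into Gemini `contents`.
  const prompt = convertMessagesToPrompt(messages);

  // generateContentStream yields partial responses as they arrive.
  const result = await this.client.generateContentStream({ contents: prompt });

  for await (const response of result.stream) {
    const text = response.text();
    yield new ChatGenerationChunk({
      text,
      message: new AIMessageChunk({ content: text }),
    });
    // This is what makes handleLLMNewToken fire once per chunk.
    await runManager?.handleLLMNewToken(text);
  }
}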

There are also similar issues that have been solved in the LangChainJS repository. For instance, the issue ChatGoogleGenerativeAI: not call callbacks (_runManager?.handleLLMNewToken) was closed by a pull request: google-genai[patch]: Hookup callbacks to stream & generate methods on December 18, 2023. This issue might provide some insights into your problem.

Unfortunately, I wasn't able to find how the handleLLMNewToken method is implemented in the ChatGoogleGenerativeAI model in the LangChainJS framework. If you could provide more information about this, it would be helpful in diagnosing the issue.



Sameera2001Perera commented Jun 17, 2024

A similar issue still appears with the Python module, see #22802. Can someone look into it, please?
