
How to use stream to return messages as they arrive instead of waiting until everything has been processed #115

pslxx opened this issue May 7, 2024 · 10 comments


pslxx commented May 7, 2024

How can I use a stream to return messages as they arrive, instead of waiting until everything has been processed?

@MaximeThoonsen
Collaborator

hey @pslxx, there are dedicated methods in the ChatInterface, like generateStreamOfText.
Does that help you?
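
For reference, here is a minimal sketch of how that method can be used (the OpenAIConfig setup is an assumption based on typical LLPhant usage and may differ in your version):

    use LLPhant\Chat\OpenAIChat;
    use LLPhant\OpenAIConfig;

    $config = new OpenAIConfig();
    $config->apiKey = 'sk-...'; // your API key

    $chat = new OpenAIChat($config);

    // generateStreamOfText returns a PSR-7 StreamInterface, not a plain string.
    $stream = $chat->generateStreamOfText('Tell me a short story.');

    while (!$stream->eof()) {
        echo $stream->read(32); // read the body incrementally
    }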


messi89 commented Aug 22, 2024

@MaximeThoonsen the generateStreamOfText function uses a Guzzle request (not an async request), so, theoretically, the method waits for the whole response from the Ollama API.

Is there any way to read the streamed response from Ollama directly?


iztok commented Sep 1, 2024

+1. I'm looking to iterate over each stream chunk, but the stream methods return a StreamInterface that doesn't allow this (#78 (comment)).


iztok commented Sep 2, 2024

If anyone finds this helpful:

    // Wrap the PSR-7 stream in a generator so it can be consumed with foreach.
    $streamToIterator = function (StreamInterface $stream): \Generator {
        while (!$stream->eof()) {
            yield $stream->read(32); // adjust the chunk size as needed
        }
    };
    $iteratorStream = $streamToIterator($stream);

    foreach ($iteratorStream as $chunk) {
        // chunks are fixed-size byte strings, not token-based anymore!
    }
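
One design note on this approach: read() returns raw bytes at arbitrary offsets, so a chunk can end in the middle of a multi-byte UTF-8 character. If the consumer validates UTF-8 (a WebSocket text frame, for example), you may need to re-buffer on character boundaries before forwarding.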

@MaximeThoonsen
Collaborator

hello @ezimuel, how are you?

It seems there are a lot of questions around streaming. Can we still do streaming with StreamInterface and LLPhant? What is the "clean/simple" working example?

@iztok is the code you provided working for you to get a stream?


iztok commented Sep 4, 2024

@iztok is the code you provided working for you to get a stream?

Yes, this returns an iterable stream that I can use the same way I used the stream from the OpenAI library. One caveat is that this stream's chunks are not tokens but 32-byte strings. I'm then broadcasting these chunks over a WebSocket to my chat clients.
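
If the byte-sized chunks are a problem, one option is to re-buffer on newlines and decode each complete line. This is a sketch under two assumptions: the stream carries newline-delimited JSON as Ollama's /api/generate endpoint emits it, and LLPhant has not already decoded the wire format into plain text (if it has, the chunks are just text and this step is unnecessary). The helper name is made up for illustration:

    use Psr\Http\Message\StreamInterface;

    function ndjsonDeltas(StreamInterface $stream): \Generator
    {
        $buffer = '';
        while (!$stream->eof()) {
            $buffer .= $stream->read(32);
            // Emit one delta per complete JSON line accumulated so far.
            while (($pos = strpos($buffer, "\n")) !== false) {
                $line = substr($buffer, 0, $pos);
                $buffer = substr($buffer, $pos + 1);
                $decoded = json_decode($line, true);
                if (isset($decoded['response'])) { // Ollama's field name
                    yield $decoded['response'];
                }
            }
        }
    }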


prykris commented Sep 11, 2024

Yes, this returns an iterable stream that I can use the same way I used the stream from the OpenAI library. One caveat is that this stream's chunks are not tokens but 32-byte strings. I'm then broadcasting these chunks over a WebSocket to my chat clients.

@iztok I see how that is a caveat. Does it make any difference in your use case, or does it seriously impact the end-user experience?

I am trying to understand the pitfalls I might run into while implementing something similar.


ryanisn commented Oct 28, 2024

With Ollama, $stream->read(32) waits for the whole response to complete before returning the "chunk".
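
That behaviour usually means the HTTP client buffered the entire body before handing it back. With plain Guzzle, the request has to opt in to streaming via the 'stream' option; a minimal sketch against Ollama's API (the endpoint and fields come from Ollama's docs, not from LLPhant's internals, and the model name is a placeholder):

    use GuzzleHttp\Client;

    $client = new Client(['base_uri' => 'http://localhost:11434']);

    // 'stream' => true in the request options makes Guzzle return the body
    // as soon as the headers arrive instead of buffering the whole response.
    $response = $client->post('/api/generate', [
        'json'   => ['model' => 'llama3', 'prompt' => 'Hello', 'stream' => true],
        'stream' => true,
    ]);

    $body = $response->getBody();
    while (!$body->eof()) {
        echo $body->read(32); // NDJSON chunks arrive as Ollama produces them
    }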

@mortensi

Thanks for the hints. However, I must be doing something wrong: this does not work for me, and I get the whole answer at once.

    $qa = new QuestionAnswering(
        $redisVectorStore,
        $embeddingGenerator,
        new OpenAIChat($config)
    );

    $stream = $qa->answerQuestionStream($request->input('q'));

    $streamToIterator = function (StreamInterface $stream): \Generator {
        while (!$stream->eof()) {
            yield $stream->read(32);
        }
    };
    $iteratorStream = $streamToIterator($stream);

    foreach ($iteratorStream as $chunk) {
        echo $chunk;
    }

Using OpenAI, partial reads work perfectly, though.

    $responseStream = OpenAI::chat()->createStreamed([
        'model' => 'gpt-3.5-turbo',
        'messages' => [
            ['role' => 'user', 'content' => $request->input('q')],
        ],
    ]);

    foreach ($responseStream as $delta) {
        // Stream the delta content back to the client
        echo $delta['choices'][0]['delta']['content'] ?? '';
        ob_flush();
        flush();
    }

    ob_flush();
    flush();

I'm not sure what else I can test to make answerQuestionStream work.

@mortensi

Solved. I needed ob_flush() and flush() after each chunk to force Laravel to send the output.

    $streamToIterator = function (StreamInterface $stream): \Generator {
        while (!$stream->eof()) {
            yield $stream->read(32);
            // These run when the generator resumes, i.e. after the chunk
            // below has been echoed, so the echoed output gets flushed.
            ob_flush();
            flush();
        }
    };

    $iteratorStream = $streamToIterator($stream);

    foreach ($iteratorStream as $chunk) {
        echo $chunk;
    }
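
A related note for Laravel: rather than echoing straight from the controller, the same loop can be wrapped in response()->stream(), which returns a StreamedResponse. The headers below are the usual ones for keeping proxies from buffering; adjust them to your setup:

    return response()->stream(function () use ($stream) {
        while (!$stream->eof()) {
            echo $stream->read(32);
            ob_flush();
            flush();
        }
    }, 200, [
        'Content-Type'      => 'text/plain',
        'Cache-Control'     => 'no-cache',
        'X-Accel-Buffering' => 'no', // tell nginx not to buffer the response
    ]);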
