fix incorrect impl of flow (streaming gen.) for local LLMs #381
Conversation
gpt4all-kotlin/src/jvmMain/kotlin/com/xebia/functional/gpt4all/GPT4All.kt
val message = assistant(buffer.toString())
    .map { it.choices.mapNotNull { it.delta?.content }.reduce(String::plus) }
    .onEach { emit(it) }
    .fold("", String::plus)
Is there a better way to achieve the same as the onEach and fold? Currently, each item is emitted to another, outer flow. This is required because we have to fold the flow to get the full generated string in order to add it to the memory. On the other hand, we want to pass each item forward to the caller of the function so that they can do whatever they want with it.
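For reference, here is the pattern as a self-contained sketch (streamAndRemember, chunks, and remember are placeholder names, not the actual xef API): each generated chunk is forwarded to the caller while the chunks are also folded into the complete answer so it can be stored afterwards.

import kotlinx.coroutines.flow.*

// Sketch of the pattern under discussion (hypothetical names, not the xef API):
// forward each generated chunk to the caller while also folding the chunks
// into the complete answer so it can be stored in memory afterwards.
fun streamAndRemember(
    chunks: Flow<String>,               // stand-in for the assistant's token stream
    remember: suspend (String) -> Unit  // stand-in for adding the text to memory
): Flow<String> = flow {
    val full = chunks
        .onEach { emit(it) }            // pass every chunk on to the outer flow
        .fold("", String::plus)         // accumulate the full generated string
    remember(full)                      // store the complete text once the stream ends
}

Note that fold is a terminal operator that collects in the current coroutine, so the emit inside onEach runs in the same coroutine as the flow builder and the flow invariant is preserved.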
It kind of depends on the desired behavior of the resulting Flow. onEach will back-pressure the fold, so the resulting Flow can still back-pressure the fold, which is desired. onEach is the same as map { f(it); it }.
My concern is that fold will take everything in memory, but I guess that's inherent to MemoryManagement.addMemoriesAfterStream. Just going by the name, it sounds like it puts everything in memory anyway.
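As a sketch of that equivalence (tap is a made-up name for illustration, not part of kotlinx.coroutines):

import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.map

// onEach(f) behaves like map { f(it); it }: f runs as a suspending side
// effect per element, and the element itself is passed through unchanged.
fun <T> Flow<T>.tap(f: suspend (T) -> Unit): Flow<T> =
    map { f(it); it }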
Yes, the goal is indeed to get the full string of generated text and store it in the conversation history. At the same time, I want to forward every individual item to the outer flow.
I came to the conclusion that there is no better way.
If you have time, can you explain what you mean by the term back-pressure?
@Intex32 I explain back-pressure a little bit here: https://youtu.be/n724AdSAkOI
Let me know if this clears it up for you; if not, we can discuss it further.
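As a quick runnable illustration of the idea (a toy example, not from the PR): in a cold flow, emit suspends until the collector has finished processing the previous item, so a slow consumer naturally slows the producer down.

import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.*
import kotlinx.coroutines.runBlocking

// Toy example: emit suspends until the collector below is done with the
// previous item, so this prints "emitting"/"collected" pairs in lockstep.
fun main() = runBlocking {
    flow {
        repeat(3) {
            println("emitting $it")
            emit(it)          // suspends until the collector has processed `it`
        }
    }.collect {
        delay(100)            // slow consumer
        println("collected $it")
    }
}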
@nomisRev I have more questions than before 😂. I especially liked the part where you talked about back-pressure.
The pen has some classy dance moves tbh.
@nomisRev Could you perhaps have a look, in particular with respect to my comments?
Looks good to me @Intex32! Great work 🙌
When playing with local models, I noticed that the generation (returning a flow) is not printed in real time to the console. After investigation, I figured out this was because of the coroutineScope in the Gpt4All file that waits until all child jobs are done. I am not very fluent with the Flow API, but I still believe that I improved the code rather than worsened it. Feedback is appreciated. Also consider my comments.
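To illustrate the problem, here is a hedged sketch (names made up, not the actual GPT4All.kt code) of how a surrounding coroutineScope defeats streaming:

import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.flow.*
import kotlinx.coroutines.launch

// Hypothetical sketch: coroutineScope only returns once its child jobs have
// completed, so the caller receives a flow over an already fully populated
// buffer instead of a live stream of tokens.
suspend fun generateBuffered(
    produce: suspend (MutableList<String>) -> Unit  // stand-in for token generation
): Flow<String> = coroutineScope {
    val buffer = mutableListOf<String>()            // purely illustrative, not thread-safe
    launch { produce(buffer) }                      // child job appends tokens to the buffer
    buffer.asFlow()                                 // handed out only after the child job finishes
}

Building the result with the flow { } builder instead, and emitting each token as it arrives, avoids waiting on the scope, which is roughly what this PR changes.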