how to get unbuffered responses? #4

Closed

alph4b3th opened this issue Apr 12, 2023 · 5 comments

Comments

@alph4b3th

I noticed that .predict returns a complete string containing the model's response. However, I want to give the user a "typing" experience, where the model streams what it is generating to the user's client in real time. Since .predict only returns once the response is finished, how can I receive the answer token by token, as the model predicts it?
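
A minimal sketch of the blocking behavior being described, assuming l is a loaded model instance (the method name is taken from the question; the exact signature may differ in the bindings):

// Predict blocks until generation is complete and returns the whole string;
// nothing reaches the caller while tokens are still being generated.
response, err := l.Predict("Hello, how are you?")
if err != nil {
    log.Fatal(err)
}
fmt.Println(response) // the full response arrives only at the end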

@mudler changed the title from "how to get buffered responses?" to "how to get unbuffered responses?" on Apr 12, 2023
@mudler
Member

mudler commented Apr 12, 2023

This is not implemented yet - however it's something I'm interested in supporting.

@mudler
Member

mudler commented Apr 12, 2023

One way would be for the binding to send data back directly to a Go channel, for instance: https://github.com/matiasinsaurralde/cgo-channels/tree/master. However, I still see that this could incur a huge penalty, as context switches between Go and C in a loop have a high computational cost.

I think we could offer low-level functionality to address this specific case, scoped to just a few exposed functions, but I wouldn't suggest using it when performance is a requirement.
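
As a rough, self-contained sketch of the channel idea (tokenHook and fakePredict are hypothetical stand-ins for whatever per-token hook the binding would expose; the point is that every token costs one C-to-Go boundary crossing):

package main

import "fmt"

// tokenHook stands in for whatever per-token callback the binding would
// expose; in the real binding it would be invoked from C for each token.
var tokenHook func(token string) bool

// fakePredict simulates a blocking predict call that emits tokens one by one.
func fakePredict(prompt string) {
    for _, tok := range []string{"Hello", ",", " world", "!"} {
        if tokenHook != nil && !tokenHook(tok) {
            return // the callback asked us to stop early
        }
    }
}

func main() {
    tokens := make(chan string, 64) // buffered to soften per-token latency
    go func() {
        defer close(tokens)
        tokenHook = func(token string) bool {
            tokens <- token // in the real binding: one C-to-Go crossing per token
            return true
        }
        fakePredict("Hello") // blocks until generation finishes
    }()
    for tok := range tokens { // the consumer relays tokens as they arrive
        fmt.Print(tok)
    }
    fmt.Println()
}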

@alph4b3th
Author

> One way would be for the binding to send data back directly to a Go channel, for instance: https://github.com/matiasinsaurralde/cgo-channels/tree/master. However, I still see that this could incur a huge penalty, as context switches between Go and C in a loop have a high computational cost.
>
> I think we could offer low-level functionality to address this specific case, scoped to just a few exposed functions, but I wouldn't suggest using it when performance is a requirement.

Can you create this functionality?

@noxer

noxer commented Apr 27, 2023

To answer the original question:

// "llama" here is a previously loaded model instance.
llama.SetTokenCallback(func(token string) bool {
    fmt.Print(token) // print each token as soon as it is generated
    return true      // returning true lets the predictor continue
})
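
For context, roughly how that callback fits around a full prediction call, assuming go-llama.cpp's New and Predict entry points (the import path, model path, and options below are assumptions and may differ between versions of the bindings):

package main

import (
    "fmt"
    "log"

    llama "github.com/go-skynet/go-llama.cpp"
)

func main() {
    // Model path is a placeholder; options are elided for brevity.
    l, err := llama.New("./model.bin")
    if err != nil {
        log.Fatal(err)
    }

    // Stream each token to the user as soon as the model produces it.
    l.SetTokenCallback(func(token string) bool {
        fmt.Print(token)
        return true // keep predicting
    })

    // Predict still blocks until generation finishes, but the callback
    // above has already delivered every token along the way.
    if _, err := l.Predict("Hello, how are you?"); err != nil {
        log.Fatal(err)
    }
    fmt.Println()
}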

@mudler
Member

mudler commented Apr 28, 2023

I think we can close this now, thanks @noxer ❤️!

@mudler closed this as completed on Apr 28, 2023