Make LlamaStackLibraryClient work correctly #581
Conversation
T = TypeVar("T")

def stream_across_asyncio_run_boundary(
This is the most crucial part of this PR. Without it, you cannot make the "non-async" generators (a necessary mode for our client-sdk) work properly: you must be able to do `for chunk in inference.chat_completion()`, since that's what synchronous generators are about. To bridge a sync generator to the async generators we have in our server-side code, we need to intermediate via a thread pool.
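Only the function's name appears in this diff excerpt, so the body below is not the PR's actual implementation, just a minimal sketch of the general technique the comment describes: run the async generator on its own event loop in a worker thread, and hand items back to a plain sync generator through a thread-safe queue. The sentinel and error-forwarding details are assumptions, and the sketch omits backpressure and early-cancellation handling.

```python
import asyncio
import queue
import threading
from typing import Any, AsyncGenerator, Callable, Generator

_SENTINEL = object()  # marks end-of-stream on the queue


def stream_across_asyncio_run_boundary(
    async_gen_maker: Callable[[], AsyncGenerator[Any, None]],
) -> Generator[Any, None, None]:
    """Consume an async generator from synchronous code."""
    q: queue.Queue = queue.Queue()

    async def pump() -> None:
        # Runs inside the worker thread's event loop and feeds the queue.
        async for item in async_gen_maker():
            q.put(item)

    def worker() -> None:
        try:
            asyncio.run(pump())
        except BaseException as exc:
            q.put(exc)  # forward errors to the sync consumer
        finally:
            q.put(_SENTINEL)

    threading.Thread(target=worker, daemon=True).start()

    # Synchronous side: block on the queue until the stream ends.
    while (item := q.get()) is not _SENTINEL:
        if isinstance(item, BaseException):
            raise item
        yield item
```

A toy consumer then reads the stream with an ordinary `for` loop:

```python
async def counter() -> AsyncGenerator[int, None]:
    for i in range(3):
        await asyncio.sleep(0.01)
        yield i

for chunk in stream_across_asyncio_run_boundary(counter):
    print(chunk)  # 0, 1, 2
```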
def main(config_path: str):
    client = LlamaStackAsLibraryClient(config_path)
https://llama-stack.readthedocs.io/en/latest/distributions/importing_as_library.html still references LlamaStackDirectClient. Does it also need to be updated?
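For context, here is a hedged sketch of what library-mode usage might look like, extrapolated from the diff fragment above. The import path, the `initialize()` step, and the `chat_completion` parameters are assumptions, not confirmed by this excerpt.

```python
# Hypothetical usage sketch; the import path and parameters are assumptions.
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient


def main(config_path: str):
    client = LlamaStackAsLibraryClient(config_path)
    client.initialize()  # assumed setup step

    # Stream a chat completion with a plain synchronous for-loop,
    # as the review comment above describes.
    for chunk in client.inference.chat_completion(
        model_id="...",  # elided; depends on the distribution config
        messages=[{"role": "user", "content": "Hello"}],
        stream=True,
    ):
        print(chunk)
```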
This PR does a few things:
- `LlamaStackLibraryClient`

In many ways, this PR makes things finally "work".
Test Plan
See the `library_client_test.py` I added. This isn't really quite a test yet, but it demonstrates that this mode now works. Here's the invocation and the response: