Add a streaming RAG method #967
Conversation
Force-pushed from de92509 to f2519df
Force-pushed from 0704c99 to 8f53c24
@scotttrinh what do you think about an intermediate object to decouple input/output?

```ts
await queryRag(message, context).text()
await queryRag(message, context).response()
for await (const part of queryRag(message, context)) { ... }
```
I'm not sure I understand the suggestion here 🤔

Instead of 3 top-level query functions to vary the output shape, just have 1. Then the output shape is picked independently.
Ahh, I understand now. Seems reasonable, so we'll have to answer the question of which is better on some other axes:

Some of those questions are research for me; others will require a bit of thinking and trying to intuit an answer. Feel free to state the case for a single function that returns an object that exposes the different interfaces vs. separate functions.
As far as #1 goes, `fetch` has the same API:

```ts
await fetch(...).json()
await fetch(...).arrayBuffer()
```

I don't see any problems here. I can't speak to 2 & 6. As far as 3 & 4, both the inputs and outputs obviously have to work together in the implementation. But this decoupling allows modifying args or output methods without needing to adjust multiple signatures.

I did a first pass here: stream-ai...CarsonF:edgedb-js:stream-ai
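For concreteness, here is a rough sketch of what such an intermediate object could look like (hypothetical names and shapes, not the code from the linked branch):

```ts
// Sketch only: one entry point whose return value exposes every output shape.
class RagQuery implements AsyncIterable<string> {
  constructor(private run: () => Promise<Response>) {}

  // await queryRag(message, context).response()
  response(): Promise<Response> {
    return this.run();
  }

  // await queryRag(message, context).text()
  async text(): Promise<string> {
    return (await this.run()).text();
  }

  // for await (const part of queryRag(message, context)) { ... }
  async *[Symbol.asyncIterator](): AsyncIterator<string> {
    const reader = (await this.run()).body!.getReader();
    const decoder = new TextDecoder();
    while (true) {
      const { done, value } = await reader.read();
      if (done) return;
      yield decoder.decode(value, { stream: true });
    }
  }
}
```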
Yeah, and while I agree that works as a primitive since it allows you to have one function with lots of different behavior, I think some people find that API annoying as an end-user API. Great for a primitive, though, and maybe that's really what we're building here.
Yeah, I think that's a compelling argument! Somehow that makes me feel even more like this is a primitive on which higher-level functionality should be built (like our soon-to-come Vercel AI SDK integration), and therefore we should optimize for a flexible low-level primitive. Lemme shop the idea around a bit more. Thanks for taking the time to make an example implementation; that makes it nice and concrete for discussion. 🙏
Instead of using the async generator and converting the text back into a byte stream, just pipe the response from the RAG request directly to the response in `streamRag`.
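A minimal sketch of that pass-through piping (assumed names, not the actual `streamRag` code):

```ts
// Sketch: forward the upstream SSE body as-is instead of parsing events
// and re-encoding them into a fresh byte stream.
function toSSEResponse(upstream: Response): Response {
  return new Response(upstream.body, {
    status: upstream.status,
    headers: { "Content-Type": "text/event-stream" },
  });
}
```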
Ensures that the chunks are properly parsed and emitted as they come in, splitting multiple events that arrive together in one chunk, and combining partial events from multiple chunks into a single emitted value.
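A hedged sketch of that buffering behavior (not the merged parser): complete events are split out of each chunk, and any trailing partial event waits in the buffer for the next chunk.

```ts
// Sketch: buffer incoming bytes and emit one complete SSE event per
// blank-line ("\n\n") boundary.
async function* sseEvents(
  stream: ReadableStream<Uint8Array>,
): AsyncGenerator<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    let sep: number;
    // One chunk may carry several events; emit each complete one.
    while ((sep = buffer.indexOf("\n\n")) !== -1) {
      yield buffer.slice(0, sep);
      buffer = buffer.slice(sep + 2);
    }
    // Whatever remains in `buffer` is a partial event awaiting more chunks.
  }
}
```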
Yeah! Think this looks pretty good. One small comment.
packages/ai/src/core.ts (Outdated)
```ts
const res = await this.fetchRag({
  model: this.model,
  prompt: this.prompt,
  context: this.context,
  query: this.query,
  stream: true,
});
```
Since we always pass `this.model`, etc., I think we can simplify `fetchRag` to just take `stream: boolean`.
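A sketch of that simplification (assumed shape; `this.request` is a hypothetical internal helper):

```ts
// Everything except `stream` is read from instance state, so callers
// only pick the output mode.
private async fetchRag(stream: boolean): Promise<Response> {
  return this.request({
    model: this.model,
    prompt: this.prompt,
    context: this.context,
    query: this.query,
    stream,
  });
}

// Call sites then reduce to:
const res = await this.fetchRag(true);
```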
OK, so @diksipav and I have been going back and forth a bunch about the API here, and I think I have some more concrete thoughts on it.
Maybe the Python library should change too 😛
Great job, and thanks for throwing in the embeddings method as a bonus here.
packages/ai/README.md (Outdated)
- `streamRag(message: string, context?: QueryContext): AsyncIterable<StreamingMessage> & PromiseLike<Response>`

  Use this either when you want to handle streaming data as it arrives or when you need a `Response` object that streams the results.
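A usage sketch of the two modes (assuming an `ai` client instance and an example query):

```ts
// 1. Consume parts as they stream in:
for await (const message of ai.streamRag("What is EdgeDB?")) {
  console.log(message); // StreamingMessage
}

// 2. Or await it to get a streaming `Response`, e.g. to return
//    straight from a route handler:
const response = await ai.streamRag("What is EdgeDB?");
```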
Might be worth documenting these separately, almost like two overloads?
Hey, I updated the README and merged this. If you have time, you can check the updated README.
packages/ai/src/core.ts (Outdated)
```ts
if (!response.headers.get("content-type")?.includes("application/json")) {
  throw new Error(
    "expected response to have content-type: application/json",
  );
}
```
Did we take this out because we just want to throw the existing exceptions that the `json()` method already throws?
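If that's the intent, the pre-check becomes redundant (a sketch of the assumption, not the merged code):

```ts
// `Response.json()` already rejects with a SyntaxError when the body
// isn't valid JSON, so a content-type pre-check may be unnecessary.
const data = await response.json();
```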
Adds an SSE-style streaming response method along with a lower-level async generator.
Example output from httpie: