[RFC] Agent framework #1161
Comments
#1161 and #1150 represent rather different philosophies for integrating GenAI into OpenSearch. #1161 seeks to provide a framework for building conversational apps that happen to use some OpenSearch features. #1150 seeks to provide a conversational interface over your favorite search engine. Both are valid. Neither should be a core OpenSearch feature, and furthermore, I think that neither belongs in ML-Commons. ML-Commons is for training and predicting with ML models, while these RFCs are for building generative AI applications. Accordingly, I'd like to call for the creation of an 'AI-Commons' plugin as an extension to ML-Commons. #1150 and #1161 will look pretty similar, code-wise, so I imagine it should be pretty easy to share a codebase. Both need conversational memory of some form; both need prompt templating of some form.

Why do we want both? I imagine developers picking AI-Commons up for the RAG of #1150 - this will provide a good starting point for people looking to spice up their existing search application with some GenAI pizzazz. In many use cases, this will be sufficient. But gradually, these conversational search apps will acquire peculiarities and requirements that the RAG pipeline might not support. Then #1161's CoT will be required, and these apps will cross a line where they stop being fancy search apps and start being fancy AI apps. Therefore it should be easy to go from RAG to CoT - RAG should be in the CoT ecosystem, but should also be able to stand alone.

As an example, what does answering the question "What happens if a ship jackknifes in the Suez Canal?" entail? RAG will try to answer in a single query (granted, with some potentially clever query rewriting), but unless there's a document detailing the answer to this question, RAG is hopeless. CoT, however, will ask a series of queries, one step at a time, to build up and derive an answer. For example: "What are the major trade routes through the Suez Canal?", "What are the shipping routes from Oman to America?", "How long are they?", "What products does this particular ship carry?", "What is the demand and backlog of this particular product?", etc.

Great! Well, if CoT is so powerful, why are we bothering with RAG? A couple of reasons. 1/ RAG is much simpler. Personally, I prefer using predictable tools that I understand. I know exactly what RAG is going to do: query OpenSearch, then pipe the results into an LLM for synthesis. I don't know what a CoT agent is going to do, given an arbitrary query. That's what makes it so powerful - it gets to choose how to answer - but I don't quite trust it to do the right thing. And trust is everything when it comes to GenAI adoption. So if we let RAG build up trust in the system, then people will be more comfortable switching to CoT. 2/ RAG is closer to search. OpenSearch users want to do search. Throwing a whole CoT GenAI infrastructure at someone who just wants to do search is going to alienate them. But a GenAI interface (RAG) over their search - maybe that will be easier to stomach. Finally, 3/ RAG is probably cheaper, cost- and performance-wise - only one or two LLM inferences per query instead of several.

So both #1150 and #1161 should happen, imo, and in the same place. How can we combine our efforts? The absolute first thing is that we all need to be aware of what code already exists. I've published some code, and I would urge everyone else here to do the same. We don't all work together, so if we want to work together, we need to be able to see each other's work. As far as integrating the RFCs into one project - first I'm gonna vote to separate agents from models (the first option from #1161). Then my proposed plan:

In general with CoT, I think we don't want to give the LLM too many options. We should try to keep as much complexity within the tools as possible, and focus on giving them clean and intuitive interfaces - LLMs are basically pure intuition. I hope this plan is agreeable to people.

p.s. can we resolve #1151? It looks like #1161 and #1150 partition it. |
Thanks @HenryL27 for the detailed analysis and explanation. We are going to resolve #1151. So we have two options for how to organize the code; both have some pros and cons. For #1161, we could leverage the current ml-commons ML framework, which would be less effort for both ml-commons and ml-commons clients (they wouldn't need to move to a new AI-Commons). This also matches the long-term roadmap: use ml-commons as the common layer for the ML/AI framework - the train/predict API is not the whole story for that layer. For #1150, I think its scope is clear and well defined. Similar to #1151, I think people all agree a conversation plugin is necessary and that conversational search is an important feature. I would like to keep the scope as is; that keeps the discussion and design clean and easy. I agree with your analysis on the "if CoT is so powerful, why are we bothering with RAG" part. CoT is more general, but I totally agree RAG is necessary. Let's not combine these two things too early - that would make the scope much bigger and more complicated. Let's discuss the well-scoped RFCs separately first; once these separate RFCs are aligned, we can continue to the next step and check whether we can leverage some common pieces, for example by adding another option: build a RAG Agent. But I'm also fine if you think we should list all the possible options and discuss them together first. In that case, please list all options clearly and simply (make the scope clear), and analyze the pros and cons. |
Okay, here are some options I've come up with. Note that pros and cons are subjective, so I'm interested in your opinions regarding these. Also, this is by no means a comprehensive list, and I'll bet a lot of the listed pros and cons apply to other options as well.

Option 1: AI-Commons
Cons:
Option 2: Separate RAG; CoT in ML-Commons
Cons:
Option 3: Separate RAG, Separate CoT
Cons:
Option 4: Everything in ML-Commons
Cons:
Option 5: Separate CoT, RAG in ML-Commons
|
@ylwu-amzn "And this also matches the long-term roadmap: use ml-commons as the commons layer for ml/AI framework. Train/predict API is not the whole thing for this layer." Can you pls share where this roadmap and the RFC/doc outlining the strategy behind it? Would also love to see the community discussion around the pros/cons, and if not, perhaps now would be a good time for folks to weigh in. |
Sorry that our doc is not so up to date; we need to fine-tune the readme to reflect the latest changes like the remote model. But you can see "Machine Learning Commons for OpenSearch is a new solution that make it easy to develop new machine learning feature." in our current readme. We meant to use ml-commons as a common layer that makes it easy to build any ML (and, really, AI) application/feature. The neural-search plugin (doc link) is a good example of this: it depends on the ml-commons client to build the semantic search feature. |
@HenryL27 Thanks for the quick response. I think the main point is how to organize the code; each different way of organizing it is a new option. Like you mentioned: separate RAG/CoT, or build in ml-commons or a new AI-Commons. As I replied in my last comment, ml-commons will be a common layer for ML/AI things, providing frameworks and easy-to-use APIs like train/predict, resource management like managing models, routing requests, etc. This can reduce fragmentation. Moving the common framework into yet another "commons" repo will make it harder to maintain. You can check the neural-search plugin: it's an ML/AI feature built on top of ml-commons that just adds the ml-commons client jar and calls the predict API. I would suggest building the RAG feature in a similar way: create a new RAG (or any other better name) repo, then leverage the ml-commons Java client jar to invoke the model or Agent. That keeps things well decoupled and also keeps the common pieces, like the ML framework and the Agent work, in one common place. Edit: another option is adding the RAG feature to the neural-search plugin. |
Haven't gone through the whole thread yet but wanted to drop this in for discussion/consideration. Haystack has a PromptHub framework for fetching, updating, and using prompts. https://haystack.deepset.ai/blog/share-and-use-prompt-with-prompthub |
@dtaivpp Thanks for sharing this, I think we should consider integrating with Haystack or another prompt hub. |
Hey @ylwu-amzn, it's been a bit since I've seen anything reference this issue. I'm wondering where we are in the development required to get the framework implemented. Is there anything we can do to help? |
Hi @HenryL27, we already have one PoC. Will publish soon. |
On the note of conversational memory: it was brought up in today's community meeting that we should have hooks for ISM that allow the chat history to be deleted after a certain period of time. At the moment I believe we only support deleting indexes based on their creation time, but depending on how the conversational memory is implemented, this will need to delete individual documents based on insert time.
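As a stopgap until ISM supports document-level retention, one option is a scheduled delete-by-query against the memory index. A rough sketch follows, where the index name `.conversation-memory` and the `create_time` field are assumptions that depend on how conversational memory ends up being implemented:

```java
import org.apache.http.HttpHost;
import org.opensearch.client.Request;
import org.opensearch.client.RestClient;

public class MemoryCleanupSketch {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // ".conversation-memory" and "create_time" are placeholders; the real index
            // name and timestamp field depend on the eventual memory implementation.
            Request deleteOld = new Request("POST", "/.conversation-memory/_delete_by_query");
            // Prune chat history older than 30 days.
            deleteOld.setJsonEntity("{\"query\": {\"range\": {\"create_time\": {\"lt\": \"now-30d\"}}}}");
            System.out.println(client.performRequest(deleteOld).getStatusLine());
        }
    }
}
```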
Closing this issue as Agent framework is delivered as GA in 2.13.0 |
In 2.9, ml-commons added support for remote inference with connectors. This issue describes how to build a general Agent framework by leveraging a remote LLM.
Why do we need to build an Agent?
To solve a complex problem, the process generally cannot be predefined. We need a way to work through the problem step by step, identifying potential solutions until we reach a resolution.
Architecture
Components
Model
ml-commons released the remote inference feature in 2.9. We can create a remote model with an LLM connector - for example, a remote model backed by an OpenAI chat model.
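A minimal sketch of that step, assuming a connector has already been created via the connectors API; the endpoint path follows the 2.9 remote inference API, but the field values and the connector id below are placeholders:

```java
import org.apache.http.HttpHost;
import org.opensearch.client.Request;
import org.opensearch.client.Response;
import org.opensearch.client.RestClient;

public class RegisterRemoteModelSketch {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // Register a remote model that delegates inference to an LLM connector.
            // "<openai_chat_connector_id>" is a placeholder for a connector created
            // earlier with POST /_plugins/_ml/connectors/_create.
            Request register = new Request("POST", "/_plugins/_ml/models/_register");
            register.setJsonEntity("{"
                + "\"name\": \"openai-chat-model\","
                + "\"function_name\": \"remote\","
                + "\"description\": \"Remote OpenAI chat model used by the agent\","
                + "\"connector_id\": \"<openai_chat_connector_id>\""
                + "}");
            Response response = client.performRequest(register);
            System.out.println(response.getStatusLine());
        }
    }
}
```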
Prompt Template
Define prompt templates for the LLM.
Users can refer to a prompt template by its prompt id in a "prompt repo" or "prompt index".
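For illustration, a small sketch of what a stored template and its rendering could look like; the `${parameters.*}` placeholder style and the helper below are assumptions, not a fixed ml-commons format:

```java
import java.util.Map;

public class PromptTemplateSketch {
    // Hypothetical RAG-style template that a "prompt repo" or "prompt index" could
    // store under a prompt id and hand to the LLM at run time.
    static final String RAG_TEMPLATE =
        "Answer the question using only the context below.\n"
        + "Context: ${parameters.context}\n"
        + "Question: ${parameters.question}\n"
        + "Answer:";

    // Naive placeholder substitution, just to show how parameters would be filled in.
    static String render(String template, Map<String, String> params) {
        String rendered = template;
        for (Map.Entry<String, String> e : params.entrySet()) {
            rendered = rendered.replace("${parameters." + e.getKey() + "}", e.getValue());
        }
        return rendered;
    }

    public static void main(String[] args) {
        System.out.println(render(RAG_TEMPLATE,
            Map.of("context", "<retrieved documents>", "question", "<user question>")));
    }
}
```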
Agent
The Agent is a coordinator which uses the LLM to reason about what action to take to solve a problem, then coordinates action execution. The action execution sequence is not necessarily predefined or hard-coded. We plan to make the framework flexible enough to support multiple Agent types (like flow, CoT, etc.). For the first phase, we can start by supporting a conversational ReAct Agent.
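To make the reason/act cycle concrete, here is a rough sketch of the loop a conversational ReAct agent runs. The `Llm`, `Tool`, and `Memory` interfaces and the parsing helpers are hypothetical stand-ins used only to show the control flow, not actual ml-commons classes:

```java
import java.util.List;
import java.util.Map;

class ReActLoopSketch {
    interface Llm { String generate(String prompt); }
    interface Tool { String run(String input); }
    interface Memory { List<String> history(); }

    String run(String question, Llm llm, Map<String, Tool> tools, Memory memory, int maxSteps) {
        StringBuilder scratchpad = new StringBuilder();
        for (int step = 0; step < maxSteps; step++) {
            // Reason: ask the LLM what to do next, given the question, chat history, and prior steps.
            String thought = llm.generate(buildPrompt(question, memory.history(), scratchpad.toString()));
            if (thought.contains("Final Answer:")) {
                return thought.substring(thought.indexOf("Final Answer:") + "Final Answer:".length()).trim();
            }
            // Act: parse the chosen tool and its input, execute it, record the observation.
            String toolName = parseAction(thought);
            String toolInput = parseActionInput(thought);
            String observation = tools.get(toolName).run(toolInput);
            scratchpad.append(thought).append("\nObservation: ").append(observation).append("\n");
        }
        return "Stopped after " + maxSteps + " steps without a final answer.";
    }

    // Prompt construction and output parsing are left abstract in this sketch.
    String buildPrompt(String question, List<String> history, String scratchpad) {
        return String.join("\n", history) + "\n" + question + "\n" + scratchpad;
    }
    String parseAction(String thought) { return "VectorDBTool"; }   // placeholder parser
    String parseActionInput(String thought) { return thought; }     // placeholder parser
}
```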
Tools
A Tool is a function which can be executed by the Agent. We will define a `Tool` framework in ml-commons and build some general built-in tools, for example:

- `OpenSearchIndexTool`: gathers OpenSearch index information like index status
- `VectorDBTool`: supports running vector search
- `SearchIndexTool`: supports searching an OpenSearch index

The `Tool` framework will be extensible. Other plugins can build their own Tools by implementing the `Tool` interface (a rough sketch of such an interface is shown below). For example, the AnomalyDetection plugin can build an `AnomalyResultTool` which can query and analyze anomaly detection results, and an `AnomalyDetectorTool` to create anomaly detectors. Some use cases for these tools:

- Use `AnomalyResultTool` to query the anomaly result index for how many anomalies were detected in the last 24 hours.
- Use `AnomalyDetectorTool` to create an anomaly detector for an index, then use `AlertingMonitorTool` to create an anomaly monitor with the detector and the user's email.

ml-commons will be the central place to manage all of these tools. It will provide tool management functions:
Possible design:
A Tool could be very general, for example
Users can provide parameters to customize their own tools. For example, for `VectorDBTool`, a user can customize the `embedding_model_id`, `knn_index`, etc.
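A rough sketch of what such an extensible Tool contract and a parameterized `VectorDBTool` could look like; the method names and constructor parameters here are assumptions for illustration, not the final ml-commons interface:

```java
import java.util.Map;

// Illustrative Tool contract: the description is what the LLM sees when deciding which tool to use.
interface ToolSketch {
    String getName();
    String getDescription();
    String run(Map<String, String> parameters);
}

// A VectorDBTool-style implementation, customized with embedding_model_id and knn_index as above.
class VectorDbToolSketch implements ToolSketch {
    private final String embeddingModelId;
    private final String knnIndex;

    VectorDbToolSketch(String embeddingModelId, String knnIndex) {
        this.embeddingModelId = embeddingModelId;
        this.knnIndex = knnIndex;
    }

    @Override public String getName() { return "VectorDBTool"; }
    @Override public String getDescription() { return "Runs k-NN vector search over " + knnIndex; }

    @Override
    public String run(Map<String, String> parameters) {
        // 1. Embed parameters.get("question") with embeddingModelId (e.g. via the predict API).
        // 2. Run a k-NN query against knnIndex with the resulting vector.
        // 3. Return the top documents as the observation handed back to the agent.
        return "<top matching documents>";
    }
}
```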
Toolkit
A set of tools which can work together to complete some target task. For example, the Anomaly Detection (AD) plugin can build multiple tools like `AnomalyResultTool`, `AnomalyDetectorTool`, `HistoricalAnalysisTool`, etc. Then AD can create a Toolkit `AnomalyToolkit` for all of these anomaly-related tools. A user can then just configure `AnomalyToolkit` in an Agent, which will automatically add all of these AD tools to the Agent.

Memory
A general memory layer for storing historical interactions. For example, a chat/conversational agent needs to save the session history and continue the session later.
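For illustration, one stored interaction could carry fields along these lines; the field set is an assumption, not a final schema (and it ties back to the time-based cleanup discussed in the comments above):

```java
import java.time.Instant;

// Hypothetical shape of a single interaction document in the memory index (Java 16+ record).
public record InteractionSketch(
    String conversationId,   // groups interactions into one session
    Instant createTime,      // insert time, enabling time-based cleanup
    String input,            // user question for this turn
    String promptTemplate,   // prompt template used for this turn
    String response,         // LLM answer
    String origin            // which agent/tool produced the response
) {}
```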
Design
Option 1: Add a new concept: Agent
User needs to create a remote model for LLM first. Then user can use the model id and tools to create a new Agent.
The workflow will be:
We will provide get, search and update Agent APIs.
Users don't need to deploy an Agent; they can run an Agent directly if the model is deployed.
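A sketch of that workflow, assuming register and execute endpoints along the lines of this proposal; the paths, body fields, and ids below are placeholders rather than a finalized API:

```java
import org.apache.http.HttpHost;
import org.opensearch.client.Request;
import org.opensearch.client.RestClient;

public class AgentWorkflowSketch {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // Register an Agent that references the deployed remote LLM and a vector search tool.
            Request register = new Request("POST", "/_plugins/_ml/agents/_register");
            register.setJsonEntity("{"
                + "\"name\": \"conversational-agent\","
                + "\"type\": \"conversational\","
                + "\"llm\": {\"model_id\": \"<remote_llm_model_id>\"},"
                + "\"tools\": [{\"type\": \"VectorDBTool\", \"parameters\": "
                + "{\"embedding_model_id\": \"<embedding_model_id>\", \"index\": \"<knn_index>\"}}]"
                + "}");
            client.performRequest(register);
            String agentId = "<agent_id_from_register_response>"; // parse from the register response

            // No separate deploy step for the Agent: if the underlying model is deployed, run it directly.
            Request execute = new Request("POST", "/_plugins/_ml/agents/" + agentId + "/_execute");
            execute.setJsonEntity("{\"parameters\": {\"question\": \"How many errors were logged in the last 24 hours?\"}}");
            System.out.println(client.performRequest(execute).getStatusLine());
        }
    }
}
```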
Pros:
Cons:
Two options for agent access control:
Option 2: Extend model by adding agent field
We extend the current model by adding `Tools` or `Agent` fields. Such a model will be a CoT model. A CoT model must be an LLM which supports generative AI, for example an OpenAI ChatGPT model or Anthropic Claude.
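Under this option, the agent configuration would live on the model itself. A sketch of what such a register request could look like; the `agent`/`tools` fields on the model body are hypothetical additions for illustration:

```java
import org.apache.http.HttpHost;
import org.opensearch.client.Request;
import org.opensearch.client.RestClient;

public class CotModelSketch {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // Register a remote LLM whose definition also carries the agent/tool configuration.
            Request register = new Request("POST", "/_plugins/_ml/models/_register");
            register.setJsonEntity("{"
                + "\"name\": \"claude-cot-model\","
                + "\"function_name\": \"remote\","
                + "\"connector_id\": \"<anthropic_connector_id>\","
                + "\"agent\": {\"type\": \"conversational\","
                + "\"tools\": [{\"type\": \"SearchIndexTool\"}, {\"type\": \"VectorDBTool\"}]}"
                + "}");
            System.out.println(client.performRequest(register).getStatusLine());
        }
    }
}
```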
The workflow will be
Pros:
Cons: