Rewrite AutoGen Studio Database Layer to Use SQLModel ORM (#2425)
* update orm branch + accessibility tweaks

* general file location refactor

* add support for LocalCommandLineCodeExecutor and DockerCommandLineCodeExecutor

* update code execution config loading

* version update

* bump version rc1

* add model type selection (openai, gemini, azure)

* add ability to test workflow

* psycopg3 support

* add close logic to build tab pop ups, enable testing of workflows in build view

* updates to dbmanager, version bump

* add max_tokens default value

* ensure sessions are used correctly in dbmanager

* initial support for migrations

* update sessions/workflow api routing for clarity.

* general refactor, + add support for initial sample workflows

* orm branch updates

* Removed incorrect Git LFS files

* update git lfs tracking

---------

Co-authored-by: Audel Rouhi <knucklessg1@gmail.com>
victordibia and Knucklessg1 committed May 11, 2024
1 parent 60c6658 commit d50f654
Showing 39 changed files with 5,044 additions and 3,937 deletions.
1 change: 1 addition & 0 deletions samples/apps/autogen-studio/.gitignore
@@ -1,6 +1,7 @@
database.sqlite
.cache/*
autogenstudio/web/files/user/*
autogenstudio/test
autogenstudio/web/files/ui/*
OAI_CONFIG_LIST
scratch/
19 changes: 12 additions & 7 deletions samples/apps/autogen-studio/README.md
@@ -15,6 +15,8 @@ Code for AutoGen Studio is on GitHub at [microsoft/autogen](https://github.com/m
> AutoGen Studio is currently under active development and we are iterating quickly. Kindly consider that we may introduce breaking changes in the releases during the upcoming weeks, and also the `README` might be outdated. We'll update the `README` as soon as we stabilize the API.
> [!NOTE] Updates
> April 17: AutoGen Studio database layer is now rewritten to use [SQLModel](https://sqlmodel.tiangolo.com/) (Pydantic + SQLAlchemy). This provides entity linking (skills, models, agents and workflows are linked via association tables) and supports the multiple [database backend dialects](https://docs.sqlalchemy.org/en/20/dialects/) supported by SQLAlchemy (SQLite, PostgreSQL, MySQL, Oracle, Microsoft SQL Server). The backend database can be specified using a `--database-uri` argument when running the application. For example, `autogenstudio ui --database-uri sqlite:///database.sqlite` for SQLite and `autogenstudio ui --database-uri postgresql+psycopg://user:password@localhost/dbname` for PostgreSQL.
> March 12: Default directory for AutoGen Studio is now /home/<user>/.autogenstudio. You can also specify this directory using the `--appdir` argument when running the application. For example, `autogenstudio ui --appdir /path/to/folder`. This will store the database and other files in the specified directory e.g. `/path/to/folder/database.sqlite`. `.env` files in that directory will be used to set environment variables for the app.
### Capabilities / Roadmap
@@ -84,7 +86,14 @@ autogenstudio ui --port 8081
```
This will start the application on the specified port. Open your web browser and go to `http://localhost:8081/` to begin using AutoGen Studio.
AutoGen Studio also takes a `--host <host>` argument to specify the host address. By default, it is set to `localhost`. You can also use the `--appdir <appdir>` argument to specify the directory where the app files (e.g., database and generated user files) are stored. By default, it is set to the directory where autogen pip package is installed.
AutoGen Studio also takes several parameters to customize the application:
- `--host <host>` argument to specify the host address. By default, it is set to `localhost`.
- `--appdir <appdir>` argument to specify the directory where the app files (e.g., database and generated user files) are stored. By default, it is set to a `.autogenstudio` directory in the user's home directory.
- `--port <port>` argument to specify the port number. By default, it is set to `8080`.
- `--reload` argument to enable auto-reloading of the server when changes are made to the code. By default, it is set to `False`.
- `--database-uri` argument to specify the database URI. Example values include `sqlite:///database.sqlite` for SQLite and `postgresql+psycopg://user:password@localhost/dbname` for PostgreSQL. If this is not specified, the database URI defaults to a `database.sqlite` file in the `--appdir` directory (see the sketch below).
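
The same URI can be sanity-checked outside AutoGen Studio by handing it to SQLModel's `create_engine` directly. This is a minimal sketch under that assumption, not part of the AutoGen Studio codebase; the URIs and credentials are placeholders.

```python
from sqlmodel import SQLModel, create_engine

# Placeholder URIs; substitute your own path or credentials.
sqlite_uri = "sqlite:///database.sqlite"
postgres_uri = "postgresql+psycopg://user:password@localhost/dbname"

# Any dialect supported by SQLAlchemy should be accepted here.
engine = create_engine(sqlite_uri, echo=False)

# Creates tables for all SQLModel models imported so far; with none
# imported, this is effectively just a connectivity check.
SQLModel.metadata.create_all(engine)
```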

Now that you have AutoGen Studio installed and running, you are ready to explore its capabilities, including defining and modifying agent workflows, interacting with agents and sessions, and expanding agent skills.

@@ -98,8 +107,6 @@ AutoGen Studio proposes some high-level concepts.

**Skills**: Skills are functions (e.g., Python functions) that describe how to solve a task. In general, a good skill has a descriptive name (e.g. `generate_images`), extensive docstrings and good defaults (e.g., writing out files to disk for persistence and reuse). You can add new skills to the AutoGen Studio app via the provided UI. At inference time, these skills are made available to the assistant agent as they address your tasks.
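
For illustration only, a skill is just a plain Python function. The sketch below shows a hypothetical `save_text_file` skill (not one of the bundled examples) with the descriptive name, docstring, and persistent-output default described above.

```python
def save_text_file(content: str, file_name: str = "output.txt") -> str:
    """
    Save text content to a file on disk and return the file path.

    Writing results to disk by default means the output persists and can be
    reused by other agents or inspected by the user after the run.
    """
    with open(file_name, "w", encoding="utf-8") as f:
        f.write(content)
    return file_name
```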

AutoGen Studio comes with 3 example skills: `fetch_profile`, `find_papers`, `generate_images`. The default skills, agents and workflows are based on the [dbdefaults.json](autogentstudio/utils/dbdefaults.json) file which is used to initialize the database.
## Example Usage

Consider the following query.
@@ -116,8 +123,6 @@ The agent workflow responds by _writing and executing code_ to create a python p

> Note: You can also view the debug console that generates useful information to see how the agents are interacting in the background.

<!-- ![ARA](./docs/ara_console.png) -->
## Contribution Guide

We welcome contributions to AutoGen Studio. We recommend the following general steps to contribute to the project:
@@ -134,7 +139,7 @@ We welcome contributions to AutoGen Studio. We recommend the following general s

**Q: How do I specify the directory where files(e.g. database) are stored?**

A: You can specify the directory where files are stored by setting the `--appdir` argument when running the application. For example, `autogenstudio ui --appdir /path/to/folder`. This will store the database and other files in the specified directory e.g. `/path/to/folder/database.sqlite`.
A: You can specify the directory where files are stored by setting the `--appdir` argument when running the application. For example, `autogenstudio ui --appdir /path/to/folder`. This will store the database (by default) and other files in the specified directory, e.g. `/path/to/folder/database.sqlite`.

**Q: Where can I adjust the default skills, agent and workflow configurations?**
A: You can modify agent configurations directly from the UI or by editing the [dbdefaults.json](autogenstudio/utils/dbdefaults.json) file which is used to initialize the database.
@@ -146,7 +151,7 @@ A: To reset your conversation history, you can delete the `database.sqlite` file
A: Yes, you can view the generated messages in the debug console of the web UI, providing insights into the agent interactions. Alternatively, you can inspect the `database.sqlite` file for a comprehensive record of messages.
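
With the default SQLite backend, the file can also be inspected with a few lines of Python. The sketch below only lists the tables it finds, since the exact table names depend on the SQLModel models in your AutoGen Studio version; adjust the path if you use a custom `--appdir`.

```python
import sqlite3

# Assumes the default database location; adjust for your --appdir.
conn = sqlite3.connect("database.sqlite")
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"
).fetchall()
print([name for (name,) in tables])
conn.close()
```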

**Q: Can I use other models with AutoGen Studio?**
Yes. AutoGen standardizes on the openai model api format, and you can use any api server that offers an openai compliant endpoint. In the AutoGen Studio UI, each agent has an `llm_config` field where you can input your model endpoint details including `model`, `api key`, `base url`, `model type` and `api version`. For Azure OpenAI models, you can find these details in the Azure portal. Note that for Azure OpenAI, the `model` is the deployment name or deployment id, and the `type` is "azure".
Yes. AutoGen standardizes on the OpenAI model API format, and you can use any API server that offers an OpenAI-compliant endpoint. In the AutoGen Studio UI, each agent has an `llm_config` field where you can input your model endpoint details including `model`, `api key`, `base url`, `model type` and `api version`. For Azure OpenAI models, you can find these details in the Azure portal. Note that for Azure OpenAI, the `model name` is the deployment id or engine, and the `model type` is "azure".
For other OSS models, we recommend using a server such as vLLM to instantiate an OpenAI-compliant endpoint.
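
As a rough illustration, the endpoint details map onto AutoGen's usual `config_list` fields; the names below follow that convention and may not match the Studio UI labels exactly, and every value is a placeholder.

```python
# Hypothetical Azure OpenAI entry: "model" is the deployment name/id.
azure_model_config = {
    "model": "my-gpt4-deployment",
    "api_key": "<your-azure-api-key>",
    "base_url": "https://<your-resource>.openai.azure.com/",
    "api_type": "azure",
    "api_version": "2024-02-01",
}

# Hypothetical OSS model served behind an OpenAI-compatible endpoint (e.g. vLLM).
local_model_config = {
    "model": "mistralai/Mistral-7B-Instruct-v0.2",
    "base_url": "http://localhost:8000/v1",
    "api_key": "NULL",  # many local servers ignore the key but require a value
}
```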

**Q: The server starts but I can't access the UI**
85 changes: 53 additions & 32 deletions samples/apps/autogen-studio/autogenstudio/chatmanager.py
@@ -4,14 +4,18 @@
import time
from datetime import datetime
from queue import Queue
from typing import Any, Dict, List, Optional, Tuple
from typing import Any, Dict, List, Optional, Tuple, Union

import websockets
from fastapi import WebSocket, WebSocketDisconnect

from .datamodel import AgentWorkFlowConfig, Message, SocketMessage
from .utils import extract_successful_code_blocks, get_modified_files, summarize_chat_history
from .workflowmanager import AutoGenWorkFlowManager
from .datamodel import Message, SocketMessage, Workflow
from .utils import (
extract_successful_code_blocks,
get_modified_files,
summarize_chat_history,
)
from .workflowmanager import WorkflowManager


class AutoGenChatManager:
@@ -41,7 +45,7 @@ def chat(
self,
message: Message,
history: List[Dict[str, Any]],
flow_config: Optional[AgentWorkFlowConfig] = None,
workflow: Any = None,
connection_id: Optional[str] = None,
user_dir: Optional[str] = None,
**kwargs,
@@ -59,78 +63,93 @@ def chat(
"""

# create a working directory for workflow based on user_dir/session_id/time_hash
work_dir = os.path.join(user_dir, message.session_id, datetime.now().strftime("%Y%m%d_%H-%M-%S"))
work_dir = os.path.join(
user_dir,
str(message.session_id),
datetime.now().strftime("%Y%m%d_%H-%M-%S"),
)
os.makedirs(work_dir, exist_ok=True)

# if no flow config is provided, use the default
if flow_config is None:
raise ValueError("flow_config must be specified")
if workflow is None:
raise ValueError("Workflow must be specified")

flow = AutoGenWorkFlowManager(
config=flow_config,
workflow_manager = WorkflowManager(
workflow=workflow,
history=history,
work_dir=work_dir,
send_message_function=self.send,
connection_id=connection_id,
)

workflow = Workflow.model_validate(workflow)

message_text = message.content.strip()

start_time = time.time()
flow.run(message=f"{message_text}", clear_history=False)
workflow_manager.run(message=f"{message_text}", clear_history=False)
end_time = time.time()

metadata = {
"messages": flow.agent_history,
"summary_method": flow_config.summary_method,
"messages": workflow_manager.agent_history,
"summary_method": workflow.summary_method,
"time": end_time - start_time,
"files": get_modified_files(start_time, end_time, source_dir=work_dir),
}

print("Modified files: ", len(metadata["files"]))

output = self._generate_output(message_text, flow, flow_config)
output = self._generate_output(message_text, workflow_manager, workflow)

output_message = Message(
user_id=message.user_id,
root_msg_id=message.root_msg_id,
role="assistant",
content=output,
metadata=json.dumps(metadata),
meta=json.dumps(metadata),
session_id=message.session_id,
)

return output_message

def _generate_output(
self, message_text: str, flow: AutoGenWorkFlowManager, flow_config: AgentWorkFlowConfig
self,
message_text: str,
workflow_manager: WorkflowManager,
workflow: Workflow,
) -> str:
"""
Generates the output response based on the workflow configuration and agent history.
:param message_text: The text of the incoming message.
:param flow: An instance of `AutoGenWorkFlowManager`.
:param flow: An instance of `WorkflowManager`.
:param flow_config: An instance of `AgentWorkFlowConfig`.
:return: The output response as a string.
"""

output = ""
if flow_config.summary_method == "last":
successful_code_blocks = extract_successful_code_blocks(flow.agent_history)
last_message = flow.agent_history[-1]["message"]["content"] if flow.agent_history else ""
if workflow.summary_method == "last":
successful_code_blocks = extract_successful_code_blocks(workflow_manager.agent_history)
last_message = (
workflow_manager.agent_history[-1]["message"]["content"] if workflow_manager.agent_history else ""
)
successful_code_blocks = "\n\n".join(successful_code_blocks)
output = (last_message + "\n" + successful_code_blocks) if successful_code_blocks else last_message
elif flow_config.summary_method == "llm":
model = flow.config.receiver.config.llm_config.config_list[0]
elif workflow.summary_method == "llm":
client = workflow_manager.receiver.client
status_message = SocketMessage(
type="agent_status",
data={"status": "summarizing", "message": "Generating summary of agent dialogue"},
connection_id=flow.connection_id,
data={
"status": "summarizing",
"message": "Summarizing agent dialogue",
},
connection_id=workflow_manager.connection_id,
)
self.send(status_message.dict())
output = summarize_chat_history(task=message_text, messages=flow.agent_history, model=model)
output = summarize_chat_history(
task=message_text,
messages=workflow_manager.agent_history,
client=client,
)

elif flow_config.summary_method == "none":
elif workflow.summary_method == "none":
output = ""
return output

@@ -141,7 +160,9 @@ class WebSocketConnectionManager:
"""

def __init__(
self, active_connections: List[Tuple[WebSocket, str]] = None, active_connections_lock: asyncio.Lock = None
self,
active_connections: List[Tuple[WebSocket, str]] = None,
active_connections_lock: asyncio.Lock = None,
) -> None:
"""
Initializes WebSocketConnectionManager with an optional list of active WebSocket connections.
@@ -185,7 +206,7 @@ async def disconnect_all(self) -> None:
for connection, _ in self.active_connections[:]:
await self.disconnect(connection)

async def send_message(self, message: Dict, websocket: WebSocket) -> None:
async def send_message(self, message: Union[Dict, str], websocket: WebSocket) -> None:
"""
Sends a JSON message to a single WebSocket connection.
@@ -202,7 +223,7 @@ async def send_message(self, message: Dict, websocket: WebSocket) -> None:
print("Error: WebSocket connection closed normally")
await self.disconnect(websocket)
except Exception as e:
print(f"Error in sending message: {str(e)}")
print(f"Error in sending message: {str(e)}", message)
await self.disconnect(websocket)

async def broadcast(self, message: Dict) -> None:
6 changes: 5 additions & 1 deletion samples/apps/autogen-studio/autogenstudio/cli.py
@@ -1,10 +1,10 @@
import os
from typing import Optional

import typer
import uvicorn
from typing_extensions import Annotated

from .utils.dbutils import DBManager
from .version import VERSION

app = typer.Typer()
@@ -18,6 +18,7 @@ def ui(
reload: Annotated[bool, typer.Option("--reload")] = False,
docs: bool = False,
appdir: str = None,
database_uri: Optional[str] = None,
):
"""
Run the AutoGen Studio UI.
@@ -29,11 +30,14 @@ def ui(
reload (bool, optional): Whether to reload the UI on code changes. Defaults to False.
docs (bool, optional): Whether to generate API docs. Defaults to False.
appdir (str, optional): Path to the AutoGen Studio app directory. Defaults to None.
database_uri (str, optional): Database URI to connect to. Defaults to None. Examples include sqlite:///autogenstudio.db, postgresql://user:password@localhost/autogenstudio.
"""

os.environ["AUTOGENSTUDIO_API_DOCS"] = str(docs)
if appdir:
os.environ["AUTOGENSTUDIO_APPDIR"] = appdir
if database_uri:
os.environ["AUTOGENSTUDIO_DATABASE_URI"] = database_uri

uvicorn.run(
"autogenstudio.web.app:app",
@@ -0,0 +1,3 @@
# from .dbmanager import *
from .dbmanager import *
from .utils import *