Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request]: Improve AutogenStudio Database Implementation (link database entities, remove api keys) #1694

Closed
jxraynaud opened this issue Feb 15, 2024 · 5 comments
Labels
0.2 Issues which were filed before re-arch to 0.4 proj-studio Related to AutoGen Studio.

Comments

@jxraynaud
Copy link

Is your feature request related to a problem? Please describe.

I have the same issue than the one describe here : #1415. So I dig a bit deeper and looked at the schema used for the sqlite database. I noticed a few things:

  1. The api keys are stored in the database. That's a problem because an easy way to share a whole project is to share the database itself. I'm aware that everything can be exported and then reimported, but easiest way is still to share the db, and some people will do it without being aware that the database is storing api keys.
  2. Storing, in plain text, api keys in a db is a very bad practice.
  3. The problem with updating the api key of the model and having to relink the model of every single agent is a "no-go" for any real world project. As long as we don't take advantage of relational databases like sqlite to solve this issue it means that maintenance is impossible.

Describe the solution you'd like

Analyse a bit better the actual underlying relational schema. For example the agent table should have a model_id field.
Implement a serialised/deserializer logic allowing to be both flexible and strict when it comes to relations. Implement cascade/protect logic (for example not being able to delete an agent used by a workflow or a model used by an agent).

Modify the existing code to ensure that we use the serializer/deserializer logic everywhere and that we have proper handling of the errors in the UI.

Never store the value of the api key in the database. Instead allow the deserializer to automatically get it from env variable or a config file or a .env file. Obviously the serializer must remove the api keys when receiving the llm_config.

Obviously the hard part would be to create a script allowing to migrate from the previous version of the db to the new one. But I think agents can do this kind of scripts relatively well.

I can help for the db part.

Additional context

No response

@sonichi sonichi added the proj-studio Related to AutoGen Studio. label Feb 15, 2024
@victordibia
Copy link
Collaborator

victordibia commented Feb 15, 2024

Hi @jxraynaud ,

Thanks for the notes. All of these can improve current behaviors in autogenstudio. Studio originally began as a example but is transitioning into a tool that we can all use to prototype with autogen. The changes above will help us get there.

To summarize your proposed changes and add thoughts on implementation:

Linking Database Entities.

The current implementation uses the database simply for persistence without managing associations across tables for each entity (skills, models, agents, workflows). The proposed change will natively support links between entities (primary and foreign keys). Some potential implementation steps

  • Update datamodel schema such that each entity has an id field where not available (managed/provided by a db).
  • Implement UI warnings that let users know if an entity has existings before they are deleted. . E.g. skills, models, agents associated with an existing workflow.
  • Implement workflow serialization logic that rehydrates a Worflow object via queries respective linked entities from db. This mostly will happen on the /messages route when a new message is sent from the UI.
  • Update playground UI to keep track of workflow ids in sessions and not snapshots of the workflow itself. This way, updates to the workflow propagrate to a session playground without having to create a new session

API Keys in Env Variables / Config Files.

Currently implementation saves all data provided in in the db (including api_keys). Proposed changes will remove api keys from db and maintain a reference to env variables. Implementation wise

  • Update models table api_key field to keep a reference to an env_variable name rather than a key (manage this mapping)
  • When new models are created in the UI, create an env variable name which is stored in the db, and append that env variable (+ its value) to a .env file on disc in appdir.

These are significant changes and I am happy to treat these as a breaking update at the moment (i.e., start with a focus on functionality then implement migrations/backward compat later).
We can also use the opportunity to completely rewrite the database layer for AGS using an ORM tool like SQLAlchemy. This lets us maintain the validation we have with fastapi whilst also supporting multiple db backends.

Overall, all of these will really improve the dev ex for AGS. What are your thoughts?

Related

@jxraynaud
Copy link
Author

That seems great ! Indeed using SQL Alchemy will allow to easily tackle the Postgres support feature request.

I'd be happy to contribute. I'll work on a first database schema next week and I'll post it here. If you validate it, I can try to implement the SQL alchemy / fastapi part.

@victordibia
Copy link
Collaborator

Thanks.
Sounds like a good plan.
I'll also do some investigation when I have sometime and share updates.

@victordibia victordibia changed the title [Feature Request]: Remove api keys from the sqlite database in studio + take advantage of the relations [Feature Request]: Improve AutogenStudio Database (link database entities, remove api keys) Mar 5, 2024
@victordibia victordibia changed the title [Feature Request]: Improve AutogenStudio Database (link database entities, remove api keys) [Feature Request]: Improve AutogenStudio Database Implementation (link database entities, remove api keys) Mar 5, 2024
@victordibia
Copy link
Collaborator

victordibia commented Mar 16, 2024

I have started looking into moving AGS backend to use an ORM (to mainly address db entity linking issue above).

Based on the research I have done, SQLModel seems to be the right approach - it integrates SQLAlchemy and FastAPI data models. The benefits are multiple

  • we can support multiple db backends via SQL alchemy
  • we leverage SQL alchemy methods to enforce better entity linking
  • We can use the same datamodels for the DB and WebEndpoing. Massive win, simpler code base.

I have started a branch here to explore this.

ORM Implementation Progress

  • Wrap into a PR and Merge into AGS dev branch
  • Implement migrations (ideally via Alembic)
  • Delete old data model completely
  • Implement and verify entity linking and data protection rules (e.g. delete protections etc )
  • Rewrite UI to use new api end point
    • Build
    • Playground
    • Gallery
  • Rewrite web API Endpints to use ORM classes
  • Create ORM Models for Entities
  • Create new DB Manager
    • Crud utils for all entities.
      • Delete - DBManager.delete(entity, args)
      • Upset - DBManager.upsert(entity, args)
      • Create - DBManager.create(entity, args)
    • init call to create db and tables

FYI @gagb , @ekzhu

@rysweet
Copy link
Collaborator

rysweet commented Oct 18, 2024

@victordibia - it looks like these are all resolved except migrations.... closing and please open a new issue for migrations if you want.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.2 Issues which were filed before re-arch to 0.4 proj-studio Related to AutoGen Studio.
Projects
None yet
Development

No branches or pull requests

4 participants