Replies: 3 comments 3 replies
-
@mdubinko - That's an interesting question. If you lock in to a given model (let's say the current version of ChatGPT 3.5), I don't believe it is aware or has context regarding the different conversations from the different layers just like it doesn't have cumulative knowledge of my conversations and yours without additional intervention like retraining, embedding, vector store, etc. Additionally, it may make sense for different layers to use different specialized models for their scope and responsibilities anyway. If the LLM has context across the layers and somehow redefines the ethics based on this context for the aspirational layer, then in reality the separation of busses, bus messages, and roles is a thin vale. That's a lot of if's and solvable through proper engineering, architecture, and guardrails. Even if this wasn't the case, the auditor should pick up bad intentions in the messages, and shut the system down. |
Beta Was this translation helpful? Give feedback.
-
Check out the security section https://github.com/daveshap/ACE_Framework#security |
Beta Was this translation helpful? Give feedback.
-
Came across this today, on RECESSIM's YT. https://www.youtube.com/watch?v=0smUe5xvAOQ "A bus or component that doesn't come with tools for practical injection and crafted inputs and states should be considered insecure" -Sergey Bratus P.S. I think there should be a starter implementation much simpler than a full-blown agent...but that's for a different thread. |
Beta Was this translation helpful? Give feedback.
-
I'm guessing a typical implementation would use the same LLM for multiple layers. In theory, each API call to an LLM is supposed to be independent of all other API calls. But a) this may not be the case for all future LLM APIs, b) there could be a bug that allows leakage, or c) an extremely capable LLMs could, in principle, deliberately accumulate knowledge across different calls and perform all kinds of hijinks (and be sneaky about it).
In that scenario, what failure modes could be possible? I expect even well-behaved agents will exhibit all kinds of surprising emergent behaviors. If something suspicious starts to happen, it may not be very evident, even with human readable bus logs. What stance should the framework take about this?
(The alternative, a completely independent LLM for each layer, would be relatively immune from these failure modes, but would take at least 6x the compute, storage, etc.)
Beta Was this translation helpful? Give feedback.
All reactions