Potential safety issue: cross-layer leakage #6

mdubinko · 2023-09-19T00:13:44Z

mdubinko
Sep 19, 2023

I'm guessing a typical implementation would use the same LLM for multiple layers. In theory, each API call to an LLM is supposed to be independent of all other API calls. But a) this may not be the case for all future LLM APIs, b) there could be a bug that allows leakage, or c) an extremely capable LLMs could, in principle, deliberately accumulate knowledge across different calls and perform all kinds of hijinks (and be sneaky about it).

In that scenario, what failure modes could be possible? I expect even well-behaved agents will exhibit all kinds of surprising emergent behaviors. If something suspicious starts to happen, it may not be very evident, even with human readable bus logs. What stance should the framework take about this?

(The alternative, a completely independent LLM for each layer, would be relatively immune from these failure modes, but would take at least 6x the compute, storage, etc.)

samgriek · 2023-09-19T10:09:16Z

samgriek
Sep 19, 2023
Collaborator

@mdubinko - That's an interesting question. If you lock in to a given model (let's say the current version of ChatGPT 3.5), I don't believe it is aware or has context regarding the different conversations from the different layers just like it doesn't have cumulative knowledge of my conversations and yours without additional intervention like retraining, embedding, vector store, etc. Additionally, it may make sense for different layers to use different specialized models for their scope and responsibilities anyway.

If the LLM has context across the layers and somehow redefines the ethics based on this context for the aspirational layer, then in reality the separation of busses, bus messages, and roles is a thin vale. That's a lot of if's and solvable through proper engineering, architecture, and guardrails. Even if this wasn't the case, the auditor should pick up bad intentions in the messages, and shut the system down.

1 reply

DataBassGit Sep 19, 2023
Collaborator

In 3 years, most models will be local. But that's assuming we move away from these 180b parameter models. I think we will. I think as we see attention and context get optimized, the need for models that require massive amounts of compute will decrease. Instead of making a model memorize the entire web, the cognitive architecture just adds any relevant data needed to the context window from the web.

@mdubinko As for leakage between layers, this isn't something that LLMs currently do. They have no inherit memory. That doesn't mean they won't later. That being said, because this is being developed as an NLP first system, the models are interchangeable. You could theoretically use a different model for each layer. It's not the most efficient way of doing things, but it's entirely possible.

I don't see a need for information to be hidden between layers at this point, however. If there's any security concern, it's from outsiders being able to see inside. And the first, easiest suspect is the company hosting your model.

daveshap · 2023-09-19T12:00:08Z

daveshap
Sep 19, 2023
Maintainer

Check out the security section https://github.com/daveshap/ACE_Framework#security

1 reply

draconicfae Sep 20, 2023

typo in your link, should be https://github.com/daveshap/ACE_Framework/blob/main/ACE_Framework.md#security

mdubinko · 2023-09-20T14:39:02Z

mdubinko
Sep 20, 2023
Author

Came across this today, on RECESSIM's YT. https://www.youtube.com/watch?v=0smUe5xvAOQ
The relevant quote for security purposes:

"A bus or component that doesn't come with tools for practical injection and crafted inputs and states should be considered insecure" -Sergey Bratus

P.S. I think there should be a starter implementation much simpler than a full-blown agent...but that's for a different thread.

1 reply

daveshap Sep 20, 2023
Maintainer

That's why we're starting with an MVP

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potential safety issue: cross-layer leakage #6

{{title}}

Replies: 3 comments 3 replies

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Potential safety issue: cross-layer leakage #6

mdubinko Sep 19, 2023

Replies: 3 comments · 3 replies

samgriek Sep 19, 2023 Collaborator

DataBassGit Sep 19, 2023 Collaborator

daveshap Sep 19, 2023 Maintainer

draconicfae Sep 20, 2023

mdubinko Sep 20, 2023 Author

daveshap Sep 20, 2023 Maintainer

mdubinko
Sep 19, 2023

Replies: 3 comments 3 replies

samgriek
Sep 19, 2023
Collaborator

DataBassGit Sep 19, 2023
Collaborator

daveshap
Sep 19, 2023
Maintainer

mdubinko
Sep 20, 2023
Author

daveshap Sep 20, 2023
Maintainer