As LLMs become more easily available and integrated into our work and personal lives, the promise of the technology is tempered by the potential for it to be misused. And the potential for misuse becomes even more significant when you realize LLMs can be combined with other powerful software components and agents to orchestrate a pipeline of actions. OR combined with proprietary and personal data to introduce new avenues for data disclosure and leakage.
The intention for this page is not to reiterate security guidance that is generally available for more traditional or cloud software applications but to focus on guidance specific to GenAI applications and the unique characteristics and challenges of LLMs.
The security threats and risks with traditional software applications are familiar and understood. GenAI and LLMs introduce new and unique security risks including:
- AI responses are based on statistical probabilities or the best chance for correct output. LLMs generate convincing human-like responses by predicting what words come next in a phrase. While they can be great at helping with tasks like summarizing a document or explaining complicated concepts or boosting creativity, there can be issues like responses being inaccurate, incomplete, inappropriate, or completely fabricated. You may be familiar with one well known example where ChatGPT provided non-existent legal citations that lawyers presented in court: Here's what happens when your lawyer uses ChatGPT.
- GenAI is by design a non-deterministic technology which means that given identical inputs, responses and output may differ.
- GenAI applications can be extended with agents, plugins, and even external APIs that can significantly expand the attack surface for a GenAI application. For instance, an LLM may implicitly trust a plugin or 3rd party component that is malicious.
- Another challenge with GenAI is that it currently it is not possible to enforce an isolation boundary between the data and the control planes. This means that LLMs are not always able to differentiate between data being submitted as content or an adversarial instruction submitted as content. Think about a SQL databases: instructions are supplied through query language and validated with a parser before data is queried, manipulated, or provided as output. With a SQL injection attack, a malicious instruction can piggyback on an ambiguously phrased language construct but it can be mitigated with a parameterized query. GenAI/LLMs do not have that boundary between syntax (control plane) and data so other mechanisms are needed.
The diagram below is from OWASP Top 10 for Large Language Model Applications and depicts the potential security risks for a hypothetical LLM app:
Infrastructure plays an indispensable role in helping create a secure landscape for GenAI applications, particularly cloud environments. Below are strategies tht can help ensure the security of a GenAI environment:
- Threat Modeling Include GenAI apps in your threat modeling practice. Understand that GenAI can extend attack surface with access to underlying or referenced data sources, access to model API keys, workflow orchestration, and agents and plugins. Learn more about what can go wrong with The AI Attack Surface Map v1.0.
- Architecture strategies help ensure a secure, scalable, and available environment.
- Baseline OpenAI end-to-end chat reference architecture: a baseline architecture for building and deploying enterprise chat apps that use Azure OpenAI.
- OpenAI end-to-end baseline reference implementation: Author and run a chat app in a single region with Azure ML and OpenAI.
- Network strategies help ensure that the cloud infrastructure is properly segmented and that access is controlled and monitored. Consider network segmentation, using secure protocols, enforcing Secure APIs and endpoints. For GenAI specific recommendations see: Cognitive Services Landing Zone in-a-box
- Access and Identity strategies to enforce user verification and provide a barrier to malicious access. When possible, use managed identities and RBAC to authenticate and authorize access and avoid use of GenAI service API keys for access. Another consideration to keep in mind is that access patterns like role or row level access to indexes may not be natively supported. See:
- Application strategies help ensure the application is configured securely and vulnerabilities are identified and addressed:
- Use App front end services to manage access and throughput. See: Azure OpenAI Service Load Balancing with Azure API Management and Smart load balancing for OpenAI endpoints and Azure API Management
- Ensure related services are deployed securely (AI Search, Cosmos DB, etc)
- Secure and validate training data and ingestion pipelines
- Governance strategies help ensure the infrastructure is being used is meeting security and compliance requirements and that policies and procedures are in place to manage risk and accountability:
- Become familiar with Responsible AI principles and frameworks and integrate them early in the development of your application. More here: Responsible AI
- Leverage platform capabilities for logging, auditing, and monitoring GenAI apps. See: Implement logging and monitoring for Azure OpenAI models.
An adversarial prompt attack is when a prompt is used to manipulate an LLM in order to generate a malicious or unintended response. A sneaky user can tamper with words or sentence structure to exploit nuances or sentiment in language models. You may be familiar with some types of prompt attacks:
- Prompt injection: prompt input, output, or instructions are manipulated to lead to unintended behavior.
- Prompt leaking: is intended to cause the model to leak confidential or proprietary information.
- Jailbreaking: is a technique to bypass model safety mechanisms to generate illegal or unethical content.
- DAN: is an acronym for Do Anything Now and is another technique intended to circumvent model safety guardrails and force it to comply with requests that generate unfiltered responses.
- Multi-prompt: a series of prompts are used to extract private or sensitive information.
- Multi-language: although LLMs are trained in multiple languages, performance is superior for English. This technique involves submitting a request in languages other than English to cause the model to overlook or bypass security checks.
- Obfuscation (token smuggling): a technique to present data in an unexpected format to avoid detection.
Note: Details about these and other adversarial techniques can be found here:
As a mitigation strategy for adversarial prompt attacks, consider advanced prompt engineering techniques. There is a growing list of specific techniques that can be used that include enriching prompts with specific instructions, formatting, and providing examples of the kind of output content that is intended. Below are some techniques to consider:
-
Defensive Instructions: Guide model response with explicit instructions. Structure the system message with context and instructions. See also: Add defense in the instruction.
-
Determine intent: Use techniques like few-shot learning to provide content to the model and help set intent.
-
Monitor for degradation in output quality: a decline in output quality can be an indication that a prompt has been tampered with. Monitor the model for output quality using metrics to measure and evaluate the prompt or human-in-the-loop to evaluate feedback or adversarial test cases to confirm prompt resilience. Azure Machine Learning prompt flow has built-in evaluation flows that enable users to assess the quality and effectiveness of prompts.
-
Use other models or dedicated services to process requests. Azure AI Content Safety is an azure service that provides content filtering. AI models are used to detect and classify categories of harm from AI-generated content. Content filters are more contextually aware than blocklists and can provide broad coverage without the manual creation of rules or lists.
-
Use inbound/outbound block/allow lists or filters or rules. When there is a need to screen for items specific to a use case, blocklists can be helpful and can be implemented as part of the AI Content Safety service. See: Use a blocklist in Azure OpenAI.
-
Use the native power of models to steer zero- or few-shot prompting strategies. See promptbase for a growing collection of resources, best practices, and sample scripts.
See Exploring Adversarial Prompting and Mitigations in AI-Infused Applications for more specifics on these types of attacks and defense tactics.
- OWASP Top 10 for LLM applications and the downloadable whitepaper
- OWASP LLM AI Security & Governance Checklist
- Security Best Practices for GenAI Applications in Azure
- Steering at the Frontier: Extending the Power of Prompting
- Planning red teaming for large language models (LLMs) and their applications