Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
add new entry; clear output.jsonl to test summarize actions
Showing 2 changed files with 1 addition and 4 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,3 @@ | ||
{"id": "2312.12321v1", "categories": ["security","open-source"]} | ||
{"id": "2312.02102v2", "categories": ["security"]} | ||
{"id": "2312.08282v2", "categories": ["prompt engineering"]} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +0,0 @@ | ||
{"id": "2312.12321v1", "text": "# Bypassing the Safety Training of Open-Source LLMs with Priming Attacks\n\n## 1 Introduction\nThe surge in popularity of Large Language Models (LLMs) has necessitated significant safety training. Despite efforts, it is still possible to circumvent the alignment to obtain harmful outputs. This paper investigates a threat model for circumventing alignment of open-source LLMs using priming attacks.\n\n## 2 Methodology & Results\nThe authors build an efficient pipeline for automated evaluation of priming attacks against open-source LLMs and demonstrate that priming with slightly more prompt-dependent content can improve the attack success rate by up to 3.3 times. The experimental setup, few-shot priming attacks, and the results of the attacks on different model families and sizes are presented.\n\n## 3 Conclusion\nThe paper highlights the fragility of current LLM safety measures under practical assumptions and emphasizes the need for further research into safer open-sourcing methods.\n\n#### A Details on experimental setup\nThe experimental setup involved using the pre-trained Llama-2 model for few-shot prompting and evaluating the attack success rate on harmful behaviors.\n\n#### B Few-Shot Prompt for Generating Priming Attacks\nThe paper presents the prompt format for generating priming attacks, which includes examples of prompts and affirmative initial responses.\n\n#### C Llama Guard Task Instructions\nThe instructions for evaluating the safety of responses using Llama Guard are provided, including the unsafe content categories and the conversation format used for evaluation.\n\n#### D Manual Evaluation Benchmark\nThe manual evaluation benchmark is explained using examples of harmful and safe responses, along with the criteria for labeling responses as harmful.\n\n#### E Manual Evaluation vs. Llama Guard\nA comparison between manual evaluation and Llama Guard results is presented, highlighting discrepancies in the assessment of harmful content.\n\n#### F Runtime Comparison\nThe runtime comparison between Llama-2 and few-shot prompting techniques is discussed, indicating the speed of generating priming attacks and comparing it to optimization-based techniques.\n\n---\nThe paper explores the vulnerability of open-source LLMs to priming attacks, presents a successful attack pipeline, and discusses the implications of the findings. Additionally, it provides insight into the discrepancies between manual and automated evaluation of harmful content.", "meta": {"url": "https://browse.arxiv.org/html/2312.12321v1", "title": "Bypassing the Safety Training of Open-Source LLMs with Priming Attacks", "subtitle": "LLMs need safety training due to vulnerability to priming attacks, improving attack success rates on harmful behaviors.", "categories": ["security", "open-source"], "publish_date": "2023-12-19"}} | ||
{"id": "2312.02102v2", "text": "# Mitigating Data Injection Attacks on Federated Learning\n\n## 1 Introduction\nThe increasing volume and variety of data have led to a need for data privacy and security in various industries. Federated learning, a collaborative approach for training machine learning models without sharing the raw data, has gained popularity in addressing these concerns. However, federated learning is susceptible to data injection attacks, where malicious agents manipulate the learning process. Detecting and mitigating these attacks are significant challenges in federated learning systems.\n\n## 2 Problem Formulation\n### 2.1 Federated Learning\nFederated learning involves learning a model using agents' private data by refining local model parameters and transmitting updates to a coordinating node to obtain a global model. The goal is to minimize an objective function using a gradient descent approach.\n\n### 2.2 Data Injection Attacks\nIn data injection attacks, malicious participants inject false data into the training process to manipulate the global model. Different attack schemes such as label flipping and constant output attacks can be employed to steer the model towards a false, predetermined performance.\n\n## 3 Attacker Detection and Avoidance\nA low-complexity metric is proposed to detect attackers by comparing the updates from edge agents over time. When an agent is suspected to be an attacker, its parameter updates are ignored for a certain period. The detection method is designed to operate continuously, and conditions are provided for identifying all malicious agents.\n\n## 4 Simulations\n### 4.1 Example 1: Constant-Output Attack\nSimulated attacks on decentralized learning show that the proposed detection algorithm effectively mitigates constant-output attacks by detecting and isolating attackers, leading to convergence to a truthful model.\n\n### 4.2 Example 2: Label-Flip Attack\nSimulated label-flip attacks demonstrate the effectiveness of the detection scheme in identifying and mitigating attacks, preventing the model from succumbing to false labeling.\n\n## 5 Conclusions\nThe paper presents a robust federated learning algorithm capable of operating in the presence of data injection attacks and provides conditions for proper identification of malicious agents. Simulations demonstrate the performance of the proposed technique in mitigating different types of attacks. Additional details and proofs of the proposed scheme will be presented in an extended version of the work.\n\nReferences are also included in the text.", "meta": {"url": "https://browse.arxiv.org/html/2312.02102v2", "title": "Mitigating Data Injection Attacks on Federated Learning", "subtitle": "Federated learning vulnerable to attacks. Paper proposes detection and mitigation technique using local scheme by coordinating node.", "categories": ["security"], "publish_date": "2023-12-04"}} | ||
{"id": "2312.12321v1", "text": "# Bypassing the Safety Training of Open-Source LLMs with Priming Attacks\n\n## 1 Introduction\n\nThe paper investigates the vulnerability of open-source Large Language Models (LLMs) to priming attacks, which bypass alignment from safety training. It discusses the potential threats arising from the increasing use of open-source LLMs in user-facing applications and proposes a threat model for circumventing the alignment of these models, assuming only API query access.\n\n## 2 Methodology & Results\n\nThe authors develop an efficient pipeline for automated evaluation of priming attacks against open-source LLMs. They utilize a non-safety-trained helper LLM to generate few-shot priming attacks for the target harmful behavior and demonstrate that priming with slightly more prompt-dependent content can significantly improve the attack success rate.\n\n- **Few-shot Priming Attacks**: The authors prompt a non-safety-trained helper LLM with few-shot examples and query prompts to generate a priming attack for the target harmful behavior.\n- **Experimental Setup**: The pre-trained Llama-2 model is used as the helper LLM, and 35 prompts from the Harmful Behaviors dataset are utilized to create 15 few-shot examples.\n\nThe priming attack outperforms the baselines for all models, achieving a 3.3x higher Attack Success Rate than the baseline attack under Llama Guard.\n\n## 3 Conclusion\n\nThe paper concludes by emphasizing the fragility of current LLM safety measures under increasingly practical assumptions and calls for further research into novel methods for safer open-sourcing of LLMs.\n\n## A Details on experimental setup\n\nThe experiments were conducted on a server equipped with an Intel Xeon processor with 48 vCPUs and 4 A100 GPUs.\n\n## B Few-Shot Prompt for Generating Priming Attacks\n\nThe paper presents a few-shot prompt format used for generating priming attacks against open-source LLMs.\n\n## C Llama Guard Task Instructions\n\nThe instructions given to Llama Guard for evaluating the ASR on Harmful Behaviors are provided, including the categories of unsafe content and the task format.\n\n## D Manual Evaluation Benchmark\n\nExamples of manually evaluated responses are provided, demonstrating harmful, mostly safe, and completely safe model outputs.\n\n## E Manual Evaluation vs. Llama Guard\n\nA comparison between manual labeling and Llama Guard's evaluation results is discussed, revealing that Llama Guard tends to be more hesitant in labeling a response as 'harmful.'\n\n## F Runtime Comparison\n\nThe runtime comparison of Llama-2 and few-shot prompting on Harmful Behaviors is provided, highlighting the efficiency of the few-shot task compared to optimization-based techniques.\n\nReferences to related academic works are also included in the original document.\n\nFor more details, and code snippets, please refer to the original document.", "meta": {"url": "https://browse.arxiv.org/html/2312.12321v1", "title": "Bypassing the Safety Training of Open-Source LLMs with Priming Attacks", "subtitle": "LLMs vulnerable to priming attacks, bypassing safety training; proposed attack improves success rate on harmful behaviors. Source code available.", "categories": ["security", "open-source"], "publish_date": "2023-12-19"}} | ||
{"id": "2312.02102v2", "text": "# Summary of \"Mitigating Data Injection Attacks on Federated Learning\"\n\n## Abstract\nThe paper addresses the vulnerability of federated learning to data injection attacks, which can lead to a compromised model. It proposes a novel technique to detect and mitigate such attacks, ensuring convergence to a truthful model with high probability.\n\n## 1 Introduction\nThe increasing need for data privacy and security has led to the popularity of federated learning, which allows training machine learning models collaboratively while preserving data privacy. However, federated learning is vulnerable to security threats, including data injection attacks. Data injection attacks involve malicious participants injecting false data into the training process to manipulate the global model.\n\n## 2 Problem Formulation\n### 2.1 Federated Learning\nFederated learning involves learning a model using agents' private data, where each agent refines its model parameters iteratively. The goal is to minimize an objective function using a gradient descent approach.\n\n### 2.2 Data Injection Attacks\nMalicious agents can perform various data injection attacks such as constant-output attacks and label-flipping attacks, aiming to manipulate the model training process and steer it towards a false model.\n\n## 3 Attacker Detection and Avoidance\nA technique is proposed for localizing and detecting attackers, based on a low-complexity metric computed over time. The coordinating agent compares updates received from edge agents and, if an agent is suspected to be an attacker, its parameter updates are ignored for a certain period.\n\n## 4 Simulations\nThe paper includes simulations demonstrating the proposed detection algorithm's performance in mitigating constant-output attacks and label-flip attacks. The experiments show that the proposed method detects and mitigates attackers effectively.\n\n## 5 Conclusions\nThe paper presents a robust federated learning algorithm that can operate in the presence of data injection attacks. It provides conditions for the proper identification of all malicious agents and shows the performance of the proposed technique on various data injection attacks.\n\nThe paper includes references to related works in the field of federated learning and security.\n\n- Keywords: Attack Detection, Data Injection Attacks, Decentralized Learning, Federated Learning\n\nThe text also includes figures and tables illustrating the results of the simulations.\n\nThe complete reference list is also provided at the end.\n\nOverall, the paper offers a comprehensive approach to mitigating data injection attacks in federated learning, backed by simulations and theoretical analysis.", "meta": {"url": "https://browse.arxiv.org/html/2312.02102v2", "title": "Mitigating Data Injection Attacks on Federated Learning", "subtitle": "Proposes a technique to detect and mitigate data injection attacks in federated learning systems.", "categories": ["security"], "publish_date": "2023-12-04"}} | ||