We provide KnowUnDo, a benchmark covering the copyrighted content and user privacy domains, to evaluate whether the unlearning process inadvertently erases essential knowledge. KnowUnDo is available directly on Hugging Face.
To address this, we propose MemFlex, a simple yet effective method that uses gradient information to precisely target and unlearn sensitive parameters.
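As a rough intuition for what "using gradient information to target parameters" can mean, the toy sketch below masks an update by gradient magnitude: it keeps only the fraction of parameters with the largest absolute gradient on a forget objective and applies gradient ascent there. This is a generic illustration, not the MemFlex procedure itself; the function names, the `fraction` threshold, and the toy numbers are all assumptions made for the example.

```python
# Generic illustration of gradient-magnitude masking (NOT the MemFlex algorithm).
# All names and values here are hypothetical.

def top_k_mask(forget_grad, fraction):
    """Return a 0/1 mask selecting the `fraction` of entries with the largest |gradient|."""
    k = max(1, int(len(forget_grad) * fraction))
    ranked = sorted(range(len(forget_grad)),
                    key=lambda i: abs(forget_grad[i]),
                    reverse=True)
    selected = set(ranked[:k])
    return [1 if i in selected else 0 for i in range(len(forget_grad))]


def masked_ascent_step(params, forget_grad, mask, lr):
    """Gradient ascent on the forget loss, applied only where mask == 1."""
    return [p + lr * g * m for p, g, m in zip(params, forget_grad, mask)]


# Toy parameters and a toy gradient of the forget loss.
params = [0.5, -1.0, 2.0, 0.1]
forget_grad = [0.01, -0.9, 0.05, 0.6]

mask = top_k_mask(forget_grad, fraction=0.5)
updated = masked_ascent_step(params, forget_grad, mask, lr=0.1)
```

Parameters outside the mask are left untouched, which is the property that matters when the goal is to avoid erasing knowledge that should be retained.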
You can load the datasets as follows.
from datasets import load_dataset
dataset = load_dataset("zjunlp/KnowUnDo", name='copyright', split='unlearn')
- Available configuration names and corresponding splits:
  - `copyright`: `unlearn`, `retention`
  - `privacy`: `unlearn`, `retention`
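The configuration names and splits listed above can be enumerated in one pass. The sketch below is only a convenience wrapper: `CONFIG_SPLITS`, `all_config_splits`, and `load_all` are hypothetical helper names, and the loader is passed in as an argument so the enumeration logic can be checked without downloading anything.

```python
# Hypothetical helpers for iterating over every KnowUnDo configuration/split pair.
CONFIG_SPLITS = {
    "copyright": ["unlearn", "retention"],
    "privacy": ["unlearn", "retention"],
}


def all_config_splits():
    """Return every (configuration name, split) pair as a list of tuples."""
    return [(name, split)
            for name, splits in CONFIG_SPLITS.items()
            for split in splits]


def load_all(loader):
    """Load every pair with `loader`, which is expected to behave like datasets.load_dataset."""
    return {
        (name, split): loader("zjunlp/KnowUnDo", name=name, split=split)
        for name, split in all_config_splits()
    }


# Usage (requires the `datasets` package and network access):
# from datasets import load_dataset
# data = load_all(load_dataset)
# unlearn_copyright = data[("copyright", "unlearn")]
```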
git clone https://github.com/zjunlp/KnowUnDo.git
cd KnowUnDo
conda create -n KnowUnDo python==3.10
conda activate KnowUnDo
pip install -e .
pip install -r requirements.txt
cd llm_unlearn/apex
pip install -v --no-cache-dir ./
# directory: KnowUnDo
mkdir models
cd models
git lfs install
git clone https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
git clone https://huggingface.co/Qwen/Qwen1.5-7B-Chat
# directory: pretrain
bash run_finetune_lora.sh
We have released the localized knowledge region. You can perform the localization yourself as follows.
# directory: pretrain
bash run_localization.sh
# directory: llm_unlearn
cd utils
bash tokenize_datasets.sh
- `--val` for the `val` split of the dataset.
- `--prompt` for concatenating `direct_prompt` before the `question` in the datasets.
# directory: llm_unlearn
bash run_baselines_lora.sh
bash run_ours_lora.sh
- Available methods with corresponding arguments:
  - `--unlearn_method gradient_ascent`
  - `--unlearn_method random_label --completely_random True` (named Fine-tuning with Random Labels in the paper)
  - `--unlearn_method random_label --top_k 1 --rm_groundtruth True` (named Unlearning with Adversarial Samples in the paper)
  - `--unlearn_method ascent_plus_descent`
  - `--unlearn_method ascent_plus_kl_divergence`
  - `--unlearn_method ascent_plus_descent --general True`
  - `--unlearn_method ascent_plus_kl_divergence --general True`
  - `--unlearn_method memflex` (the strong baseline proposed by us)
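When sweeping over several methods, it can help to keep the flag combinations above in one place. The sketch below stores them in a dictionary and splits each into an argument list. The dictionary keys are hypothetical labels chosen for this example, and how the shell scripts consume these flags is not shown here, so treat it as bookkeeping rather than a launcher.

```python
import shlex

# Flag strings copied from the list of available methods; keys are hypothetical labels.
UNLEARN_METHOD_ARGS = {
    "gradient_ascent": "--unlearn_method gradient_ascent",
    "random_labels": "--unlearn_method random_label --completely_random True",
    "adversarial_samples": "--unlearn_method random_label --top_k 1 --rm_groundtruth True",
    "ascent_plus_descent": "--unlearn_method ascent_plus_descent",
    "ascent_plus_kl_divergence": "--unlearn_method ascent_plus_kl_divergence",
    "ascent_plus_descent_general": "--unlearn_method ascent_plus_descent --general True",
    "ascent_plus_kl_divergence_general": "--unlearn_method ascent_plus_kl_divergence --general True",
    "memflex": "--unlearn_method memflex",
}


def method_args(label):
    """Return the flags for `label` as a list of tokens, e.g. for building a command line."""
    return shlex.split(UNLEARN_METHOD_ARGS[label])


for label in UNLEARN_METHOD_ARGS:
    print(label, method_args(label))
```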
You can evaluate multiple unlearned models together by running our script only once.
# directory: llm_unlearn
bash run_eval_baselines_lora.sh
- `--direct_prompt=True` means concatenating `direct_prompt` before the `question` in the datasets.
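The effect of the option can be pictured with a small sketch. It assumes each record is a dictionary with `direct_prompt` and `question` fields, as the description suggests; the function name, the single-space separator, and the example record are assumptions made for illustration, and the repository's own code may join the fields differently.

```python
def build_model_input(example, direct_prompt=True, separator=" "):
    """Prepend example['direct_prompt'] to example['question'] when direct_prompt is True."""
    question = example["question"]
    if direct_prompt:
        return example["direct_prompt"] + separator + question
    return question


# Hypothetical record, not taken from the dataset.
example = {
    "direct_prompt": "Answer the question directly.",
    "question": "Who wrote the book?",
}

with_prompt = build_model_input(example, direct_prompt=True)
without_prompt = build_model_input(example, direct_prompt=False)
```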
We would like to express our sincere gratitude to the following excellent works: Unlearning LLM, TOFU, LLaMA, and Qwen.
If you use or extend our work, please cite the paper as follows:
@article{tian2024forget,
title={To forget or not? towards practical knowledge unlearning for large language models},
author={Tian, Bozhong and Liang, Xiaozhuan and Cheng, Siyuan and Liu, Qingbin and Wang, Mengru and Sui, Dianbo and Chen, Xi and Chen, Huajun and Zhang, Ningyu},
journal={arXiv preprint arXiv:2407.01920},
year={2024}
}