YiZeng623
🏔️
@ Menlo Park

Highlights

  • Pro

Organizations

@reds-lab

Pinned

  1. LLM-Tuning-Safety/LLMs-Finetuning-Safety (Public)

    We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20 via OpenAI’s APIs.

    Python · 252 stars · 29 forks

  2. CHATS-lab/persuasive_jailbreaker (Public)

    Persuasive Jailbreaker: we can persuade LLMs to jailbreak them!

    HTML · 265 stars · 19 forks

  3. reds-lab/Narcissus (Public)

    The official implementation of the CCS'23 paper on Narcissus, a clean-label backdoor attack that takes only THREE images to poison a face recognition dataset in a clean-label way and achieves a 99.89% att…

    Python · 105 stars · 12 forks

  4. I-BAU (Public)

    Official implementation of the ICLR 2022 paper "Adversarial Unlearning of Backdoors via Implicit Hypergradient"

    Jupyter Notebook · 51 stars · 13 forks

  5. frequency-backdoor (Public)

    ICCV 2021. We find that most existing backdoor-attack triggers in deep learning contain severe artifacts in the frequency domain. This repo explores how we can use these artifacts to develop strong…

    Jupyter Notebook · 42 stars · 6 forks

  6. reds-lab/Meta-Sift (Public)

    The official implementation of the USENIX Security'23 paper "Meta-Sift": ten minutes or less to find a clean subset of 1,000 or more samples in a poisoned dataset.

    Python · 18 stars · 4 forks