Implementation of our paper:
Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models
🔥 News
- [Apr 8, 2024]: New repo released!
Conda environment
Tested in the following environment; other versions should also work.
- python 3.10.10
- pytorch
pip3 install -r requirements.txt
- [optional] pip3 install flash-attn==2.3.3
Overview
- src_watermark: implements three text watermarking methods (x-sir, sir, and kgw) with a unified interface.
- attack: contains two watermark removal methods: paraphrase and translation.
- Scripts:
  - gen.py: generate text with watermark
  - detect.py: compute z-score for given texts
  - eval_detection.py: calculate AUC, TPR, and F1 for watermark detection
  - Run any script with --help to see its full usage.
- Supported models:
meta-llama/Llama-2-7b-hf
baichuan-inc/Baichuan2-7B-Base
baichuan-inc/Baichuan-7B
mistralai/Mistral-7B-v0.1
- Supported languages: English (En), German (De), French (Fr), Chinese (Zh), Japanese (Ja)
- To add support for another model or language, see from-scratch.md.
Generate text with watermark
MODEL_NAME=baichuan-inc/Baichuan-7B
MODEL_ABBR=baichuan-7b
TRANSFORM_MODEL=data/model/transform_model_x-sbert_10K.pth
MAPPING_FILE=data/mapping/xsir/300_mapping_$MODEL_ABBR.json
WATERMARK_METHOD_FLAG="--watermark_method xsir --transform_model $TRANSFORM_MODEL --embedding_model paraphrase-multilingual-mpnet-base-v2 --mapping_file $MAPPING_FILE"
python3 gen.py \
--base_model $MODEL_NAME \
--fp16 \
--batch_size 32 \
--input_file data/dataset/mc4/mc4.en.jsonl \
--output_file gen/$MODEL_ABBR/xsir/mc4.en.mod.jsonl \
$WATERMARK_METHOD_FLAG
Compute the z-scores
# Compute z-score for human-written text
python3 detect.py \
--base_model $MODEL_NAME \
--detect_file data/dataset/mc4/mc4.en.jsonl \
--output_file gen/$MODEL_ABBR/xsir/mc4.en.hum.z_score.jsonl \
$WATERMARK_METHOD_FLAG
# Compute z-score for watermarked text
python3 detect.py \
--base_model $MODEL_NAME \
--detect_file gen/$MODEL_ABBR/xsir/mc4.en.mod.jsonl \
--output_file gen/$MODEL_ABBR/xsir/mc4.en.mod.z_score.jsonl \
$WATERMARK_METHOD_FLAG
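For intuition about what a detection z-score measures, here is a minimal sketch of the statistic used by KGW-style green-list detection. If T tokens are scored, g of them fall in the green list, and the green list covers a fraction gamma of the vocabulary, then z = (g - gamma * T) / sqrt(T * gamma * (1 - gamma)). The function name and the default gamma below are illustrative assumptions and are not taken from detect.py; SIR and X-SIR may compute their scores differently.

```python
import math


def kgw_z_score(green_count: int, total_tokens: int, gamma: float = 0.25) -> float:
    """One-proportion z-test for a green-list watermark (illustrative sketch).

    green_count:  number of scored tokens that fall in the green list
    total_tokens: number of scored tokens
    gamma:        assumed fraction of the vocabulary in the green list
    """
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must be strictly between 0 and 1")
    # Under the null hypothesis (no watermark), each token is green with
    # probability gamma, so the green count has mean gamma * T and
    # variance T * gamma * (1 - gamma).
    expected = gamma * total_tokens
    std = math.sqrt(total_tokens * gamma * (1.0 - gamma))
    return (green_count - expected) / std


if __name__ == "__main__":
    # Unwatermarked text should score near 0; watermarked text should score high.
    print(kgw_z_score(green_count=50, total_tokens=200))
    print(kgw_z_score(green_count=100, total_tokens=200))
```

A text is flagged as watermarked when its z-score exceeds a chosen threshold; the evaluation step selects that threshold from the human-written scores.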
Evaluation
python3 eval_detection.py \
--hm_zscore gen/$MODEL_ABBR/xsir/mc4.en.hum.z_score.jsonl \
--wm_zscore gen/$MODEL_ABBR/xsir/mc4.en.mod.z_score.jsonl
AUC: 0.994
TPR@FPR=0.1: 0.994
TPR@FPR=0.01: 0.862
F1@FPR=0.1: 0.955
F1@FPR=0.01: 0.921
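To make the reported metrics concrete, the sketch below shows one common way to derive AUC, TPR at a fixed FPR, and F1 from two lists of z-scores. It is an assumption about how such metrics can be computed, not a copy of eval_detection.py; all function names are hypothetical, and the threshold rule (the lowest threshold that keeps the false positive rate on human text at or below the target) is one of several reasonable choices.

```python
from typing import Dict, Sequence


def threshold_at_fpr(human_scores: Sequence[float], target_fpr: float) -> float:
    """Lowest threshold whose false positive rate on human text is <= target_fpr."""
    ranked = sorted(human_scores, reverse=True)
    allowed = int(target_fpr * len(ranked))  # human texts we may wrongly flag
    # Texts are flagged when score > threshold, so at most `allowed`
    # human scores lie strictly above ranked[allowed].
    return ranked[allowed] if allowed < len(ranked) else float("-inf")


def auc(human_scores: Sequence[float], wm_scores: Sequence[float]) -> float:
    """Probability that a watermarked text outscores a human text (ties count half)."""
    wins = 0.0
    for w in wm_scores:
        for h in human_scores:
            if w > h:
                wins += 1.0
            elif w == h:
                wins += 0.5
    return wins / (len(wm_scores) * len(human_scores))


def detection_metrics(
    human_scores: Sequence[float], wm_scores: Sequence[float], target_fpr: float
) -> Dict[str, float]:
    """TPR and F1 at the threshold selected for the target FPR."""
    threshold = threshold_at_fpr(human_scores, target_fpr)
    tp = sum(1 for w in wm_scores if w > threshold)
    fp = sum(1 for h in human_scores if h > threshold)
    fn = len(wm_scores) - tp
    denom = 2 * tp + fp + fn
    return {
        "threshold": threshold,
        "tpr": tp / len(wm_scores),
        "f1": (2 * tp / denom) if denom else 0.0,
    }


if __name__ == "__main__":
    human = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 3.0]
    watermarked = [2.0, 4.0, 5.0, 6.0]
    print("AUC:", auc(human, watermarked))
    print(detection_metrics(human, watermarked, target_fpr=0.1))
```

The pairwise AUC loop is quadratic in the number of texts, which is fine for illustration; a rank-based computation is preferable for large evaluation sets.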
Here we test the watermark after the watermarked text is translated into other languages (De, Fr, Zh, Ja).
We use ChatGPT to perform paraphrasing and translation. Therefore:
- Set your OpenAI API key:
export OPENAI_API_KEY=xxxx
- You may also want to adjust the RPM (requests per minute) and TPM (tokens per minute) limits in attack/const.py.
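When choosing RPM and TPM values for your account, it can help to see how the two limits interact. The sketch below computes the minimum pause between requests that respects both limits, given a rough estimate of tokens per request. The function and parameter names are hypothetical and do not reflect how attack/const.py or the attack scripts are organized.

```python
def min_request_interval(rpm: int, tpm: int, tokens_per_request: int) -> float:
    """Seconds to wait between requests so neither rate limit is exceeded.

    rpm:                allowed requests per minute
    tpm:                allowed tokens per minute
    tokens_per_request: estimated prompt plus completion tokens per request
    """
    if rpm <= 0 or tpm <= 0 or tokens_per_request <= 0:
        raise ValueError("all arguments must be positive")
    by_requests = 60.0 / rpm                     # spacing imposed by the request limit
    by_tokens = 60.0 * tokens_per_request / tpm  # spacing imposed by the token limit
    # The stricter (larger) spacing is the one that must be honored.
    return max(by_requests, by_tokens)


if __name__ == "__main__":
    # With long inputs the token limit, not the request limit, is usually the bottleneck.
    print(min_request_interval(rpm=600, tpm=60000, tokens_per_request=500))
```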
Translation
TGT_LANGS=("de" "fr" "zh" "ja")
for TGT_LANG in "${TGT_LANGS[@]}"; do
python3 attack/translate.py \
--input_file gen/$MODEL_ABBR/xsir/mc4.en.mod.jsonl \
--output_file gen/$MODEL_ABBR/xsir/mc4.en-$TGT_LANG.mod.jsonl \
--model gpt-3.5-turbo-1106 \
--src_lang en \
--tgt_lang $TGT_LANG
done
Compute the z-scores
for TGT_LANG in "${TGT_LANGS[@]}"; do
python3 detect.py \
--base_model $MODEL_NAME \
--detect_file gen/$MODEL_ABBR/xsir/mc4.en-$TGT_LANG.mod.jsonl \
--output_file gen/$MODEL_ABBR/xsir/mc4.en-$TGT_LANG.mod.z_score.jsonl \
$WATERMARK_METHOD_FLAG
done
Evaluation
for TGT_LANG in "${TGT_LANGS[@]}"; do
echo "En->$TGT_LANG"
python3 eval_detection.py \
--hm_zscore gen/$MODEL_ABBR/xsir/mc4.en.hum.z_score.jsonl \
--wm_zscore gen/$MODEL_ABBR/xsir/mc4.en-$TGT_LANG.mod.z_score.jsonl
done
En->de
AUC: 0.769
TPR@FPR=0.1: 0.318
TPR@FPR=0.01: 0.060
F1@FPR=0.1: 0.450
F1@FPR=0.01: 0.112
En->fr
AUC: 0.810
TPR@FPR=0.1: 0.354
TPR@FPR=0.01: 0.046
F1@FPR=0.1: 0.488
F1@FPR=0.01: 0.087
En->zh
AUC: 0.905
TPR@FPR=0.1: 0.702
TPR@FPR=0.01: 0.182
F1@FPR=0.1: 0.781
F1@FPR=0.01: 0.305
En->ja
AUC: 0.911
TPR@FPR=0.1: 0.696
TPR@FPR=0.01: 0.112
F1@FPR=0.1: 0.775
F1@FPR=0.01: 0.200
You can use the following flags to specify the watermarking method:
KGW
WATERMARK_METHOD_FLAG="--watermark_method kgw"
SIR
MODEL_NAME=baichuan-inc/Baichuan-7B
MODEL_ABBR=baichuan-7b
TRANSFORM_MODEL=data/model/transform_model_x-sbert_10K.pth
MAPPING_FILE=data/mapping/sir/300_mapping_$MODEL_ABBR.json
WATERMARK_METHOD_FLAG="--watermark_method sir --transform_model $TRANSFORM_MODEL --embedding_model paraphrase-multilingual-mpnet-base-v2 --mapping_file $MAPPING_FILE"
This work would not have been possible without the following repositories:
- SIR: https://github.com/THU-BPM/Robust_Watermark
- KGW: https://github.com/jwkirchenbauer/lm-watermarking
@article{he2024can,
title={Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models},
author={He, Zhiwei and Zhou, Binglin and Hao, Hongkun and Liu, Aiwei and Wang, Xing and Tu, Zhaopeng and Zhang, Zhuosheng and Wang, Rui},
journal={arXiv preprint arXiv:2402.14007},
year={2024}
}