Merge pull request #3 from sumedhghaisas2/synthid_text
Further changes
gante authored Oct 23, 2024
2 parents 7da4040 + 97dcb98 commit 76fc84a
Showing 8 changed files with 893 additions and 382 deletions.
33 changes: 33 additions & 0 deletions examples/research_projects/synthid_text/README.md
@@ -0,0 +1,33 @@
# SynthID Text

This project showcases the use of SynthIDText for watermarking text generated by LLMs. The code in this folder also
demonstrates how to train the detector that identifies such watermarked text. The detector can be uploaded to
a private HF Hub repo (private for security reasons) and initialized again through pretrained model loading, which is also shown in this script.


## Python version

You need Python 3.9 to run this example.
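
A quick way to confirm the interpreter before installing anything is sketched below. It assumes that 3.9 is a minimum version rather than an exact pin, which this README does not state; the helper name `python_is_supported` is hypothetical and not part of this project.

```python
import sys

# Assumption: 3.9 is treated as the minimum supported version.
REQUIRED_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info, required=REQUIRED_PYTHON):
    """Return True if (major, minor) of version_info is at least `required`."""
    return tuple(version_info[:2]) >= tuple(required)


if __name__ == "__main__":
    if python_is_supported():
        print("Python version is OK for this example.")
    else:
        found = ".".join(str(part) for part in sys.version_info[:3])
        print(f"Python {REQUIRED_PYTHON[0]}.{REQUIRED_PYTHON[1]} is needed, found {found}.")
```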

## Installation and running

Once you have installed transformers, install the requirements for this project from the requirements.txt file provided in this folder.

```
pip install -r requirements.txt
```

## To run the detector training

```
python detector_training.py --model_name=google/gemma-7b-it
```

Check the script for more tunable parameters, and see the paper at
https://www.nature.com/articles/s41586-024-08025-4 for more information on these parameters.
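
If you want to launch the training command from Python, for example to try several models in turn, a minimal sketch could look like the following. It only uses the `--model_name` flag shown above; the helper name `build_training_command` is hypothetical, and the script is assumed to be in the current working directory.

```python
import sys


def build_training_command(model_name, script="detector_training.py"):
    """Build the argument list for running the detector training script.

    Only the --model_name flag documented in this README is used.
    """
    return [sys.executable, script, f"--model_name={model_name}"]


if __name__ == "__main__":
    command = build_training_command("google/gemma-7b-it")
    # Print the command instead of running it; to execute it, pass the list to
    # subprocess.run(command, check=True) from the folder containing the script.
    print(" ".join(command))
```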

## Caveat

Make sure to run the detector training and the detection on the same type of hardware
(CPU, GPU, or TPU) to get consistent results (we use deterministic randomness, which is hardware dependent).
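
One way to guard against an accidental hardware mismatch is to record the hardware type at training time and compare it before detection. The sketch below is not part of this project: the helper names and the JSON file layout are assumptions, and the caller supplies the device type string (for example "cpu", "gpu", or "tpu").

```python
import json
from pathlib import Path


def save_hardware_tag(path, device_type):
    """Record the hardware type used for detector training (hypothetical helper)."""
    Path(path).write_text(json.dumps({"device_type": device_type.lower()}))


def check_hardware_tag(path, device_type):
    """Raise RuntimeError if detection runs on different hardware than training."""
    recorded = json.loads(Path(path).read_text())["device_type"]
    if recorded != device_type.lower():
        raise RuntimeError(
            f"Detector was trained on {recorded!r} but detection is running on "
            f"{device_type.lower()!r}; results may be inconsistent."
        )
    return recorded
```

Call `save_hardware_tag` right after training and `check_hardware_tag` before running detection, keeping the tag file next to the saved detector.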
