Research on Controllable Text Generation (CTG) has made significant progress over the past few years. This repository aims to be a one-stop shop for researchers who want to benchmark their methods on a wide range of CTG tasks. It includes:
- Compilation of over 17 different generation tasks
- Compilation of over 10 different constraint functions/datasets for the training of constraint satisfaction classifiers
- A prompt-based LLM distillation method that produces a constraint satisfaction classifier for any natural language constraint
- Implementations of 5 different baselines
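To make the prompt-based distillation idea above concrete, here is a minimal sketch of how an LLM could be queried to produce silver labels for an arbitrary natural language constraint. The template wording and function name are hypothetical, not the repository's actual prompt; the LLM's Yes/No answers would then train a constraint satisfaction classifier.

```python
def build_constraint_prompt(constraint: str, text: str) -> str:
    """Build a zero-shot labeling prompt (hypothetical template; the
    repository's actual prompt may differ). The LLM's Yes/No answer
    becomes a silver label for distilling a constraint classifier."""
    return (
        f"Constraint: {constraint}\n"
        f"Text: {text}\n"
        "Does the text satisfy the constraint? Answer Yes or No:"
    )

prompt = build_constraint_prompt(
    "The text must be written in a formal register.",
    "Hey, what's up?",
)
print(prompt)
```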
Constraint Datasets
- Toxicity (Jigsaw Toxicity Classification Challenge)
- Sentiment (Yelp Polarity, SST2, SST5, IMDB Reviews)
- Topic (AGNews)
- Genre (StoryControl, TagMyBook)
- Clickbait (Clickbait News Detection, Stop Clickbait)
- Formality (Pavlick; GYAFC must be downloaded from source)
- Spam (Spamassassin, SMS Spam)
- Urgency (Derived from CrisisNLP)
Constraint Functions
- Numerical Structure Constraints (word/sentence/POS counts and ranges)
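As a rough illustration of the numerical structure constraints, the sketch below checks word-count ranges and counts sentences. The tokenization and sentence-splitting rules here are simplistic assumptions; the repository's actual constraint functions may define counts differently.

```python
import re

def word_count_in_range(text: str, lo: int, hi: int) -> bool:
    """Range-style numerical structure constraint: the word count
    must fall within [lo, hi] inclusive. (Illustrative tokenization;
    the repository's rules may differ.)"""
    n = len(re.findall(r"\b\w+\b", text))
    return lo <= n <= hi

def sentence_count(text: str) -> int:
    """Naive sentence count via terminal punctuation."""
    return len([s for s in re.split(r"[.!?]+", text) if s.strip()])

print(word_count_in_range("One two three four.", 3, 5))  # True
print(sentence_count("First sentence. Second one!"))     # 2
```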
Baselines
- Score-based reranking
- Prompt Tuning
- ZeroShot Prompting
- FewShot Prompting
- LoRA
Evaluation
- Perspective API for Toxicity
- External classifier (using classifiers trained on held-out constraint datasets)
- ZeroShot / FewShot Prompt evaluation
- LM Objective / Perplexity
- ROUGE
- BLEU
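For reference, perplexity reduces to the exponential of the negative mean per-token log-likelihood. The sketch below is model-agnostic: plug in natural-log token probabilities from any language model.

```python
import math
from typing import List

def perplexity(token_logprobs: List[float]) -> float:
    """Perplexity from per-token natural-log probabilities:
    exp(-mean(log p)). Lower is better (the LM finds the text
    more predictable)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Uniform probability 1/4 per token -> perplexity of exactly 4.
print(perplexity([math.log(0.25)] * 10))
```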