The goal of this project is to explore potential uses of large language models for improving current state-of-the-art text-to-image models such as Stable Diffusion.
Writing text prompts that effectively guide a text-to-image model toward a desired result can be complex, often requiring seemingly arbitrary keywords and various style modifiers.
Experienced users make heavy use of these modifiers because they frequently improve subjective aesthetic quality and yield images more closely aligned with the desired result. Even subtle changes in word placement can have a significant effect, creating potentially unnecessary work for even the most skilled prompt writers.
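To make the effect of modifiers concrete, here is a minimal sketch contrasting a plain prompt with a modifier-heavy variant. It assumes the third-party `diffusers` and `torch` packages and a CUDA GPU; the model ID and the specific modifier list are illustrative, not prescriptive.

```python
# Minimal sketch (assumes `diffusers`, `torch`, and a CUDA GPU).
# Contrasts a plain prompt with the modifier-heavy variant that
# experienced users typically write by hand.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # illustrative model choice
    torch_dtype=torch.float16,
).to("cuda")

base_prompt = "a castle on a hill"
# Style modifiers commonly appended to steer aesthetics:
modified_prompt = (
    "a castle on a hill, highly detailed, dramatic lighting, "
    "golden hour, 4k, trending on artstation"
)

pipe(base_prompt).images[0].save("plain.png")
pipe(modified_prompt).images[0].save("styled.png")
```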
Given this complexity and lack of intuitiveness, prompt input as a UI for text-to-image models is currently less than ideal.
This project is currently in the exploratory phase. We welcome any and all feedback from the community and would love to discuss potential proposals with anyone interested in the project. Check out the discussions tab to get started.
| Name | Description | Status |
|---|---|---|
| Initial experiment | Expand prompt detail with an LLM (see the sketch below) | Complete |
| Trained “unsimplification” model | Train a model to “unsimplify” prompts | Feedback requested |
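As a rough illustration of the initial experiment, the sketch below uses the `transformers` text-generation pipeline to expand a short prompt with added detail. The model name, few-shot template, and `expand_prompt` helper are hypothetical stand-ins, not the project's actual setup.

```python
# Hypothetical sketch of "expand prompt detail with an LLM".
# The model and few-shot examples are illustrative assumptions,
# not the exact configuration used in this project.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2-large")  # placeholder model

FEW_SHOT = (
    "Expand each short image prompt with descriptive detail and style modifiers.\n"
    "Short: a cat\n"
    "Expanded: a fluffy tabby cat lounging in warm sunlight, shallow depth of "
    "field, highly detailed, soft natural lighting\n"
    "Short: {prompt}\n"
    "Expanded:"
)

def expand_prompt(prompt: str) -> str:
    filled = FEW_SHOT.format(prompt=prompt)
    out = generator(filled, max_new_tokens=60, do_sample=True, temperature=0.9)
    completion = out[0]["generated_text"][len(filled):]
    # Keep only the first non-empty generated line as the expanded prompt.
    lines = [ln.strip() for ln in completion.splitlines() if ln.strip()]
    return lines[0] if lines else completion.strip()

print(expand_prompt("a castle on a hill"))
```

The expanded prompt can then be fed to a text-to-image pipeline in place of the user's original input.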