%auto-ignore
\section{Discussion and Social Impact}
The \name~model confirms the findings of \citet{imagen} that frozen large pretrained language models serve as powerful text encoders for text-to-image generation. In our initial experiments, we also tried learning a language model from scratch on the training data, but found that performance was significantly worse than using a pre-trained LLM, especially on long prompts and rare words. We further show that non-diffusion, non-autoregressive models based on the Transformer architecture can perform on par with diffusion models while being significantly more efficient at inference time. We achieve SOTA CLIP scores, demonstrating excellent alignment between image and text. Finally, we demonstrate the flexibility of our approach with a number of image editing applications.
We recognize that generative models have a number of applications with varied potential for impact on human society. Generative models \citep{imagen,parti,ldm,midjourney} hold significant potential to augment human creativity \citep{hughes2021generative}. However, it is well known that they can also be leveraged for misinformation, harassment, and various types of social and cultural bias \citep{franks2018sex,whittaker2020all,srinivasan2021biases,steed2021image}. Due to these important considerations, we opt not to release code or a public demo at this point in time.
Dataset biases are another important ethical consideration, given the requirement of large datasets that are mostly automatically curated. Such datasets raise a number of potentially problematic issues, including consent and subject awareness \citep{paullada2021data,dulhanty2020issues,scheuerman2021datasets}. Many of the commonly used datasets also tend to reflect negative social stereotypes and viewpoints \citep{prabhu2020large}. Training on such datasets may therefore simply amplify these biases; significant additional research is required on how to mitigate them and how to build datasets that are free of them. This is a very important topic \citep{buolamwini2018gender,hendricks2018women}, but it is beyond the scope of this paper.
Given the above considerations, we do not recommend the use of text-to-image generation models without careful attention to the intended use case and an understanding of the potential for harm. We especially caution against using such models to generate images of people and, in particular, human faces.