Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
Framework code for Meissonic, a non-autoregressive transformer (NAT)-based text-to-image model.
Meissonic is an efficient text-to-image synthesis foundation model that runs on consumer graphics cards with as little as 8 GB of VRAM. It is built on a non-autoregressive masked generative transformer architecture (a toy decoding sketch follows the feature list below) and offers:
- High-resolution image generation (up to 1024x1024)
- Efficient inference on consumer GPUs
- Versatile applications: text-to-image and image-to-image generation
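To make the non-autoregressive idea concrete, the toy sketch below illustrates how masked generative decoding typically proceeds. This is not Meissonic's actual code; all names and numbers are illustrative assumptions. Every image token starts masked, and at each of a few refinement steps the model predicts all positions in parallel, committing only the most confident guesses until the full token grid is filled.

```python
# Toy illustration of non-autoregressive masked decoding (not Meissonic's
# actual implementation; values and names are placeholders).
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE = 8192     # hypothetical codebook size of the image tokenizer
NUM_TOKENS = 64 * 64  # hypothetical latent token grid for one image
MASK = -1             # sentinel for a still-masked position
NUM_STEPS = 8         # hypothetical number of refinement steps

def fake_predict(tokens):
    """Stand-in for the transformer: returns a guess and a confidence for
    every position. The real model conditions on the text prompt and on
    the tokens that have already been unmasked."""
    guesses = rng.integers(0, VOCAB_SIZE, size=tokens.shape)
    confidences = rng.random(size=tokens.shape)
    return guesses, confidences

tokens = np.full(NUM_TOKENS, MASK, dtype=np.int64)
for step in range(NUM_STEPS):
    guesses, conf = fake_predict(tokens)
    masked = tokens == MASK
    # Cosine-style schedule: the fraction of tokens that should be
    # unmasked grows toward 1.0 as the steps progress.
    target_unmasked = np.cos(np.pi / 2 * (1 - (step + 1) / NUM_STEPS))
    num_to_commit = int(np.ceil(target_unmasked * NUM_TOKENS)) - int((~masked).sum())
    if num_to_commit <= 0:
        continue
    # Among still-masked positions, commit the most confident guesses and
    # leave the rest masked for later refinement steps.
    conf = np.where(masked, conf, -np.inf)
    commit = np.argsort(-conf)[:num_to_commit]
    tokens[commit] = guesses[commit]

print("unmasked tokens:", int((tokens != MASK).sum()), "of", NUM_TOKENS)
```

Unlike autoregressive decoding, the number of network evaluations here is a small constant (eight in this sketch) rather than one per token, which is what makes this style of generation attractive for high-resolution synthesis on consumer hardware.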
Generate an image from a text prompt with the inference script:

python inference.py --input_text "a red apple on a white plate" --output_dir ./output
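To generate several images in a batch, a small wrapper around the command above can be convenient. The sketch below is not part of the repository; it simply shells out to inference.py with the same flags shown above.

```python
# Convenience wrapper (assumption: not provided by this repo) that runs the
# documented inference.py command once per prompt.
import subprocess
from pathlib import Path

prompts = [
    "a red apple on a white plate",
    "a watercolor painting of a lighthouse at dusk",
]

for i, prompt in enumerate(prompts):
    out_dir = Path("./output") / f"prompt_{i:02d}"
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["python", "inference.py",
         "--input_text", prompt,
         "--output_dir", str(out_dir)],
        check=True,  # raise if inference.py exits with an error
    )
```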
If you find this work helpful, please consider citing:
@article{bai2024meissonic,
  title={Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis},
  author={Bai, Jinbin and Ye, Tian and Chow, Wei and Song, Enxin and Chen, Qing-Guo and Li, Xiangtai and Dong, Zhen and Zhu, Lei and Yan, Shuicheng},
  journal={arXiv preprint arXiv:2410.08261},
  year={2024}
}
This project is licensed under the Apache License, Version 2.0 (SPDX-License-Identifier: Apache-2.0), with additional use restrictions. The full text of the license(s) can be found at ./LICENSE.
We applied compliance-checking algorithms during training to ensure, to the best of our ability, that the trained model and its dataset are compliant. Because of the complexity of the data and the diversity of model usage scenarios, we cannot guarantee that the model is completely free of copyright issues or improper content. If you believe anything infringes on your rights or that the model generates improper content, please contact us and we will promptly address the matter.