feat: add icon and description for Stable Diffusion benchmark #917

Merged
anhappdev merged 5 commits into submission-v4.1 from anh/stable-diffusion-description
Dec 17, 2024

@anhappdev (Collaborator) commented Sep 12, 2024

  • The icon is drawn by me using Figma. We can replace it with one from a designer later.
  • The description for the Stable Diffusion benchmark is provided by @Mostelk

github-actions bot commented Sep 12, 2024

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@freedomtan (Contributor)

@AhmedTElthakeb please report the number of parameters and FLOPs for the three models we use.

@AhmedTElthakeb (Contributor)

Model Name        Parameters    MACs
text_encoder      123060480     8.958 G
vae_decoder       49490199      1273.718 G
sd_diffusion_1    447042560     147.435 G
sd_diffusion_2    412478404     281.060 G
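
As a rough cross-check of these figures, the sketch below sums the parameter counts and estimates the MACs for one end-to-end image. It rests on assumptions that are not confirmed in this thread: that sd_diffusion_1 and sd_diffusion_2 are two parts of the same UNet, that each MACs figure is for a single invocation, that the text encoder and VAE decoder each run once, and that the UNet runs once per denoising step (classifier-free guidance would double that). The step count of 20 comes from the benchmark description later in this conversation. All function names are hypothetical.

```python
# Parameter counts and GMACs copied from the table above.
MODELS = {
    "text_encoder": (123_060_480, 8.958),
    "vae_decoder": (49_490_199, 1273.718),
    "sd_diffusion_1": (447_042_560, 147.435),
    "sd_diffusion_2": (412_478_404, 281.060),
}

DENOISING_STEPS = 20  # stated in the benchmark description


def unet_params():
    """Combined parameters of the two diffusion parts (assumed to form one UNet)."""
    return MODELS["sd_diffusion_1"][0] + MODELS["sd_diffusion_2"][0]


def total_params():
    """Parameters summed over all four exported models."""
    return sum(params for params, _ in MODELS.values())


def end_to_end_gmacs(steps=DENOISING_STEPS, unet_calls_per_step=1):
    """Estimated GMACs per image under the assumptions listed in the lead-in."""
    unet_gmacs = MODELS["sd_diffusion_1"][1] + MODELS["sd_diffusion_2"][1]
    return (
        MODELS["text_encoder"][1]
        + steps * unet_calls_per_step * unet_gmacs
        + MODELS["vae_decoder"][1]
    )


if __name__ == "__main__":
    print(f"UNet parameters:  {unet_params():,}")
    print(f"Total parameters: {total_params():,}")
    print(f"Estimated GMACs per image: {end_to_end_gmacs():.3f}")
```

Under these assumptions the two diffusion parts sum to about 860M parameters. Since freedomtan asked for FLOPs, note that a common convention counts one MAC as two FLOPs, so the MACs figures can be doubled for a rough FLOPs estimate.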

@anhappdev anhappdev changed the base branch from master to submission-v4.1 October 22, 2024 10:52
@anhappdev anhappdev force-pushed the anh/stable-diffusion-description branch from bc50b66 to be59d3c Compare October 22, 2024 10:59
@anhappdev anhappdev added this to the v4.1 milestone Oct 22, 2024
@anhappdev (Collaborator, Author)

@Mostelk Please provide a description for the Stable Diffusion benchmark.

@Mostelk commented Dec 12, 2024

@Mostelk Please provide a description for the Stable Diffusion benchmark.

Please check this description; we reviewed it in Wednesday's meeting.

The Text to Image Gen AI benchmark uses Stable Diffusion v1.5, a latent diffusion model, to generate images from text prompts. The benchmarked Stable Diffusion v1.5 is a specific configuration of the model architecture: a downsampling-factor 8 autoencoder, an 860M-parameter UNet, a 123M-parameter CLIP ViT-L/14 text encoder, and a VAE decoder with 49.5M parameters. The model was trained for 595k steps at a resolution of 512x512, which enables it to generate high-quality images. See https://huggingface.co/benjamin-paine/stable-diffusion-v1-5 for more information. The benchmark runs 20 denoising steps for inference and uses a precalculated time embedding of size 1x1280. Reference models can be found at https://github.com/mlcommons/mobile_open/releases
For latency benchmarking, we measure end to end, excluding the time embedding calculation and the tokenizer.
For accuracy, the app uses the CLIP metric for text-to-image consistency and further evaluates the generated images with the Image Quality Aesthetic Assessment metric from https://github.com/idealo/image-quality-assessment/tree/master?tab=readme-ov-file
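
For readers unfamiliar with the CLIP metric mentioned above, here is a minimal sketch of how a CLIP-style text-to-image consistency score is commonly computed: the cosine similarity between an image embedding and a text embedding, clamped at zero and scaled. The scale factor of 100 and the clamping follow a widely used formulation and are assumptions here; the app's exact definition may differ. The embeddings are assumed to come from a CLIP model, which is not shown, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def clip_score(image_embedding, text_embedding, scale=100.0):
    """Scaled, non-negative cosine similarity (assumed formulation, not the app's code)."""
    return scale * max(cosine_similarity(image_embedding, text_embedding), 0.0)


if __name__ == "__main__":
    # Toy vectors standing in for real CLIP embeddings.
    image_vec = [0.2, 0.9, 0.1]
    text_vec = [0.25, 0.8, 0.0]
    print(f"score: {clip_score(image_vec, text_vec):.2f}")
```

A higher score means the generated image sits closer to the prompt in the shared embedding space; a score of zero means the two embeddings are orthogonal or point in opposing directions.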

@anhappdev anhappdev force-pushed the anh/stable-diffusion-description branch from be59d3c to 063c086 Compare December 12, 2024 03:09
@anhappdev anhappdev force-pushed the anh/stable-diffusion-description branch from 063c086 to bc671d0 Compare December 12, 2024 03:38
@anhappdev anhappdev marked this pull request as ready for review December 17, 2024 02:24
@anhappdev anhappdev requested a review from a team as a code owner December 17, 2024 02:24
@anhappdev anhappdev merged commit aab2697 into submission-v4.1 Dec 17, 2024
22 checks passed
@anhappdev anhappdev deleted the anh/stable-diffusion-description branch December 17, 2024 07:00
@github-actions github-actions bot locked and limited conversation to collaborators Dec 17, 2024
4 participants