An open source, layer-based web interface for Collage Diffusion - use a familiar Photoshop-like interface and let the AI harmonize the details.

linden-li/collage-diffusion-ui

Collage Diffusion UI

[Demo Website] [Blog Post] [Video Tutorial] [Paper]

Collage Diffusion is a novel interface for interacting with image generation models. It allows you to specify the composition of an image in a familiar Photoshop-like interface. Our modified version of Stable Diffusion takes the layers as input and produces a harmonized image, ensuring that everything from perspective to lighting is plausible. Unlike the text prompting supported by traditional diffusion interfaces, Collage Diffusion allows you to precisely outline how a scene should be composed, from where objects are relative to each other to what they look like.

The frontend is a React app written in TypeScript using the Chakra UI library. The server implements a custom scheduler runtime that dispatches requests to the Ray Serve library for inference. The model is a modified version of Stable Diffusion built on HuggingFace Diffusers.

Development Setup

Create a configuration file by running

./configure.sh config_dev.json

Frontend

The frontend is a React app written in TypeScript, with UI components from Chakra UI. All frontend code lives in the frontend directory. To set it up locally, make sure you have node and npm installed, then install the dependencies by running the following commands:

cd frontend
npm install

The app is built using Vite. To start the development server, run:

npm start

and navigate to http://localhost:5173. If you want to deploy, run

./start.sh frontend

which will build the app and serve it on port 3000.

Server

The server hosts a modified version of Stable Diffusion v1.5 from the HuggingFace diffusers library. Inference is best run on a node with GPUs. The model weights take approximately 8 GB of GPU VRAM, so most GPUs (NVIDIA A100, A10G, V100, etc.) should be able to handle the workload without running out of memory.
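To check whether a node meets the memory requirement before starting the server, one option is to query nvidia-smi and compare each GPU's total memory against the figure above. This is a minimal sketch and not part of the repository: the function names are hypothetical, and "approximately 8 GB" is treated as 8192 MiB, which is an assumption. Note that it reports total memory, not memory currently free.

```python
import subprocess

# Assumption: "approximately 8 GB" from the text, treated here as 8192 MiB.
REQUIRED_MIB = 8 * 1024


def parse_gpu_memory(csv_text: str) -> list[tuple[str, int]]:
    """Parse 'name, total_mib' lines as printed by nvidia-smi in CSV mode."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split on the last comma so GPU names containing commas survive.
        name, total_mib = line.rsplit(",", 1)
        gpus.append((name.strip(), int(total_mib.strip())))
    return gpus


def gpus_with_enough_memory(csv_text: str, required_mib: int = REQUIRED_MIB) -> list[str]:
    """Return the names of GPUs whose total memory meets the requirement."""
    return [name for name, total in parse_gpu_memory(csv_text) if total >= required_mib]


def query_nvidia_smi() -> str:
    """Ask nvidia-smi for each GPU's name and total memory in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

On a GPU node, gpus_with_enough_memory(query_nvidia_smi()) lists the devices that should be able to hold the model weights.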

We provide a configuration file in configs/config_gcp.json that lets you configure the ports the app runs on. A crucial field is backend.activeGpus, which sets the CUDA_VISIBLE_DEVICES environment variable within the application container and therefore controls which GPUs the server can use.
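The effect of backend.activeGpus can be illustrated with a short sketch. This is not the repository's code: the function names are hypothetical, and the value format is an assumption (either a list of GPU indices or an already comma-separated string), since the schema of the config file is not described here.

```python
import json
import os


def visible_devices_from_config(config: dict) -> str:
    """Build a CUDA_VISIBLE_DEVICES value from backend.activeGpus.

    Assumption: activeGpus is a list of GPU indices such as [0, 1] or a
    comma-separated string such as "0,1". The real schema may differ.
    """
    active = config["backend"]["activeGpus"]
    if isinstance(active, str):
        return active
    return ",".join(str(index) for index in active)


def apply_active_gpus(config_path: str) -> str:
    """Read the config file and export CUDA_VISIBLE_DEVICES for this process."""
    with open(config_path) as f:
        config = json.load(f)
    devices = visible_devices_from_config(config)
    # This must happen before any CUDA library initializes, or it has no effect.
    os.environ["CUDA_VISIBLE_DEVICES"] = devices
    return devices
```

With CUDA_VISIBLE_DEVICES set to "0,2", for example, the process sees only those two physical GPUs, renumbered as devices 0 and 1.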

We provide a Dockerfile containing all of the dependencies to run the server. To build the image, run

docker build -t collage-diffusion .

from the project root directory. This will create an image called collage-diffusion, which you can use to start the server by running:

docker run -d --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -p 8009:8009 -p 9007:9007 collage-diffusion

If you modified the ports in the config file above, be sure to adjust the forwarded ports (the -p flags) accordingly.
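If the ports change often, the command can be generated rather than edited by hand. The sketch below is a hypothetical helper, not part of the repository; it assumes each port is forwarded to the same port number inside the container, as in the command above.

```python
import shlex


def docker_run_command(ports, image="collage-diffusion"):
    """Build the docker run command line with one -p flag per port."""
    args = [
        "docker", "run", "-d",
        "--gpus", "all",
        "--ipc=host",
        "--ulimit", "memlock=-1",
        "--ulimit", "stack=67108864",
    ]
    for port in ports:
        # Assumption: host port and container port are identical.
        args += ["-p", f"{port}:{port}"]
    args.append(image)
    # shlex.join quotes arguments where needed, so the result is shell-safe.
    return shlex.join(args)
```

Calling docker_run_command([8009, 9007]) reproduces the command shown above; pass the ports from your own config file to get the matching flags.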

Setting up Google Cloud

Our service uploads to a Google Cloud Storage bucket. To use your own bucket, open backend/utils/gcloud_utils.py and set the PROJECT_ID and BUCKET_NAME variables accordingly.

Install the Google Cloud storage package:

pip install google-cloud-storage

After that, install the Google Cloud SDK. Download and extract it by running the following commands:

curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-cli-417.0.0-linux-x86_64.tar.gz
tar -xf google-cloud-cli-417.0.0-linux-x86_64.tar.gz

Install via:

./google-cloud-sdk/install.sh

After installation, restart your shell. Then run the following command:

./google-cloud-sdk/bin/gcloud init

which will prompt you to log in to your Google account. Select your project when asked.

You will then have to set up application default credentials by logging in with:

gcloud auth application-default login

To test that uploading to GCP works, run the following command:

python backend/utils/gcloud_utils.py -f {path_to_file}

where {path_to_file} is the path to a file you want to upload. If the upload succeeds, you should see a link to the file in the console.
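For orientation, an upload with the google-cloud-storage client generally looks like the sketch below. It is not the contents of gcloud_utils.py: the function names are hypothetical, the PROJECT_ID and BUCKET_NAME values are placeholders, and the returned link assumes the object is publicly readable.

```python
import os
from urllib.parse import quote

PROJECT_ID = "your-project-id"    # placeholder, replace with your project
BUCKET_NAME = "your-bucket-name"  # placeholder, replace with your bucket


def make_client(project_id=PROJECT_ID):
    """Create a storage client using application default credentials."""
    # Imported lazily so the rest of the module loads without the package.
    from google.cloud import storage
    return storage.Client(project=project_id)


def upload_file(client, path, bucket_name=BUCKET_NAME):
    """Upload a local file and return a link to the uploaded object."""
    # Assumption: the object is named after the file, without its directory.
    blob_name = os.path.basename(path)
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_filename(path)
    # Standard object URL; it only resolves if the object is publicly readable.
    return f"https://storage.googleapis.com/{bucket_name}/{quote(blob_name)}"
```

A call such as upload_file(make_client(), "image.png") uses the credentials created by the gcloud login step above.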

Collage Diffusion Implementation

The Collage Diffusion implementation modifies Stable Diffusion and is built on a fork of diffusers. The file backend/pipeline_controlnet.py contains a diffusers pipeline in which the inputs to Collage Diffusion can be easily configured. For an example of how to use the pipeline, see backend/test_controlnet.py.

Acknowledgements

This project was done under the supervision of Prof. Chris Re and Prof. Kayvon Fatahalian. The system was designed and implemented by Vishnu Sarukkai, Arden Ma, and Linden Li.