This project was motivated by a desire to explore the limits of the latest AI technologies, particularly the Flux models and the Stable Diffusion img2vid model. By finetuning the Flux.1-dev model on images of the ZenML co-founder using DreamBooth, we aimed to showcase the potential for personalized AI applications.
DreamBooth is a technique that allows for the creation of custom text-to-image models by finetuning a pre-trained model on a small dataset of images featuring a specific subject. In this case, we used a dataset of cropped portrait photos of the ZenML co-founder to create a personalized model capable of generating novel images of him in various styles and contexts.
To make the finetuning process more efficient, we employed LoRA (Low-Rank Adaptation) adapters. LoRA allows for faster finetuning by only updating a small fraction of the model's weights, which can then be saved separately from the original model. The resulting LoRA adapters for this project have been pushed to the Hugging Face repository at https://huggingface.co/strickvl/flux-dreambooth-hamza.
While we finetuned the Flux.1-dev model for training, we switched to the Flux-Schnell model for inference. Flux-Schnell is a faster variant of the Flux model that maintains comparable quality, making it an excellent choice for efficient batch inference. The LoRA adapters worked seamlessly with Flux-Schnell, allowing us to generate high-quality personalized images quickly.
Crafting effective prompts was a crucial aspect of this project. We experimented with various prompts to find those that worked best with the finetuned model. Additionally, we had to tailor the prompts to the limitations of our dataset, which consisted primarily of cropped portrait photos. By carefully designing prompts that played to the strengths of the available data, we were able to generate impressive personalized images and short animated videos.
The pipeline outputs a grouped image showcasing the results of various prompts and generates short 3-second animated videos using the stable-video-diffusion-img2vid-xt img2vid model. These outputs demonstrate the potential for creating engaging, personalized content using state-of-the-art AI technologies.
To get started with this project, follow these steps:
- Clone the ZenML Projects repository:
git clone https://github.com/zenml-io/zenml-projects.git
- Navigate to the
flux-dreambooth
directory:cd zenml-projects/flux-dreambooth
- Install the required dependencies:
pip install -r requirements.txt
- For a guided walkthrough of the code, check out the
walkthrough.ipynb
notebook.
You'll need a cloud stack to run these pipelines, and you'll need GPU quotas so as to be able to run the training and batch inference steps. You can get started with a 1-click deployment on all the major cloud providers using our simple guide.
To run the pipeline, use one of the following commands depending on your stack:
- For Kubernetes:
python k8s_run.py
- For Modal:
python modal_run.py
The project contains the following files with comments:
configs/k8s_run_refactored_multi_video.yaml
: Configuration files used in the notebooksk8s_run.py
: Script to run the pipeline on a Kubernetes stackmodal_run.py
: Script to run the pipeline on a Modal stacktest_examples_utils.py
: Related to the diffusers script used in older versions of the pipelinetrain_dreambooth_lora_flux.py
: From the diffusers library, used in older versions of the pipelinetrain_dreambooth.py
: From the diffusers library, used in older versions of the pipelinewalkthrough.ipynb
: Jupyter Notebook providing a guided walkthrough of the code
Contributions to this project are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.