
Training datasets #93

Open
thibaudart opened this issue Feb 17, 2023 · 28 comments

@thibaudart

First of all, thanks a lot for your work on this amazing tool!

Could you share the datasets used for training? With them we could run the training on SD 2.1!

@sepro

sepro commented Feb 18, 2023

Thanks for these models! ControlNet results with my 1.5 models were awesome, but I have trained so many 2.1 embeddings that I'd love to use with this.

@hatlessman

hatlessman commented Feb 19, 2023

I'm sure we could pool funds to train on some A100s, but the training data is the real problem. Can the data even be released? Are there legal issues?

@thibaudart
Author

thibaudart commented Feb 19, 2023

Training seems fast and not really expensive.
I hope we'll get an answer; the other option is to generate our own dataset (100-300k images, then use the scripts to get the scribble, openpose, depth, etc. versions and train afterwards). That would work, but it would be more energy-efficient if we had the original data.
@hatlessman you can ping me on Twitter (@Thibaudz)

@lllyasviel
Owner

Given the current complicated situation outside the research community, we refrain from disclosing more details about the data. Nevertheless, researchers may take a look at that dataset project everyone knows.

@thibaudart
Author

Thanks @lllyasviel for your reply.

Do you plan to train with SD 2.1?

@notrydo

notrydo commented Feb 20, 2023 via email

@thibaudart
Author

@notrydo the first step is having a dataset for the training. If you have 100-300K varied, good-quality images (512x512), that could be useful. If not, we will need to find a prompt dataset and generate them. (It takes roughly 24 hours to generate 40K images, so around 10 days to have all of them; after that it will take a few days to BLIP-caption them and to generate the preprocessed versions.)
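
For anyone who wants to try this route, here is a rough sketch of the captioning/preprocessing step. It is only an illustration: the BLIP checkpoint, paths, and Canny thresholds are placeholders, and the prompt.json layout is assumed from the repo's training tutorial.

# Sketch only: caption generated images with BLIP and write a Canny control map
# for each one, producing prompt.json entries in the {"source", "target", "prompt"}
# layout assumed from ControlNet's training tutorial. Paths, checkpoint, and
# thresholds are placeholders; swap in whichever annotators you actually need.
import json
from pathlib import Path

import cv2
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
captioner = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

target_dir = Path("dataset/target")   # generated 512x512 images (placeholder path)
source_dir = Path("dataset/source")   # control maps get written here
source_dir.mkdir(parents=True, exist_ok=True)

with open("dataset/prompt.json", "w") as f:
    for target_path in sorted(target_dir.glob("*.png")):
        image = Image.open(target_path).convert("RGB")

        # BLIP caption (default greedy decoding; tune generation settings as needed)
        inputs = processor(image, return_tensors="pt")
        caption = processor.decode(captioner.generate(**inputs)[0], skip_special_tokens=True)

        # Example control map: Canny edges (the 100/200 thresholds are arbitrary here)
        edges = cv2.Canny(cv2.imread(str(target_path)), 100, 200)
        source_path = source_dir / target_path.name
        cv2.imwrite(str(source_path), edges)

        f.write(json.dumps({"source": str(source_path), "target": str(target_path), "prompt": caption}) + "\n")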

@notrydo

notrydo commented Feb 20, 2023 via email

@notrydo

notrydo commented Feb 20, 2023 via email

@sALTaccount

I am currently training a sketch-to-image model on Waifu Diffusion 1.5 (which uses SD 2.1 v-prediction). I made a dataset of 1 million sketch-image pairs, and I'm training with a 50% unconditional chance (as in the paper). Here are the results so far at 150k samples seen:
[4 sample result images]

@lllyasviel
Owner

lllyasviel commented Feb 20, 2023

I am currently training a sketch-to-image model on Waifu Diffusion 1.5 (which uses SD 2.1 v-prediction). I made a dataset of 1 million sketch-image pairs, and I'm training with a 50% unconditional chance (as in the paper). Here are the results so far at 150k samples seen: [images]

Anime models need a larger batch size and lower (or disabled) text dropping because their tags are dense.
Also, because of the "sudden convergence" phenomenon, using 10x gradient accumulation to optimize 15k steps will be better than 150k plain steps.
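
For example, in a PyTorch Lightning setup laid out like the repo's tutorial_train.py, gradient accumulation can be switched on like this. This is only a sketch: the module names are assumed from the tutorial script, and the batch size, accumulation factor, and learning rate are placeholders rather than the settings used for the released models.

# Sketch only, following the layout assumed from tutorial_train.py.
import pytorch_lightning as pl
from torch.utils.data import DataLoader
from tutorial_dataset import MyDataset
from cldm.logger import ImageLogger
from cldm.model import create_model, load_state_dict

model = create_model('./models/cldm_v15.yaml').cpu()
model.load_state_dict(load_state_dict('./models/control_sd15_ini.ckpt', location='cpu'))
model.learning_rate = 1e-5
model.sd_locked = True
model.only_mid_control = False

dataloader = DataLoader(MyDataset(), num_workers=0, batch_size=4, shuffle=True)
logger = ImageLogger(batch_frequency=300)

# accumulate_grad_batches=10 means 10 forward/backward passes per optimizer step,
# i.e. an effective batch size of 4 * 10 = 40 and 10x fewer optimizer steps.
trainer = pl.Trainer(gpus=1, precision=32, callbacks=[logger], accumulate_grad_batches=10)
trainer.fit(model, dataloader)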

@lllyasviel
Owner

I am currently training a sketch-to-image model on Waifu Diffusion 1.5 (which uses SD 2.1 v-prediction). I made a dataset of 1 million sketch-image pairs, and I'm training with a 50% unconditional chance (as in the paper). Here are the results so far at 150k samples seen: [images]

See also the updated last section of https://github.com/lllyasviel/ControlNet/blob/main/docs/train.md

lllyasviel reopened this Feb 21, 2023
@thibaudart
Author

thibaudart commented Feb 21, 2023 via email

@thibaudart
Author

thibaudart commented Feb 21, 2023 via email

@sALTaccount

sALTaccount commented Feb 21, 2023

Anime models need a larger batch size and lower (or disabled) text dropping because their tags are dense. Also, because of the "sudden convergence" phenomenon, using 10x gradient accumulation to optimize 15k steps will be better than 150k plain steps.

@lllyasviel
Yeah, I'm using as large a batch size as I can on this machine, which is 1x A40, though I'm going to switch to 4x A40s soon. I'm able to fit a batch size of 18 @ 512 resolution currently. I want to try 768 resolution for the next training run with the 4 GPUs, so I'm not sure what that will look like in terms of batch size.

I'll make the changes to the unconditional dropping. I might copy over the "partial dropout" code from Waifu Diffusion training, where we train with a variable percentage of the prompt (50% chance to keep anywhere from 0% to 100% of the tags, 50% chance to keep 100% of them), except perhaps shifting the percentages to something like a 30% chance of partial dropout.
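
Roughly what I have in mind, as a minimal sketch (it assumes comma-separated tag prompts; the 30% chance and uniform keep-fraction are just the hypothetical settings mentioned above, not values from any released training code):

# Sketch of the "partial dropout" idea for tag-based prompts.
import random

def partial_tag_dropout(prompt: str, partial_prob: float = 0.3) -> str:
    tags = [t.strip() for t in prompt.split(",") if t.strip()]
    if random.random() < partial_prob:
        n_keep = random.randint(0, len(tags))                  # keep 0% to 100% of the tags
        keep_idx = sorted(random.sample(range(len(tags)), n_keep))
        tags = [tags[i] for i in keep_idx]                      # keep the original tag order
    return ", ".join(tags)

# Example: about 30% of the time the prompt loses a random subset of its tags.
print(partial_tag_dropout("1girl, solo, blue hair, outdoors, smiling"))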

Very interesting about the sudden convergence phenomenon. I've noticed it with normal Waifu Diffusion 1.5 as well. I don't quite see how changing the gradient accumulation steps helps with this, though. Could you explain that part further?

I'd love to talk about this more with you; is there a better way of contacting you (email, Discord)?

@lllyasviel
Owner

lllyasviel commented Feb 21, 2023

Anime models need a larger batch size and lower (or disabled) text dropping because their tags are dense. Also, because of the "sudden convergence" phenomenon, using 10x gradient accumulation to optimize 15k steps will be better than 150k plain steps.

@lllyasviel Yeah, I'm using as large a batch size as I can on this machine, which is 1x A40, though I'm going to switch to 4x A40s soon. I'm able to fit a batch size of 18 @ 512 resolution currently. I want to try 768 resolution for the next training run with the 4 GPUs, so I'm not sure what that will look like in terms of batch size.

I'll make the changes to the unconditional dropping. I might copy over the "partial dropout" code from Waifu Diffusion training, where we train with a variable percentage of the prompt (50% chance to keep anywhere from 0% to 100% of the tags, 50% chance to keep 100% of them), except perhaps shifting the percentages to something like a 30% chance of partial dropout.

Very interesting about the sudden convergence phenomenon. I've noticed it with normal Waifu Diffusion 1.5 as well. I don't quite see how changing the gradient accumulation steps helps with this, though. Could you explain that part further?

Because that "sudden converge" always happens, lets say "sudden converge" will happen at 3k step and our money can optimize 90k step, then we have two options: (1) train 3k steps, sudden converge, then train 87k steps. (2) 30x gradient accumulation, train 3k steps (90k real computation steps), then sudden converge.

In my experiments, (2) is usually better than (1). However, in real cases, perhaps you may need to balance the steps before and after the "sudden converge" on your own to find a balance. The training after "sudden converge" is also important.

@sALTaccount

sALTaccount commented Feb 21, 2023

@lllyasviel
I see. Just curious, do you think it would make sense to try the same technique with a normal diffusion model? I'd love to talk more about this, but I'm not sure a GitHub issue about training data is the best place. My Discord is salt#1111 if we could talk there, although since this is research I'm not sure if there's some requirement that it be discussed in public. Maybe a new thread under the GitHub discussions?

Just read your edit. Do you mean that after the "sudden convergence", I should reduce my gradient accumulation steps?

@lllyasviel
Owner

lllyasviel commented Feb 21, 2023

No. The batch size should not be reduced under any circumstances. In addition, we should always remember that we are not training layers from scratch; we are optimizing some projections between existing layers. We are still fine-tuning an SD model. Any bad training practice that can make SD fine-tuning fail will also make ControlNet training fail.
Feel free to open a discussion if necessary.

@batrlatom

Just for the sake of reference... is this the correct approach for gradient accumulation?

import pytorch_lightning as pl

N = 10  # accumulate gradients over 10 batches before each optimizer step
trainer = pl.Trainer(gpus=1, precision=32, callbacks=[logger], amp_backend='apex', accumulate_grad_batches=N)

@ousinkou

When modifying the batch size or gradient accumulation, should I modify the learning rate?

@offchan42

offchan42 commented Mar 13, 2023

@lllyasviel can you share the hyperparameters you used for training, e.g. batch size, effective batch size, number of GPUs, number of worker nodes, learning rate, number of training steps, etc.?
I saw in the paper that you mentioned a learning rate of 1e-5 with the AdamW optimizer, but I'm not sure about the other hyperparameters. I'm especially interested in the effective batch size because it affects the accuracy of the gradients.

By effective batch size, I mean the value batch_size_per_gpu * n_GPUs_per_worker_node * n_worker_nodes * gradient_accumulation_steps.
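
For example (purely hypothetical numbers, not a guess at the actual settings):

def effective_batch_size(batch_size_per_gpu, n_gpus_per_node, n_nodes, grad_accum_steps):
    # Effective batch size as defined above.
    return batch_size_per_gpu * n_gpus_per_node * n_nodes * grad_accum_steps

# 4 images per GPU on a single 8-GPU node with 8 gradient accumulation steps:
print(effective_batch_size(4, 8, 1, 8))  # 256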

@whydna

whydna commented Mar 24, 2023

I am currently training a sketch-to-image model on Waifu Diffusion 1.5 (which uses SD 2.1 v-prediction). I made a dataset of 1 million sketch-image pairs, and I'm training with a 50% unconditional chance (as in the paper). Here are the results so far at 150k samples seen:
[4 sample result images]

Can you share a bit more about what you mean by 50% unconditional chance?

@offchan42

offchan42 commented Mar 24, 2023

@whydna it means that there is a 50% chance for the text prompt to be dropped (set to an empty string) during training, so that only the control image (the sketch in this case) is used. This forces the model not to rely too much on the text and to try to generate the entire image from the control image alone.

@whydna

whydna commented Mar 25, 2023

@off99555 thanks for the explanation - makes sense.

Is this achieved by just omitting prompts for 50% of the dataset in prompts.json? Or is there some param to do it in the training function?

@offchan42

offchan42 commented Mar 25, 2023

@off99555 thanks for the explanation - makes sense.

Is this achieved by just omitting prompts for 50% of the dataset in prompts.json? Or is there some param to do it in the training function?

It should be done dynamically in the code. Here is example code I found in another repository that drops the text prompt 10% of the time:
https://github.com/timothybrooks/instruct-pix2pix/blob/0dffd1eeb02611c35088462d1df88714ce2b52f4/stable_diffusion/ldm/models/diffusion/ddpm_edit.py#L701-L707

I'm not sure where this piece of code exists in the ControlNet repo.

@sALTaccount

The code doesn't exist in the ControlNet repo; you have to write it yourself. Also, I talked to the author and he said that 50% is too high for sketch; it should be more like 0-10%.
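
For reference, here is a minimal sketch of what writing it yourself could look like in a ControlNet-style dataset class. The dict keys and the prompt.json layout are assumed from the repo's tutorial dataset, and the 10% rate is just the figure mentioned above, not an official setting.

# Sketch only: drop the text prompt with some probability when a sample is loaded,
# so the model sometimes has to rely on the control image alone.
import json
import random

import cv2
import numpy as np
from torch.utils.data import Dataset

class PromptDropDataset(Dataset):
    def __init__(self, prompt_file="dataset/prompt.json", drop_prob=0.1):
        self.drop_prob = drop_prob
        with open(prompt_file) as f:
            self.items = [json.loads(line) for line in f]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        item = self.items[idx]
        prompt = "" if random.random() < self.drop_prob else item["prompt"]

        source = cv2.cvtColor(cv2.imread(item["source"]), cv2.COLOR_BGR2RGB)
        target = cv2.cvtColor(cv2.imread(item["target"]), cv2.COLOR_BGR2RGB)

        # Same normalization as the tutorial dataset: source in [0, 1], target in [-1, 1].
        source = source.astype(np.float32) / 255.0
        target = (target.astype(np.float32) / 127.5) - 1.0

        return dict(jpg=target, txt=prompt, hint=source)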

@lilisierrayu

Given the current complicated situation outside the research community, we refrain from disclosing more details about the data. Nevertheless, researchers may take a look at that dataset project everyone knows.

@lllyasviel could you please share the details of each feature extractor, such as the thresholds used by canny(), mlsd() and midas?

@Luccadoremi

I am currently training a sketch-to-image model on Waifu Diffusion 1.5 (which uses SD 2.1 v-prediction). I made a dataset of 1 million sketch-image pairs, and I'm training with a 50% unconditional chance (as in the paper). Here are the results so far at 150k samples seen: [images]

See also the updated last section of https://github.com/lllyasviel/ControlNet/blob/main/docs/train.md

Could you share the hyperparameters you used? What are the learning rate and effective batch size?
