
Training datasets #93

Open
thibaudart opened this issue Feb 17, 2023 · 28 comments

@thibaudart

First of all, thanks a lot for your work on this amazing tool!

Could you share the datasets used for training? With them we could run the training on SD 2.1!

@sepro

sepro commented Feb 18, 2023

Thanks for these models! ControlNet results with my 1.5 models were awesome, but I have trained so many 2.1 embeddings that I'd love to use with this.

@hatlessman

hatlessman commented Feb 19, 2023

I'm sure we could pool funds to train on some A100s, but the training data is the real problem. Can the data even be released? Are there legal issues?

@thibaudart
Author

thibaudart commented Feb 19, 2023

Training seems fast and not really expensive.
I hope we'll get an answer; the other option is to generate our own dataset (100-300k images, then use the scripts to get the scribble, openpose, depth, etc. versions and train afterwards). That would work, but it would be more energy-efficient if we had the original data.
@hatlessman you can ping me on Twitter (@Thibaudz)

@lllyasviel
Owner

Given the current complicated situation outside the research community, we refrain from disclosing more details about the data. Nevertheless, researchers may take a look at that dataset project everyone knows.

@thibaudart
Author

Thanks @lllyasviel for your reply.

Do you plan to train with SD 2.1?

@notrydo

notrydo commented Feb 20, 2023 via email

@thibaudart
Author

@notrydo the first step is having a dataset for the training. If you have 100-300K varied, good-quality images (512x512), that could be useful. If not, we will need to find a prompt dataset and generate them. (It takes roughly 24 hours to generate 40K images, so around 10 days to have all of them; after that it will take a few days to BLIP-caption them and to generate the preprocessed versions.)
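
For anyone who wants to try this route, here is a rough sketch of the captioning/preprocessing step. It is only an illustration: the BLIP checkpoint, paths, and Canny thresholds are placeholders, and the prompt.json layout is assumed from the repo's training tutorial.

# Sketch only: caption generated images with BLIP and write a Canny control map
# for each one, producing prompt.json entries in the {"source", "target", "prompt"}
# layout assumed from ControlNet's training tutorial. Paths, checkpoint, and
# thresholds are placeholders; swap in whichever annotators you actually need.
import json
from pathlib import Path

import cv2
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
captioner = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

target_dir = Path("dataset/target")   # generated 512x512 images (placeholder path)
source_dir = Path("dataset/source")   # control maps get written here
source_dir.mkdir(parents=True, exist_ok=True)

with open("dataset/prompt.json", "w") as f:
    for target_path in sorted(target_dir.glob("*.png")):
        image = Image.open(target_path).convert("RGB")

        # BLIP caption (default greedy decoding; tune generation settings as needed)
        inputs = processor(image, return_tensors="pt")
        caption = processor.decode(captioner.generate(**inputs)[0], skip_special_tokens=True)

        # Example control map: Canny edges (the 100/200 thresholds are arbitrary here)
        edges = cv2.Canny(cv2.imread(str(target_path)), 100, 200)
        source_path = source_dir / target_path.name
        cv2.imwrite(str(source_path), edges)

        f.write(json.dumps({"source": str(source_path), "target": str(target_path), "prompt": caption}) + "\n")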

@notrydo

notrydo commented Feb 20, 2023 via email

@notrydo

notrydo commented Feb 20, 2023 via email

@sALTaccount

I am currently training a sketch-to-image model on Waifu Diffusion 1.5 (which uses SD 2.1 v-prediction). I made a dataset of 1 million sketch-image pairs, and I'm training with a 50% unconditional chance (as in the paper). Here are the results so far at 150k samples seen:
[4 sample result images]

@lllyasviel
Owner

lllyasviel commented Feb 20, 2023

I am currently training a sketch-to-image model on Waifu Diffusion 1.5 (which uses SD 2.1 v-prediction). I made a dataset of 1 million sketch-image pairs, and I'm training with a 50% unconditional chance (as in the paper). Here are the results so far at 150k samples seen: [images]

Anime models need a larger batch size and lower (or disabled) text dropping because their tags are dense.
Also, because of the "sudden convergence" phenomenon, using 10x gradient accumulation to optimize 15k steps will be better than 150k plain steps.
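
For example, in a PyTorch Lightning setup laid out like the repo's tutorial_train.py, gradient accumulation can be switched on like this. This is only a sketch: the module names are assumed from the tutorial script, and the batch size, accumulation factor, and learning rate are placeholders rather than the settings used for the released models.

# Sketch only, following the layout assumed from tutorial_train.py.
import pytorch_lightning as pl
from torch.utils.data import DataLoader
from tutorial_dataset import MyDataset
from cldm.logger import ImageLogger
from cldm.model import create_model, load_state_dict

model = create_model('./models/cldm_v15.yaml').cpu()
model.load_state_dict(load_state_dict('./models/control_sd15_ini.ckpt', location='cpu'))
model.learning_rate = 1e-5
model.sd_locked = True
model.only_mid_control = False

dataloader = DataLoader(MyDataset(), num_workers=0, batch_size=4, shuffle=True)
logger = ImageLogger(batch_frequency=300)

# accumulate_grad_batches=10 means 10 forward/backward passes per optimizer step,
# i.e. an effective batch size of 4 * 10 = 40 and 10x fewer optimizer steps.
trainer = pl.Trainer(gpus=1, precision=32, callbacks=[logger], accumulate_grad_batches=10)
trainer.fit(model, dataloader)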

@lllyasviel
Owner

I am currently training a sketch-to-image model on Waifu Diffusion 1.5 (which uses SD 2.1 v-prediction). I made a dataset of 1 million sketch-image pairs, and I'm training with a 50% unconditional chance (as in the paper). Here are the results so far at 150k samples seen: [images]

See also the updated last section of https://github.com/lllyasviel/ControlNet/blob/main/docs/train.md

lllyasviel reopened this Feb 21, 2023
@thibaudart
Author

thibaudart commented Feb 21, 2023 via email

@thibaudart
Author

thibaudart commented Feb 21, 2023 via email

@sALTaccount

sALTaccount commented Feb 21, 2023

Anime models need a larger batch size and lower (or disabled) text dropping because their tags are dense. Also, because of the "sudden convergence" phenomenon, using 10x gradient accumulation to optimize 15k steps will be better than 150k plain steps.

@lllyasviel
Yeah, I'm using as large a batch size as I can on this machine, which is 1x A40, though I'm going to switch to 4x A40s soon. I'm able to fit a batch size of 18 @ 512 resolution currently. I want to try 768 resolution for the next training run with the 4 GPUs, so I'm not sure what that will look like in terms of batch size.

I'll make the changes to the unconditional dropping. I might copy over the "partial dropout" code from Waifu Diffusion training, where we train with a variable percentage of the prompt (50% chance to keep anywhere from 0% to 100% of the tags, 50% chance to keep 100% of them), except perhaps shifting the percentages to something like a 30% chance of partial dropout.
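
Roughly what I have in mind, as a minimal sketch (it assumes comma-separated tag prompts; the 30% chance and uniform keep-fraction are just the hypothetical settings mentioned above, not values from any released training code):

# Sketch of the "partial dropout" idea for tag-based prompts.
import random

def partial_tag_dropout(prompt: str, partial_prob: float = 0.3) -> str:
    tags = [t.strip() for t in prompt.split(",") if t.strip()]
    if random.random() < partial_prob:
        n_keep = random.randint(0, len(tags))                  # keep 0% to 100% of the tags
        keep_idx = sorted(random.sample(range(len(tags)), n_keep))
        tags = [tags[i] for i in keep_idx]                      # keep the original tag order
    return ", ".join(tags)

# Example: about 30% of the time the prompt loses a random subset of its tags.
print(partial_tag_dropout("1girl, solo, blue hair, outdoors, smiling"))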

Very interesting about the sudden convergence phenomenon. I've noticed it with normal Waifu Diffusion 1.5 as well. I don't quite see how changing the gradient accumulation steps helps with this, though. Could you explain that part further?

I'd love to talk about this more with you; is there a better way of contacting you (email, Discord)?

@lllyasviel
Owner

lllyasviel commented Feb 21, 2023

Anime models need a larger batch size and lower (or disabled) text dropping because their tags are dense. Also, because of the "sudden convergence" phenomenon, using 10x gradient accumulation to optimize 15k steps will be better than 150k plain steps.

@lllyasviel Yeah, I'm using as large a batch size as I can on this machine, which is 1x A40, though I'm going to switch to 4x A40s soon. I'm able to fit a batch size of 18 @ 512 resolution currently. I want to try 768 resolution for the next training run with the 4 GPUs, so I'm not sure what that will look like in terms of batch size.

I'll make the changes to the unconditional dropping. I might copy over the "partial dropout" code from Waifu Diffusion training, where we train with a variable percentage of the prompt (50% chance to keep anywhere from 0% to 100% of the tags, 50% chance to keep 100% of them), except perhaps shifting the percentages to something like a 30% chance of partial dropout.

Very interesting about the sudden convergence phenomenon. I've noticed it with normal Waifu Diffusion 1.5 as well. I don't quite see how changing the gradient accumulation steps helps with this, though. Could you explain that part further?

Because that "sudden converge" always happens, lets say "sudden converge" will happen at 3k step and our money can optimize 90k step, then we have two options: (1) train 3k steps, sudden converge, then train 87k steps. (2) 30x gradient accumulation, train 3k steps (90k real computation steps), then sudden converge.

In my experiments, (2) is usually better than (1). However, in real cases, perhaps you may need to balance the steps before and after the "sudden converge" on your own to find a balance. The training after "sudden converge" is also important.

@sALTaccount

sALTaccount commented Feb 21, 2023

@lllyasviel
I see. Just curious, do you think it would make sense to try the same technique with a normal diffusion model? I'd love to talk more about this, but I'm not sure a GitHub issue about training data is the best place. My Discord is salt#1111 if we could talk there, although since this is research I'm not sure if there's some requirement that it be discussed in public. Maybe a new thread under the GitHub discussions?

Just read your edit. Do you mean that after the "sudden convergence", I should reduce my gradient accumulation steps?

@lllyasviel
Owner

lllyasviel commented Feb 21, 2023

No. The batch size should not be reduced under any circumstances. In addition, we should always remember that we are not training layers from scratch; we are optimizing some projections between existing layers. We are still fine-tuning an SD model. Any bad training practice that can make SD fine-tuning fail will also make ControlNet training fail.
Feel free to open a discussion if necessary.

@batrlatom

Just for the sake of reference... is this the correct approach for gradient accumulation?

import pytorch_lightning as pl

N = 10  # accumulate gradients over 10 batches before each optimizer step
trainer = pl.Trainer(gpus=1, precision=32, callbacks=[logger], amp_backend='apex', accumulate_grad_batches=N)

@ousinkou

When modifying the batch size or gradient accumulation, should I modify the learning rate?

@offchan42

offchan42 commented Mar 13, 2023

@lllyasviel can you share the hyperparameters you used for training, e.g. batch size, effective batch size, number of GPUs, number of worker nodes, learning rate, number of training steps, etc.?
I saw in the paper that you mentioned a learning rate of 1e-5 with the AdamW optimizer, but I'm not sure about the other hyperparameters. I'm especially interested in the effective batch size because it affects the accuracy of the gradients.

By effective batch size, I mean the value batch_size_per_gpu * n_GPUs_per_worker_node * n_worker_nodes * gradient_accumulation_steps.
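
For example (purely hypothetical numbers, not a guess at the actual settings):

def effective_batch_size(batch_size_per_gpu, n_gpus_per_node, n_nodes, grad_accum_steps):
    # Effective batch size as defined above.
    return batch_size_per_gpu * n_gpus_per_node * n_nodes * grad_accum_steps

# 4 images per GPU on a single 8-GPU node with 8 gradient accumulation steps:
print(effective_batch_size(4, 8, 1, 8))  # 256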

@whydna

whydna commented Mar 24, 2023

I am currently training a sketch-to-image model on Waifu Diffusion 1.5 (which uses SD 2.1 v-prediction). I made a dataset of 1 million sketch-image pairs, and I'm training with a 50% unconditional chance (as in the paper). Here are the results so far at 150k samples seen:
[4 sample result images]

Can you share a bit more about what you mean by 50% unconditional chance?

@offchan42

offchan42 commented Mar 24, 2023

@whydna it means that there is a 50% chance for the text prompt to be dropped (set to an empty string) during training, so that only the control image (the sketch in this case) is used. This forces the model not to rely too much on the text and to try to generate the entire image from the control image alone.

@whydna

whydna commented Mar 25, 2023

@off99555 thanks for the explanation - makes sense.

Is this achieved by just omitting prompts for 50% of the dataset in prompts.json? Or is there some param to do it in the training function?

@offchan42

offchan42 commented Mar 25, 2023

@off99555 thanks for the explanation - makes sense.

Is this achieved by just omitting prompts for 50% of the dataset in prompts.json? Or is there some param to do it in the training function?

It should be done dynamically in the code. Here is example code I found in another repository that drops the text prompt 10% of the time:
https://github.com/timothybrooks/instruct-pix2pix/blob/0dffd1eeb02611c35088462d1df88714ce2b52f4/stable_diffusion/ldm/models/diffusion/ddpm_edit.py#L701-L707

I'm not sure where this piece of code exists in the ControlNet repo.

@sALTaccount

The code doesn't exist in the ControlNet repo; you have to write it yourself. Also, I talked to the author and he said that 50% is too high for sketch; it should be more like 0-10%.
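
For reference, here is a minimal sketch of what writing it yourself could look like in a ControlNet-style dataset class. The dict keys and the prompt.json layout are assumed from the repo's tutorial dataset, and the 10% rate is just the figure mentioned above, not an official setting.

# Sketch only: drop the text prompt with some probability when a sample is loaded,
# so the model sometimes has to rely on the control image alone.
import json
import random

import cv2
import numpy as np
from torch.utils.data import Dataset

class PromptDropDataset(Dataset):
    def __init__(self, prompt_file="dataset/prompt.json", drop_prob=0.1):
        self.drop_prob = drop_prob
        with open(prompt_file) as f:
            self.items = [json.loads(line) for line in f]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        item = self.items[idx]
        prompt = "" if random.random() < self.drop_prob else item["prompt"]

        source = cv2.cvtColor(cv2.imread(item["source"]), cv2.COLOR_BGR2RGB)
        target = cv2.cvtColor(cv2.imread(item["target"]), cv2.COLOR_BGR2RGB)

        # Same normalization as the tutorial dataset: source in [0, 1], target in [-1, 1].
        source = source.astype(np.float32) / 255.0
        target = (target.astype(np.float32) / 127.5) - 1.0

        return dict(jpg=target, txt=prompt, hint=source)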

@lilisierrayu

Given the current complicated situation outside the research community, we refrain from disclosing more details about the data. Nevertheless, researchers may take a look at that dataset project everyone knows.

@lllyasviel could you please share the details of each feature extractor, such as the thresholds used by canny(), mlsd() and midas?

@Luccadoremi

I am currently training a sketch-to-image model on Waifu Diffusion 1.5 (which uses SD 2.1 v-prediction). I made a dataset of 1 million sketch-image pairs, and I'm training with a 50% unconditional chance (as in the paper). Here are the results so far at 150k samples seen: [images]

See also the updated last section of https://github.com/lllyasviel/ControlNet/blob/main/docs/train.md

Could you share the hyperparameters you used? What are the learning rate and effective batch size?
