fix import and make npz for alpha_mask #1632

Maru-mee · 2024-09-22T04:49:42Z

This PR fixes a latent problem related to alpha_mask.

Problem with the current code

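Until now, the alpha_mask stored in the npz has been resized according to bucket_size.
The cached latents are rounded to a multiple of 8,
but the alpha_mask in the npz was not rounded to a multiple of 8.
As a result, for some images a tensor size mismatch occurs inside the training step loop and the stack fails.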

For example: alpha_mask (H:W) = (996:896), required_size = (992:896)
Error message:

  File "C:\AI_illust\sd-scripts\sd-scripts\library\train_util.py", line 1394, in __getitem__
    example["alpha_masks"] = torch.stack(alpha_mask_list)
RuntimeError: stack expects each tensor to be equal size, but got [992, 896] at entry 0 and [996, 896] at entry 1

Conditions under which it occurs

・alpha_mask is used
・cache_to_disk is used
・The image's aspect ratio is not 1:1 and its size is one of certain specific values
・SDXL (the model type is probably irrelevant)

Changes

To prevent the stack error, the alpha_mask saved in the npz is rounded down to dimensions divisible by 8:
alpha_mask = image[:image.shape[0] // 8 * 8, :image.shape[1] // 8 * 8, 3]
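A minimal NumPy sketch of the failure and of the round-down, using the sizes from the example above. `np.stack` stands in for `torch.stack` (both require every input to have the same shape), and `round_down_to_multiple` is a hypothetical helper name, not part of sd-scripts.

```python
import numpy as np


def round_down_to_multiple(mask: np.ndarray, m: int = 8) -> np.ndarray:
    """Crop an [H, W] mask so both sides are multiples of m (hypothetical helper)."""
    h, w = mask.shape[:2]
    return mask[: h // m * m, : w // m * m]


# Two masks that end up in the same batch: one 996x896, one already 992x896
mask_a = np.ones((996, 896), dtype=np.float32)
mask_b = np.ones((992, 896), dtype=np.float32)

try:
    np.stack([mask_a, mask_b])
except ValueError as e:
    print("stack failed:", e)

# After rounding both down to multiples of 8, the shapes agree and stacking works
stacked = np.stack([round_down_to_multiple(mask_a), round_down_to_multiple(mask_b)])
print(stacked.shape)  # (2, 992, 896)
```

Cropping (rather than resizing) keeps the mask pixels aligned with the top-left region that the rounded latents cover.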

See the comments below for details.

Maru-mee · 2024-09-22T05:01:43Z

library/train_util.py

@@ -2514,6 +2518,7 @@ def cache_batch_latents(
            if image.shape[2] == 4:
                alpha_mask = image[:, :, 3]  # [H,W]
                alpha_mask = alpha_mask.astype(np.float32) / 255.0
+                alpha_mask = alpha_mask[:image.shape[0] // 8 * 8, :image.shape[1] // 8 * 8]  # Round [H, W] down to multiples of 8; otherwise tensor sizes may not match when the tensors are stacked.


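This change rounds the alpha_mask resolution down to a multiple of 8.
For image sizes that previously caused a tensor size mismatch inside the training step loop, this removes the mismatch so the stack no longer fails.
As a result, the alpha_mask saved in the npz is always a multiple of 8 in size.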

Maru-mee · 2024-09-22T05:06:01Z

library/train_util.py

@@ -2207,7 +2207,11 @@ def is_disk_cached_latents_is_expected(reso, npz_path: str, flip_aug: bool, alph
        if alpha_mask:
            if "alpha_mask" not in npz:
                return False
-            if (npz["alpha_mask"].shape[1], npz["alpha_mask"].shape[0]) != reso:  # HxW => WxH != reso


Regarding the validity check:
with the change above, alpha_mask becomes a multiple of 8, so this condition can no longer be used as-is.
I therefore made the changes below.

Maru-mee · 2024-09-22T05:09:59Z

library/train_util.py

-            if (npz["alpha_mask"].shape[1], npz["alpha_mask"].shape[0]) != reso:  # HxW => WxH != reso
+            alpha_mask_size = npz["alpha_mask"].shape[0:2]
+            if alpha_mask_size[0] != alpha_mask_size[0] // 8 * 8 or alpha_mask_size[1] != alpha_mask_size[1] // 8 * 8: # ...is legacy caching scheme without rounding to divisible by 8
+                return False


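First, for backward compatibility in the npz check,
any npz created with the old scheme (without rounding down to a multiple of 8)
is detected and regenerated.

Instead of this change, we could simply tell users to delete their old caches,
but I suspect that message would not reach everyone, so I made the detection automatic.
This may be worth removing in the future.
(If unexpected bug reports come in, let's remove it.)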

Maru-mee · 2024-09-22T05:16:17Z

library/train_util.py

+            if alpha_mask_size[0] != alpha_mask_size[0] // 8 * 8 or alpha_mask_size[1] != alpha_mask_size[1] // 8 * 8: # ...is legacy caching scheme without rounding to divisible by 8
+                return False
+            alpha_mask_size = (alpha_mask_size[0] // 8, alpha_mask_size[1] // 8) # Resize alpha_mask to 1/8 scale, the same as latents
+            if alpha_mask_size != expected_latents_size:  # HxW


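This change makes the comparison use the new alpha_mask_size check.
Also, as in the surrounding lines, it now compares against expected_latents_size instead of reso, which makes the code easier to read.
※ Functionally, comparing with reso or with expected_latents_size is equivalent.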
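Taken together, the two changes amount to the check sketched below. This is a hypothetical standalone function, not the actual is_disk_cached_latents_is_expected; it writes the divisibility test as `% 8 != 0`, which for non-negative sizes is equivalent to the `x != x // 8 * 8` form used in the diff.

```python
from typing import Tuple


def alpha_mask_cache_is_valid(
    alpha_mask_shape: Tuple[int, ...], expected_latents_size: Tuple[int, int]
) -> bool:
    """Hypothetical sketch of the alpha_mask part of the npz validity check.

    alpha_mask_shape: [H, W] shape of the mask stored in the npz.
    expected_latents_size: latent (H, W), i.e. 1/8 of the image size.
    """
    h, w = alpha_mask_shape[0:2]
    # Legacy caches were not rounded down to a multiple of 8: force regeneration
    if h % 8 != 0 or w % 8 != 0:
        return False
    # Scale the mask to latent resolution (1/8) and compare with the latents
    return (h // 8, w // 8) == tuple(expected_latents_size)


print(alpha_mask_cache_is_valid((1124, 768), (140, 96)))  # False: legacy cache, regenerate
print(alpha_mask_cache_is_valid((1120, 768), (140, 96)))  # True
```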

sdbds · 2024-09-22T06:36:11Z

On a separate issue: could we consider storing cached latents in a vector DB format instead of npz?
The npz format wraps several npy arrays of different types, which may cause all sorts of problems in the future.
It is also slower to load; with a new format we could read directly from disk instead of loading into memory first.
Lance was recommended to me as a format, and I think it has a lot of potential.

kohya-ss · 2024-09-22T13:48:40Z


Hmm, I don't think using lance is a better approach.

Because we are storing latent data, we can expect very little compression. Also, since each file is accessed directly by name, no vector search is required.

Saving to .npz may not be the best solution, but I think it is simple and sufficient.

kohya-ss · 2024-09-22T14:00:10Z

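Thank you for creating this pull request. My understanding was that, with the code below, the alpha_mask saved to the .npz file always has the same size as the latents (i.e., a multiple of 8), but you are saying there are cases where it does not?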

sd-scripts/library/train_util.py

Lines 2508 to 2521 in 0b7927e

    image, original_size, crop_ltrb = trim_and_resize_if_required(random_crop, image, info.bucket_reso, info.resized_size)
    info.latents_original_size = original_size
    info.latents_crop_ltrb = crop_ltrb
    if use_alpha_mask:
        if image.shape[2] == 4:
            alpha_mask = image[:, :, 3]  # [H,W]
            alpha_mask = alpha_mask.astype(np.float32) / 255.0
            alpha_mask = torch.FloatTensor(alpha_mask)  # [H,W]
        else:
            alpha_mask = torch.ones_like(image[:, :, 0], dtype=torch.float32)  # [H,W]
    else:
        alpha_mask = None
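For readers without torch at hand, the mask logic of the quoted snippet can be sketched in NumPy. The point is that the mask's [H, W] always equals the [H, W] of the resized-and-cropped image, i.e. the bucket resolution, so the mask is a multiple of 8 only if the bucket is. `extract_alpha_mask` is a hypothetical name, and `np.ones` replaces `torch.ones_like` here.

```python
from typing import Optional

import numpy as np


def extract_alpha_mask(image: np.ndarray, use_alpha_mask: bool) -> Optional[np.ndarray]:
    """Hypothetical sketch: [H, W, C] uint8 image -> [H, W] float32 mask in [0, 1]."""
    if not use_alpha_mask:
        return None
    if image.shape[2] == 4:
        # RGBA: take the alpha channel and scale 0..255 to 0..1
        return image[:, :, 3].astype(np.float32) / 255.0
    # RGB: treat every pixel as fully opaque
    return np.ones(image.shape[:2], dtype=np.float32)


# An RGBA image already cropped to a (W, H) = (768, 1124) bucket -> array shape [1124, 768, 4]
rgba = np.full((1124, 768, 4), 255, dtype=np.uint8)
mask = extract_alpha_mask(rgba, use_alpha_mask=True)
print(mask.shape, mask.shape[0] % 8)  # (1124, 768) 4
```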

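Note that the latent caching code has changed substantially in the sd3 branch, so your fix may not be mergeable there (the code may have been removed in the sd3 branch); please be aware of this.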

Maru-mee · 2024-09-22T14:44:32Z

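> Note that the latent caching code has changed substantially in the sd3 branch, so your fix may not be mergeable there (the code may have been removed in the sd3 branch); please be aware of this.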

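Understood.
Please also take a look at my comment on PR #1619.
With the current dev versions of train_util.py and util.py, alpha_mask is almost certainly unusable.

On the other hand,
it appears that applying PR #1619 may resolve this issue even without this PR (this could be a coincidence, or simply insufficient testing on my part). I only noticed this after opening the PR. I will keep this PR as a draft until the cause is identified.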

Maru-mee · 2024-09-24T13:49:48Z

Verification is complete.
I would like to propose the code above once again.
I have not verified the sd3 branch, so please merge this only into dev, where the problem is occurring.

kohya-ss · 2024-09-25T11:26:49Z

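I tried to reproduce this bug before merging, but could not. I assumed it could be reproduced with a dataset .toml set up as follows; is that correct?

・Prepare images with alpha and without alpha that differ slightly in size but fall into the same bucket
・Put them in separate subsets
・Set the batch size greater than 1

If you know the reproduction conditions, please let me know. Thank you.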

Maru-mee · 2024-09-25T13:38:51Z

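In addition to those conditions:
・Create the npz files in advance, separately for each subset, once with alpha_mask=true and once with alpha_mask=false (not in the same run)
・Set bucket_no_upscale to false
This should make it easier to hit a problematic image size.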

For reference, I have summarized the verification steps below.

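1. Prepare the images
Please prepare the following two subsets.

datasets.subsets_1: images to which alpha_mask is applied (one or more folders)
(1) 4-channel RGBA PNGs
(2) Aspect ratio other than 1:1
  ※ I have not investigated exactly which sizes cause the problem,
  so please prepare images with a variety of aspect ratios.
  With 20 to 50 images, in my experience you will hit the error more than half the time.

datasets.subsets_2: images to which alpha_mask is not applied (one or more folders)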

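2. Create the npz files

Create the npz files as you would for normal training.
You do not need to run the training loop at this point.
(This step does not directly affect reproducibility, but just in case.)

However, create the npz files for dataset_subsets 1 and 2 by running sdxl_train.py separately for each.
If you try to create the npz files for both dataset_subsets at once, alpha_mask may not be written into the npz
(this behavior existed before and is not changed by this PR).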

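3. Run the training loop with both dataset_subsets 1 and 2
The error occurs at a specific step.

Please turn bucket_no_upscale OFF.
Perhaps because the alpha_mask H/W dimensions then vary from the original image depending on the configured resolution,
problematic tensor sizes seem to come up more often, and the error becomes more likely.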

4. Other settings
Probably irrelevant, but for reference:

・SDXL, OFT
・batch_size = 6
・--resolution=960
・--min_bucket_reso=100
・--max_bucket_reso=4000 (any value works, as long as it does not become the upper limit of bucket_size).

kohya-ss · 2024-09-25T14:12:50Z

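Thank you for the details. So, because of the current bug where the alpha_mask setting ends up the same for every dataset in the .toml, the issue does not occur unless the .npz files are generated separately in advance.

My guess is that the condition is: images that are assigned to the same bucket but differ in size exist in both a subset with alpha_mask and a subset without it.

I will follow your steps and try to verify.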

kohya-ss · 2024-09-26T12:18:23Z

> If you try to create the npz files for both dataset_subsets at once, alpha_mask may not be written into the npz
> (this behavior existed before and is not changed by this PR).

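To reproduce this bug, it seemed better to address that one first, so I fixed it so that the .npz files are created correctly even when alpha_mask differs between subsets.

With this, the .npz files are now generated correctly both with and without alpha_mask, but the alpha_mask in the .npz is divisible by 8, and this bug still does not seem to occur.

Sorry for the trouble, but could you check whether the bug occurs on the latest dev branch?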

Maru-mee · 2024-09-26T15:18:21Z

> I fixed it so that the .npz files are created correctly even when alpha_mask differs between subsets.

Thank you.
After updating all of my local dev to the current latest remote dev,
alpha_mask is now created correctly.
※ train_util.py is at (a94bc84)

On the other hand, alpha_mask sizes that are not divisible by 8 still occur.
The error was: RuntimeError: stack expects each tensor to be equal size, but got [1124, 768] at entry 0 and [1120, 768] at entry 2

Below is a dump of the npz contents for the problematic image.

latents: shape=(4, 140, 96), dtype=float32
original_size: shape=(2,), dtype=int32
crop_ltrb: shape=(4,), dtype=float64
alpha_mask: shape=(1124, 768), dtype=float32
latents shape[1:3]: (140, 96)
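A dump like the one above can be produced with NumPy alone. The sketch below writes a dummy npz with the same keys and shapes as the problematic cache (all values are zeros, not real latents or sizes), lists every array, and checks whether the mask is exactly 8× the latent size. The file name is arbitrary.

```python
import os
import tempfile

import numpy as np

# Write a dummy cache with the same keys and shapes as the dump above (zeros, not real data)
path = os.path.join(tempfile.mkdtemp(), "example.npz")
np.savez(
    path,
    latents=np.zeros((4, 140, 96), dtype=np.float32),
    original_size=np.zeros(2, dtype=np.int32),
    crop_ltrb=np.zeros(4, dtype=np.float64),
    alpha_mask=np.zeros((1124, 768), dtype=np.float32),
)

# List every array in the archive, then compare the mask with 8x the latent size
with np.load(path) as npz:
    for key in npz.files:
        print(f"{key}: shape={npz[key].shape}, dtype={npz[key].dtype}")
    latents_hw = npz["latents"].shape[1:3]
    mask_hw = npz["alpha_mask"].shape

print("latents shape[1:3]:", latents_hw)
print("mask matches 8x latents:", mask_hw == (latents_hw[0] * 8, latents_hw[1] * 8))  # False
```

Here 140 × 8 = 1120, while the stored mask height is 1124, which is exactly the pair of sizes in the stack error.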

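The original image size is H1461 × W1000. Through resolution=960, bucket upscaling (bucket_no_upscale=false), bucket_reso_steps: 64, and some other processing,
the alpha_mask ended up as H1124, W768.

For the other images too, the sizes are mostly multiples of 4 but not of 8.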

Maru-mee · 2024-09-26T15:20:47Z

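Now that I finally know which image is the problem,
I will try to trace the cause as far as I can on my side.
I will add my findings below.

The bucket size (reso) seems to propagate into the alpha_mask size.

In def select_bucket (train_util.py, line 239), outputting

logger.info(f"use predef, {image_width}, {image_height}, {reso}, {resized_size}")

gives the following: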
use predef, 766, 1080, (768, 1124), (797, 1124)
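The logged values can be reproduced under two assumptions: the bucket is the one whose aspect ratio is closest to the image's, and the image is scaled so that it covers the bucket, with the excess cropped afterwards. The sketch below is a hypothetical re-implementation for illustration, not the select_bucket source; the candidate buckets are taken from the make_bucket_resolutions output quoted later in this thread, with landscape entries flipped to portrait.

```python
from typing import List, Tuple


def fit_to_bucket(
    image_w: int, image_h: int, buckets: List[Tuple[int, int]]
) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Hypothetical sketch: choose the (W, H) bucket with the nearest aspect ratio,
    then scale the image so it covers that bucket (the excess is cropped later)."""
    aspect = image_w / image_h
    reso = min(buckets, key=lambda b: abs(b[0] / b[1] - aspect))
    if aspect > reso[0] / reso[1]:
        scale = reso[1] / image_h  # relatively wider image: match heights, crop width
    else:
        scale = reso[0] / image_w  # relatively taller image: match widths, crop height
    resized = (int(image_w * scale + 0.5), int(image_h * scale + 0.5))
    return reso, resized


# Candidate (W, H) buckets in portrait orientation
buckets = [(768, 1124), (768, 1188), (804, 1088), (832, 1060)]

print(fit_to_bucket(766, 1080, buckets))      # ((768, 1124), (797, 1124))
print(fit_to_bucket(1000, 1461, buckets)[0])  # (768, 1124)
# After cropping the width 797 -> 768, the mask is [H, W] = [1124, 768], and 1124 % 8 == 4
```

With the same candidates, the H1461 × W1000 image from the earlier comment lands in the same (768, 1124) bucket, which matches the 1124 × 768 mask in its npz.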

Maru-mee · 2024-09-27T03:15:29Z

The bucket (768, 1124) may have arisen because min_size was set to a value not divisible by reso_steps or 64.
With min_size=256, the stack error no longer occurred.

Under the following conditions:
min_size=100
max_size=1500
reso_steps=default(=64)
enable_bucket=true
bucket_no_upscale=false

the result of resos = model_util.make_bucket_resolutions(…) at line 217 of train_util.py is as follows, and 1124 appears in it:
[(100, 1500), (164, 1500), (228, 1500), (292, 1500), (356, 1500),…, (768, 1124), (768, 1188), … (1060, 832), (1088, 804), (1124, 768),… (1500, 484), (1500, 548)]

The root cause seems to be that, in make_bucket_resolutions, one side of each bucket is computed as
resos
= min_size + reso_steps * n, n = 0, 1, 2, …
= 100 + 64 * n
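The pattern can be checked with a small re-implementation. The sketch below assumes, as described above, that one side walks from min_size in steps of reso_steps, that the other side is the largest multiple of reso_steps keeping the area at or below resolution² (capped at max_size), and that both orientations are added; it is a hypothetical sketch, not the model_util.py source. With resolution=960 and min_size=100 it reproduces the list quoted above, including (768, 1124).

```python
import math
from typing import List, Tuple


def make_bucket_resolutions_sketch(
    max_reso: Tuple[int, int], min_size: int, max_size: int, divisible: int = 64
) -> List[Tuple[int, int]]:
    """Hypothetical re-implementation of the bucket generation described in this thread."""
    max_area = max_reso[0] * max_reso[1]
    resos = set()

    # The square bucket
    side = int(math.sqrt(max_area) // divisible) * divisible
    resos.add((side, side))

    # Walk one side from min_size upwards: min_size + divisible * n, n = 0, 1, 2, ...
    width = min_size
    while width <= max_size:
        # Largest multiple of `divisible` keeping width * height <= max_area, capped at max_size
        height = min(max_size, (max_area // width) // divisible * divisible)
        if height >= min_size:
            resos.add((width, height))
            resos.add((height, width))
        width += divisible
    return sorted(resos)


resos = make_bucket_resolutions_sketch((960, 960), min_size=100, max_size=1500)
print(resos[:3])           # [(100, 1500), (164, 1500), (228, 1500)]
print((768, 1124) in resos)  # True
# Every walked side is 100 + 64n, so it always leaves remainder 4 modulo 8
print(sorted({w % 8 for w in range(100, 1501, 64)}))  # [4]
bad = [r for r in resos if r[0] % 8 or r[1] % 8]
print(len(bad), "of", len(resos), "buckets have a side that is not a multiple of 8")
```

Only the square (960, 960) bucket escapes; every other bucket contains a walked side, and the cap of 1500 is itself not a multiple of 8 either.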

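Using images whose aspect ratio is not 1:1 in this state caused a bucket that is not a multiple of 8 to be selected (activated) from the bucket list above, and the alpha_mask derived from it surfaced as the stack error.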

I think it would be good to add processing in model_util.py that rounds min_size, height, or width to 8, 64, or reso_steps (= divisible). What do you think?
The same applies to max_size.

kohya-ss · 2024-09-27T12:35:05Z

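Thank you for investigating. I had not anticipated min_bucket_reso/max_bucket_reso being set to values not divisible by bucket_reso_steps.

There is a verify_bucket_reso_steps method that checks bucket_reso_steps, so I think it would be good to rename it, have it check min/max_bucket_reso as well, and round them to multiples of bucket_reso_steps.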

sd-scripts/library/train_util.py

Lines 2181 to 2183 in ce49ced

    def verify_bucket_reso_steps(self, min_steps: int):
        for dataset in self.datasets:
            dataset.verify_bucket_reso_steps(min_steps)

sd-scripts/library/train_util.py

Lines 970 to 974 in ce49ced

    def verify_bucket_reso_steps(self, min_steps: int):
        assert self.bucket_reso_steps is None or self.bucket_reso_steps % min_steps == 0, (
            f"bucket_reso_steps is {self.bucket_reso_steps}. it must be divisible by {min_steps}.\n"
            + f"bucket_reso_stepsが{self.bucket_reso_steps}です。{min_steps}で割り切れる必要があります"
        )

Also, some training scripts never call this method, which is a latent bug, so those calls need to be added as well.

Maru-mee · 2024-09-27T15:21:15Z

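I think that's the best approach.

I thought I had read the documentation and the parameter descriptions in train_util carefully, but I had set min_bucket_reso arbitrarily without understanding how it must be chosen. I think my usage was also part of the problem.
I know you are busy, but to prevent unexpected issues in the future, it would help if you could add a word of advice, such as a recommended setting, to the help text at line 3841 below.

Suggested addition:
(Since the value is actually rounded up or down according to bucket_reso_steps,)
specify a value somewhat larger or smaller than the bucket_size you want to use. A value divisible by XX is recommended.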

3841    parser.add_argument("--min_bucket_reso", type=int, default=256, help="minimum resolution for buckets / bucketの最小解像度")
3842    parser.add_argument("--max_bucket_reso", type=int, default=1024, help="maximum resolution for buckets / bucketの最大解像度")

kohya-ss · 2024-09-28T07:38:44Z

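If anything, I should apologize; I don't think this is written down anywhere. It needs to go into the help text as well.

If it's alright with you, I'd like to close this PR and add the rounding of min/max_bucket_reso to multiples of bucket_reso_steps, along with the other changes, myself. How does that sound?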

Maru-mee · 2024-09-28T08:03:31Z

OK.
Sorry to cause you extra work.

kohya-ss · 2024-09-29T00:54:41Z

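I added processing that rounds min/max_bucket_reso to multiples of bucket_reso_steps, and updated the help text and elsewhere to ask for divisible values (in practice, it still runs with a warning).

Since verify_bucket_reso_steps validated the buckets only after they were created, I moved the check and the rounding into the constructor of the DataSet class.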
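A sketch of the kind of adjustment described, assuming min_bucket_reso is rounded down (never below one step) and max_bucket_reso is rounded up to the nearest multiple of bucket_reso_steps, so the requested range stays covered. The rounding directions, the function name `adjust_min_max_bucket_reso`, and the warning text are assumptions for illustration, not taken from the actual commit.

```python
import warnings
from typing import Tuple


def adjust_min_max_bucket_reso(min_reso: int, max_reso: int, steps: int) -> Tuple[int, int]:
    """Hypothetical helper: snap min/max bucket resolution to multiples of `steps`.

    min is rounded down (but never below one step) and max is rounded up,
    so the requested range stays covered. A warning is emitted on any change.
    """
    adj_min = max(steps, min_reso - min_reso % steps)
    adj_max = max_reso + (-max_reso) % steps
    if adj_min != min_reso:
        warnings.warn(f"min_bucket_reso adjusted to a multiple of {steps}: {min_reso} -> {adj_min}")
    if adj_max != max_reso:
        warnings.warn(f"max_bucket_reso adjusted to a multiple of {steps}: {max_reso} -> {adj_max}")
    return adj_min, adj_max


min_reso, max_reso = adjust_min_max_bucket_reso(100, 1500, 64)
print(min_reso, max_reso)  # 64 1536
# Every side the bucket walk can visit (min + steps * n) is now a multiple of 64, hence of 8
print(all(w % 8 == 0 for w in range(min_reso, max_reso + 1, 64)))  # True
```

Rounding max up rather than down is a judgment call: it keeps the largest bucket the user asked for reachable, at the cost of allowing a slightly larger bucket than requested.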

Please let me know if you run into any problems. Thank you very much for raising this issue.

Commit a9ecad5: fix import and make npz for alpha_mask


Maru-mee mentioned this pull request on Sep 22, 2024 in "Retain alpha in pil_resize for --alpha_mask #1619" (Merged)

Maru-mee marked this pull request as ready for review September 24, 2024 13:49

Maru-mee closed this Sep 26, 2024

Maru-mee reopened this Sep 26, 2024

kohya-ss added commit fe2aa32 that referenced this pull request on Sep 29, 2024: "adjust min/max bucket reso divisible by reso steps #1632"

kohya-ss closed this Sep 29, 2024

kohya-ss added commit 1567549 that referenced this pull request on Sep 29, 2024: "update help text #1632"

kohya-ss mentioned this pull request on Jan 17, 2025 in "merge dev to main #1879" (Merged)
