
Add bf16 support for VAE as a fallback #9295

Closed
wants to merge 9 commits into from

Conversation

Sakura-Luna
Collaborator

@Sakura-Luna Sakura-Luna commented Apr 2, 2023

Describe what this pull request is trying to achieve.

According to the description here, bf16 can solve the problem of the VAE generating black images when working in half precision, so I made this commit.

Additional notes and description of your changes

bf16 is great to use as a fallback: when the webui detects an empty image generation, it converts the VAE and retries on supported devices; this works fine in my test case. Note that to use this feature you need a GPU that supports bf16 and the webui running on PyTorch 2.1. On unsupported devices, --no-half-vae remains the only option.

Edit: In theory, AMD GPUs are also supported.
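
A minimal sketch of the fallback idea, with hypothetical helper names (this is not the PR's actual diff; it assumes an all-NaN decode is what ends up saved as a black image, as the existing nan-check does):

import torch

def decode_with_bf16_fallback(vae, latents):
    # Normal decode first (fp16 VAE).
    image = vae.decode(latents)

    # An all-NaN result is what gets saved as a black image.
    if torch.isnan(image).any() and torch.cuda.is_bf16_supported():
        # Convert the VAE once and retry; it stays bf16 afterwards.
        vae.to(torch.bfloat16)
        image = vae.decode(latents.to(torch.bfloat16))

    return image.float()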

Environment this was tested in

  • OS: Win
  • Browser: chrome
  • Graphics card: NVIDIA Ampere GPU

@playlogitech

the only VAE that produces black squares is the VAE from nai/any/k8, just saying

@catboxanon
Collaborator

catboxanon commented Apr 3, 2023

The VAE should be converted to bf16 beforehand. I don't think the current implementation is the correct way to go about this, because it wastes time decoding the latent image twice.

Also this shouldn't be merged yet since only pytorch 2.1 nightly supports this iirc, and the only other PR for upgrading torch is for the 2.0 release. Should be marked as a draft for now.

@Sakura-Luna
Collaborator Author

Sakura-Luna commented Apr 3, 2023

Also this shouldn't be merged yet since only pytorch 2.1 nightly supports this iirc, and the only other PR for upgrading torch is for the 2.0 release. Should be marked as a draft for now.

In theory that's the case. By preference I did not add a version check, but in practical use it will have no effect unless a NaN is triggered (or the NaN check is disabled), so merging is still feasible. If someone finds it necessary, exception handling can also be added to explain that the PyTorch version does not support bf16.

The VAE should be converted to bf16 beforehand. I don't think the current implementation is the correct way to go about this, because it wastes time decoding the latent image twice.

I don't agree with this opinion. Ideally this conversion is done only once, so it takes very little time. Conversely, bf16 performs worse than fp16 in terms of speed, VRAM consumption, and accuracy, so overall bf16 is not ideal. Considering that even a problematic VAE does not necessarily generate a black image, lazy conversion is a good solution.
If you use a pre-converted VAE, you need to add model-type detection; global bf16 cannot be accepted, as it would affect the performance of other, normal models.

@Cyberbeing
Contributor

Cyberbeing commented Apr 3, 2023

Conversely, bf16 performs worse than fp16 in terms of speed, VRAM consumption, and accuracy, so overall bf16 is not ideal.
If you use a pre-converted VAE, you need to add model-type detection; global bf16 cannot be accepted, as it would affect the performance of other, normal models.

What kind of performance and VRAM impact are you seeing to make this sort of statement? Do you have numbers to back this up? Personally I've not seen this after using a BF16 VAE full-time during the past 4 months, and performance improved after PyTorch merged BF16 interpolate support into nightly.

I just did a quick test with --opt-sdp-no-mem-attention to double-check myself, and full-time BF16 VAE was performing 1-2% faster (or otherwise within the margin of error) than FP16 VAE, with 100% identical VRAM usage, on an RTX A4000 using CUDNN 8.8.1 with TF32 enabled and Live Previews disabled.

This is to be expected, since TF32<->BF16 should be a faster and higher-quality conversion than TF32<->FP16 on NVIDIA. Both FP16 and BF16 have identical 16-bit data sizes, so there should be no additional VRAM usage unless an unneeded conversion with duplication of data is occurring somewhere in WebUI.

Similarly, as for accuracy, the general expectation is that BF16 VAE bias should produce output closer to FP32 VAE bias than FP16 VAE bias does when converting from an FP32 VAE. The VAE, unlike the rest of the model components, seems to care more about dynamic range than significant-digit precision, though I suspect this aspect may need more widespread testing to verify.


The VAE should be converted to bf16 beforehand. I don't think the current implementation is the correct way to go about this, because it wastes time decoding the latent image twice.

Also this shouldn't be merged yet since only pytorch 2.1 nightly supports this iirc, and the only other PR for upgrading torch is for the 2.0 release. Should be marked as a draft for now.

I'm also of the opinion that it would make more sense to implement this similar to --no-half-vae as a command line argument once PyTorch 2.1 GA is released, since it serves a near-identical purpose while having performance and VRAM usage similar to FP16.

@Sakura-Luna
Collaborator Author

What kind of performance and VRAM impact are you seeing to make this sort of statement? Do you have numbers to back this up? Personally I've not seen this after using a BF16 VAE full-time during the past 4 months, and performance improved after PyTorch merged BF16 interpolate support into nightly.

You can refer to the data at the end of this issue, but there may be some differences for VAE. I am most concerned about the accuracy, but as you said, it may be necessary to do a comparison to test which of bf16 and fp16 has better results.

I'm also of the opinion that it would make more sense to implement this similar to --no-half-vae as a command line argument once PyTorch 2.1 GA is released, since it serves a near-identical purpose while having performance and VRAM usage similar to FP16.

I do the bf16 conversion as a triggered operation, so it can be considered part of the nan-check, with no additional parameters. If the implementation changes, then it will need to be reconsidered.

@Cyberbeing
Contributor

Cyberbeing commented Apr 3, 2023

You can refer to the data at the end of this issue

It seems like that old, resolved issue was about a perf regression in Lightning only (compared to PyTorch itself) when using the CUDNN V7 API, which was resolved with the CUDNN V8 API. So I don't think it will affect us, as Torch switched to using the CUDNN V8 API for BF16 convolution support a long time ago.

The statement at the bottom of the issue is also currently true, since BF16 mixed-precision can indeed have poor performance under certain workflows. This can cause situations where working in TF32 is faster because it eliminates slow casts that would otherwise produce a net perf loss, but that doesn't seem to apply to webui SD inference.

I've tested BF16 autocast before in webui and it does indeed have a significant memory and performance impact on inference, since Torch doesn't autocast many ops to BF16 like it does for FP16, resulting in it mostly working in TF32 when BF16 autocasting is enabled. It is still faster than TF32-only for our use case, but there is very little benefit to doing so for inference, unlike training.
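
(For reference, "BF16 autocast" here means wrapping inference in torch.autocast with dtype=torch.bfloat16; an illustrative toy example, not webui code:)

import torch
import torch.nn as nn

conv = nn.Conv2d(4, 4, 3, padding=1).cuda()  # weights stay fp32 in memory
latents = torch.randn(1, 4, 64, 64, device="cuda")

# Under bf16 autocast, convolutions/matmuls run in bf16, while ops without
# a bf16 autocast rule keep running in fp32/TF32, which is why the speedup
# for inference is limited compared to a fully-converted fp16 model.
with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    out = conv(latents)

print(out.dtype)  # torch.bfloat16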

Similarly, converting other components of a model such as the UNet or CLIP to BF16 has a rather large impact on seed output, so that should likely be avoided as well, unless someone trained a BF16-precision model directly.

but there may be some differences for VAE.

Yes, but rather than specifically VAE, I believe it has more to do with not using BF16 mixed-precision (autocast) here. We are only doing a single manual cast to BF16 VAE bias, while using FP16 mixed-precision.

@Sakura-Luna
Collaborator Author

@Cyberbeing I briefly tested the VAE running under different dtypes, and I'm posting two pictures here to show that there is no obvious content difference across these test samples.
[attached images: e1, e2]

Tested on these samples, there is no significant difference in speed between fp16 and bf16. The following shows the content difference of fp16 and bf16 compared with tf32 on the test samples; bf16 shows more deviation on all samples.
[attached images: example 0, example 1, example 2, example 3]

In previous tests I found an example of global bf16 causing significant content differences (I didn't keep it), which is why I insist that bf16 is only suitable as a fallback; there is no advantage to bf16 in the current implementation.

@Cyberbeing
Contributor

Cyberbeing commented Apr 4, 2023

It would appear your testing may have been done with the non-default webui options which I mentioned recently in discussions. Can you double check?

torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False
torch.backends.cuda.matmul.allow_bf16_reduced_precision_reduction = False

I did a quick test myself and I can only reproduce results similar to yours when those options are set. Though it does seem to be true that BF16 always degrades output more than FP16 does, the difference is basically invisible to the human eye unless you pixel-peep.

I think what happened was that months ago, when I last tested this, I was still using xformers (non-deterministic), and I had only set the following; I only recently discovered the massive degradation when using an FP16 VAE without the fp16 option also set to False.

torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = True
torch.backends.cuda.matmul.allow_bf16_reduced_precision_reduction = False

Which is why I got the impression that the BF16 VAE was closer to the FP32 VAE than the FP16 VAE was, since with those settings it actually was. Yet my testing wasn't apples to apples.

This reminds me that we really should create a pull request to disable the reduced_precision_reduction options, which is a huge quality boost for FP16 & BF16 and seems to bring seed reproduction very close to FP32 VAE. At least on my GPU, setting both to False had no impact on inference performance, but someone should really test the impact on training.


FP32.png vs TF32.png (reduced_precision_reduction disabled)
Manhattan norm: 7015.666674117092 / per pixel: 0.008920881492763636
Zero norm: 20698.0 / per pixel: 0.026318868001302082

FP32.png vs FP16.png (reduced_precision_reduction disabled)
Manhattan norm: 29685.33371661324 / per pixel: 0.03774685378597672
Zero norm: 83786.0 / per pixel: 0.10653940836588542

FP32.png vs BF16.png (reduced_precision_reduction disabled)
Manhattan norm: 205934.0009592753 / per pixel: 0.26185862345285454
Zero norm: 403128.0 / per pixel: 0.512603759765625

With webui/pytorch defaults they had nearly identical seed output, with image output noticeably different from FP32; technically FP16 was still 0.03% better, but that is invisible to the human eye down at the noise floor:

FP32.png vs FP16-pytorch-defaults.png
Manhattan norm: 15672142.778314568 / per pixel: 19.92816006764039
Zero norm: 775492.0 / per pixel: 0.9860890706380209

FP32.png vs BF16-pytorch-defaults.png
Manhattan norm: 15677395.778366838 / per pixel: 19.934839602618965
Zero norm: 775501.0 / per pixel: 0.9861005147298177

BF16-pytorch-defaults.png vs FP16-pytorch-defaults.png
Manhattan norm: 191109.00093583157 / per pixel: 0.24300766110208075
Zero norm: 381207.0 / per pixel: 0.4847297668457031
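
(The metrics above can be reproduced with a simple image diff; a sketch assuming NumPy and Pillow, not necessarily the exact script used here:)

import numpy as np
from PIL import Image

def compare(path_a, path_b):
    a = np.asarray(Image.open(path_a), dtype=np.float64)
    b = np.asarray(Image.open(path_b), dtype=np.float64)
    diff = a - b
    manhattan = np.abs(diff).sum()   # sum of absolute per-channel differences
    zero = np.count_nonzero(diff)    # number of differing values
    print(f"Manhattan norm: {manhattan} / per pixel: {manhattan / a.size}")
    print(f"Zero norm: {zero} / per pixel: {zero / a.size}")

compare("FP32.png", "FP16.png")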


I'm beginning to see the merit of your approach, but I'd still prefer this to be a command line argument which is optionally enabled. Even better would be implementing both methods and have --bfloat16-vae auto and --bfloat16-vae always to give user choice.

The main problems with the auto-on-NaN method are that the first time you would indeed be repeating processing, but also that the VAE then stays permanently in bfloat16 until you load a new model or VAE, which could lead to inconsistent seed output. In other words, your output could change depending on the order and type of generations you perform.

@Sakura-Luna
Collaborator Author

It would appear your testing may have been done with the non-default webui options which I mentioned recently in discussions. Can you double check?

torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False
torch.backends.cuda.matmul.allow_bf16_reduced_precision_reduction = False

I didn't set it in the code. If the PyTorch documentation is not wrong, the default setting is fp16 reduced-precision reduction on and bf16 off, and enabling the bf16 one is not recommended.

@Sakura-Luna
Collaborator Author

@Cyberbeing This is the result of disabling allow_fp16 (reduced-precision reduction). Similar to the previous test, fp16 is still more similar to fp32, and there is no obvious performance difference between the two. These samples are significantly different from the ones generated without disabling allow_fp16.
[attached images: ee1, ee2]

This reminds me that we really should create a pull request to disable the reduced_precision_reduction options, which is a huge quality boost for FP16 & BF16 and seems to bring seed reproduction very close to FP32 VAE. At least on my GPU, setting both to False had no impact on inference performance, but someone should really test the impact on training.

I don't see any examples where disabling it improves the quality; fp16 shows a high similarity to fp32 whether or not allow_fp16 is disabled. Predictably, disabling it slows down training.

I'm beginning to see the merit of your approach, but I'd still prefer this to be a command line argument which is optionally enabled. Even better would be implementing both methods and have --bfloat16-vae auto and --bfloat16-vae always to give user choice.

The main problems with the auto-on-NaN method are that the first time you would indeed be repeating processing, but also that the VAE then stays permanently in bfloat16 until you load a new model or VAE, which could lead to inconsistent seed output. In other words, your output could change depending on the order and type of generations you perform.

Adding an enable parameter is trivial, but I want it to work out of the box so that users no longer have to suffer VAE problems on supported devices. I linked it to the nan-check because the check by itself is relatively useless (I don't think it makes sense merely to prevent a black image from being saved; you have already spent the time, it just saves you pressing delete once), and because doing so does not increase the complexity of use.

Since I don't see any advantage to bf16, I won't support global bf16; it just reduces accuracy for nothing (maybe I can try another VAE). I think another suitable option is to add VAE type recognition to go with a pre-converted bf16 VAE; it can be reproduced stably, but it increases the cost of use. I know that this lazy conversion may cause inconsistent output on a specific VAE. This is a matter of trade-offs, but thanks to the similarity between fp32 and fp16, this inconsistency can be regarded as a difference from fp32.

@Cyberbeing
Contributor

Cyberbeing commented Apr 5, 2023

[Edit: I realized a potential oversight: I may have forgotten to test the FP32 VAE with reduced precision reduction enabled, which was a missing data point, so I removed most of this post. It's irrelevant to the PR anyway, so not worth further discussion here.]

If the PyTorch documentation is not wrong, the default setting is fp16 reduced-precision reduction on and bf16 off, and enabling the bf16 one is not recommended.

The PyTorch documentation is indeed wrong, which surprised me as well when I first discovered this months ago.

As you can see, the commit message states they've disabled it, but if you look at the code itself you'll see both fp16 & bf16 reduced_precision_reduction are enabled by default. Yet both degrade inference quality unless set to False, without any real performance benefit for SD inference. It does significantly change seeds though, so it would need to be made optional.

pytorch/pytorch@909a989
The PR made this change at the last moment, since the reviewer suggested the defaults remain as-is.
pytorch/pytorch@8b617f8
It was then merged into the main branch with both set to True in the code, but with an incorrect commit message.

This is still the case in the PyTorch 2.1 master branch.

You can easily double-check the PyTorch defaults by just importing torch and reading them.
[screenshot: interpreter output showing the defaults]
As you can see, both are True.
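
(For example, the same check from a Python shell; the values shown are the defaults reported above for these PyTorch builds:)

>>> import torch
>>> torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction
True
>>> torch.backends.cuda.matmul.allow_bf16_reduced_precision_reduction
True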

Either way this is getting a bit off-topic, since setting those options to False should likely be a separate PR, which will then likely need to be added to the compatibility options so people can reproduce their old reduced-precision seeds. I only brought this up since it seems to make my results closer to yours with them set to False.

I'll step out of this PR for now and just let AUTOMATIC1111 decide the course.

@Sakura-Luna
Collaborator Author

Sakura-Luna commented Apr 5, 2023

@Cyberbeing I checked your example; it also clearly shows that fp16 maintains better accuracy under the same settings. As for the backend setting, it is not discussed here. You're right about one thing: we need an opinion from @AUTOMATIC1111.

@YHD233

YHD233 commented Apr 7, 2023

[attached screenshots of the errors: 12, 13]
I'm getting these errors, how do I fix them?

@Sakura-Luna
Collaborator Author

@YHD233 What version of PyTorch and what type of GPU are you using?

@YHD233

YHD233 commented Apr 7, 2023

@YHD233 What version of PyTorch and what type of GPU are you using?

python: 3.10.6, torch: 2.1.0.dev20230407+rocm5.4.2, GPU: RX6800, system: Kubuntu 22.04

But when I reopen the console the error doesn't appear again.

@Sakura-Luna
Collaborator Author

@YHD233 I think I need to know your XYZ Plot parameters.

@YHD233

YHD233 commented Apr 7, 2023

@YHD233 I think I need to know your XYZ Plot parameters.

X type: CFG Scale, X values: 8,9,10,11,12,13,14,15,16

Once I hit this error, it also occurs when I close XYZ Plot and generate directly. It won't work until I close the console and open it again.

@Sakura-Luna
Collaborator Author

Sakura-Luna commented Apr 7, 2023

@YHD233 My mistake, fixed.

@YHD233

YHD233 commented Apr 8, 2023

I found that after stopping the generation when using XYZ plot and then starting to generate again, this error is output when the progress bar is full.

I will try to fix it.

@Sakura-Luna Sakura-Luna linked an issue May 8, 2023 that may be closed by this pull request
@Sakura-Luna
Collaborator Author

@AUTOMATIC1111 What do you think about this PR?

@AUTOMATIC1111
Owner

Since we are on torch 2.0 and this appears to need torch 2.1, I have not considered it yet. I don't like adding a command-line flag - if it works and is supported by the GPU, I think it should be enabled without asking the user to enable it.

Also, the most important question is: does it really help with black square images in the VAE?

@Sakura-Luna
Collaborator Author

I don't like adding a command-line flag - if it works and is supported by the GPU, I think it should be enabled without asking the user to enable it.

I originally did it as part of the nan-check without adding parameters, but I found that was not feasible. We can't check AMD GPU support via PyTorch, so we either introduce a new dependency or add a startup parameter.

Also, the most important question is: does it really help with black square images in the VAE?

In my test case it is clearly effective, and in theory it can solve the same problem as --no-half-vae while consuming less VRAM.

@Sakura-Luna
Collaborator Author

PyTorch lacks an explicit method to check for bf16 support on AMD GPUs.

@AUTOMATIC1111
Owner

Can't you just create a one-element bf16 tensor and do something with it, like multiplying it by 0.5, to test if bf16 is supported?
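
(A sketch of such a probe, assuming nothing beyond stock PyTorch; not the PR's actual code:)

import torch

def bf16_works(device: str = "cuda") -> bool:
    try:
        # Create a one-element bf16 tensor, multiply it by 0.5, and make sure
        # the result comes back as expected; unsupported backends should raise.
        x = torch.ones(1, dtype=torch.bfloat16, device=device)
        return bool(((x * 0.5) == 0.5).all())
    except Exception:
        return False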

@Sakura-Luna
Collaborator Author

Can't you just create a one-element bf16 tensor and do something with it, like multiplying it by 0.5, to test if bf16 is supported?

I know that works; it's just not aesthetically pleasing, and I don't have the hardware to test it.

@Sakura-Luna
Collaborator Author

I found a method in PyTorch, so this feature can be enabled by default.
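
(Presumably torch.cuda.is_bf16_supported(); that this is the method meant here is an assumption, but it would reduce the check to a one-liner:)

import torch

# True on bf16-capable GPUs (e.g. NVIDIA Ampere and newer, recent ROCm builds).
use_bf16_fallback = torch.cuda.is_available() and torch.cuda.is_bf16_supported()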

@catboxanon
Collaborator

catboxanon commented May 9, 2023

The title for this PR isn't really clear either, imo. It isn't adding support for bf16 VAEs; that's already supported out of the box with Torch 2.1. All this adds is the fallback for when an fp16 VAE produces NaNs.

@Sakura-Luna
Collaborator Author

The title for this PR isn't really clear either, imo. It isn't adding support for bf16 VAEs; that's already supported out of the box with Torch 2.1. All this adds is the fallback for when an fp16 VAE produces NaNs.

WebUI doesn't have code to handle bf16, so it looks like it will switch to fp16 even though PyTorch supports bf16. But it doesn't matter; the name of the PR has no effect.

@Sakura-Luna Sakura-Luna changed the title Add bf16 support for VAE Add bf16 support for VAE as a fallback May 9, 2023
@Sakura-Luna Sakura-Luna removed a link to an issue May 24, 2023
@catboxanon
Collaborator

I think 23c947a supersedes this?

@Sakura-Luna
Collaborator Author

I think 23c947a supersedes this?

You are wrong; both fp16 and bf16 are designed to save VRAM. If users can accept fp32 at any time, it is better to run the VAE in fp32 globally, and falling back to fp32 is even more pointless.

@lone-wolf-akela

Now that PyTorch 2.1 has been released, any news on this?

@AUTOMATIC1111
Owner

Since this PR was made, the webui got a similar mechanism (and I used the idea from this PR) to deal with SDXL VAE errors, but converting to FP32 instead of BF16. So this PR would have to be integrated into the existing system, which I did in ac0ecf3.

@Sakura-Luna Sakura-Luna deleted the master branch January 3, 2024 14:11
@w-e-w w-e-w mentioned this pull request Feb 17, 2024
@pawel665j pawel665j mentioned this pull request Apr 16, 2024
ruchej pushed a commit to ruchej/stable-diffusion-webui that referenced this pull request Sep 30, 2024