
Use less RAM when creating models #11958

Merged
merged 1 commit into dev from conserve-ram on Jul 29, 2023
Conversation

@AUTOMATIC1111 (Owner)

Description

  • makes the webui use less system RAM (not VRAM) when creating models
  • works by allocating all parameters on torch's meta device when creating a model, then deleting them from the state_dict one by one as they are loaded into the model by load_state_dict (see the sketch below)
  • the command-line argument --disable-model-loading-ram-optimization disables it
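
Below is a minimal sketch of the idea, not the PR's actual code: load_with_low_ram and make_model are hypothetical names, it assumes PyTorch 2.0+ for the torch.device("meta") context manager, and it ignores details (non-persistent buffers, dtype casting) that a real loader must handle.

import torch

def load_with_low_ram(make_model, state_dict):
    # Build the model skeleton on the meta device: shapes and dtypes are
    # tracked, but no memory is allocated for the weights.
    with torch.device("meta"):
        model = make_model()

    # Materialize each tensor from the checkpoint, popping it from the
    # state_dict immediately so the weights never exist twice in RAM.
    for name in list(state_dict.keys()):
        tensor = state_dict.pop(name)
        module_path, _, leaf = name.rpartition(".")
        module = model.get_submodule(module_path) if module_path else model
        if leaf in module._buffers:
            module._buffers[leaf] = tensor
        else:
            module._parameters[leaf] = torch.nn.Parameter(tensor, requires_grad=False)
    return model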


Comment on lines +9 to +27
class ReplaceHelper:
    def __init__(self):
        self.replaced = []

    def replace(self, obj, field, func):
        # Swap obj.field for func, remembering the original so restore()
        # can put it back; returns None if the attribute does not exist.
        original = getattr(obj, field, None)
        if original is None:
            return None

        self.replaced.append((obj, field, original))
        setattr(obj, field, func)

        return original

    def restore(self):
        # Put every replaced attribute back, then forget the records.
        for obj, field, original in self.replaced:
            setattr(obj, field, original)

        self.replaced.clear()
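
For context, a hypothetical usage pattern (not taken from the PR): no-op an expensive weight initializer while the model is instantiated, then restore it in a finally block so the patch never leaks.

import torch

helper = ReplaceHelper()
# Make the initializer a no-op so model creation skips filling weights
# that the checkpoint will overwrite anyway.
helper.replace(torch.nn.init, "kaiming_uniform_", lambda tensor, *args, **kwargs: tensor)
try:
    model = build_model()  # build_model is a hypothetical constructor
finally:
    helper.restore()  # the original torch.nn.init.kaiming_uniform_ is back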
Collaborator

Could be implemented using contextlib.ExitStack too.
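
A sketch of that suggestion, assuming the same patching semantics (replace_attr is a hypothetical name): contextlib.ExitStack records one undo callback per patched attribute and unwinds them all when the block exits, replacing the manual replaced list.

import contextlib

def replace_attr(stack, obj, field, func):
    original = getattr(obj, field, None)
    if original is None:
        return None
    # Register the undo before patching, so the attribute is restored
    # even if something fails later inside the with-block.
    stack.callback(setattr, obj, field, original)
    setattr(obj, field, func)
    return original

# Usage: all patches are reverted automatically when the block exits.
# with contextlib.ExitStack() as stack:
#     replace_attr(stack, some_obj, "forward", patched_forward)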

@AUTOMATIC1111 (Owner, Author)

If someone who's low on system RAM could try this and report back, it would help me.

@dhwz mentioned this pull request Jul 27, 2023
@rytt0001

Tried it out with 24 GB of RAM, and I can finally load SDXL without needing a swap file on Windows (before, I had OOM errors on the CPU side; loading was taking 15-20 GB of RAM), so it seems to work better.

@Alexamakans

Merged this into dev locally, and it fixed my issue, which was the same as #12081.

set COMMANDLINE_ARGS=--medvram --opt-channelslast --upcast-sampling --opt-sdp-attention

Python version: 3.11.4
OS: Windows 11 Home (10.0.22621 Build 22621)
RAM: 32 GB (Total virtual = 65.6 GB)
VRAM: 12 GB
GPU: NVIDIA GeForce RTX 3060

VAE: https://huggingface.co/madebyollin/sdxl-vae-fp16-fix
Model: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0

@jkrauss82 commented Jul 28, 2023

Came here because I have only 16 GB of RAM and wanted to try out SDXL.

Ubuntu 20.04.6 LTS
Using TCMalloc: libtcmalloc_minimal.so.4
Python 3.10.11 (main, Apr 5 2023, 14:15:10) [GCC 9.4.0]
Version: v1.5.0-RC-43-g0a89cd1a
RAM: 16 GB (Total virtual = 32 GB)
VRAM: 12 GB
GPU: NVIDIA GeForce RTX 3060
Arguments: --opt-channelslast --opt-sdp-no-mem-attention --no-half-vae

I was able to load SDXL on 1.5.1 as well, but in my observation RAM usage is slightly better during and after model loading; on 1.5.1, more RAM stayed occupied after the model was loaded.

Total duration varies and depends on the number of swapped-out pages. I will run some more tests in a headless environment later, which should free up an additional 1.5 GB of RAM on my system.

EDIT: Ran some tests in headless mode (meaning I stopped Ubuntu's display-manager service and accessed the webui from another machine). It seems the additional RAM available in this mode is just enough for the model to load quickly (🥳):

First load: Model loaded in 28.0s (create model: 0.5s, apply weights to model: 22.6s, apply channels_last: 0.4s, apply half(): 1.4s, load VAE: 1.2s, move model to device: 1.4s, calculate empty prompt: 0.4s)

Second load: Model loaded in 35.4s (create model: 0.5s, apply weights to model: 29.1s, apply channels_last: 0.4s, apply half(): 2.0s, load VAE: 1.2s, move model to device: 1.4s, calculate empty prompt: 0.5s).

I also observe improved loading time for SD 1.5 models, especially right after starting the webui.

Great PR! Will stay on this branch until it has been merged.

@AUTOMATIC1111 AUTOMATIC1111 merged commit ac81c1d into dev Jul 29, 2023
6 checks passed
@jmkgreen

I have just tried this on an EC2 instance with 16 GB RAM and no swap file (my usual environment). It crashes when switching to the SDXL 1.0 base model; on restart, the model shows as selected.

On then selecting the usual default model, it crashes with what appears to be high memory use.

Uncertain whether more improvements are to come, or whether the expectation of more than 16 GB RAM now needs to be made crystal clear.

@AUTOMATIC1111 AUTOMATIC1111 deleted the conserve-ram branch July 31, 2023 17:48
@w-e-w w-e-w mentioned this pull request Aug 24, 2023
@JiggishPlays

This definitely does not use less RAM for me: it takes around 4 minutes to generate one image at 25 steps, uses 14 GB of RAM while doing so, and after generation stops it still shows 14 GB in use. I never had issues with any other version. How can I fix this on my side?

Netherquark pushed a commit to Netherquark/stable-diffusion-webui that referenced this pull request Apr 9, 2024