
How to run with low vram/ram? #4

Open
Ariiio opened this issue Oct 23, 2024 · 2 comments

Comments


Ariiio commented Oct 23, 2024

I want to run this model locally so I can caption a lot of images. My problem, however, is that I get the following error message every time:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacty of 8.00 GiB of which 0 bytes is free. Of the allocated memory 14.51 GiB is allocated by PyTorch, and 117.23 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I've tried setting 'PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:10' and some other low values, and nothing worked.

I have 8 GB of VRAM and 16 GB of RAM.
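For reference, that variable only takes effect if it is set before PyTorch first touches CUDA, so it has to be exported (or assigned in `os.environ`) before `import torch`. A minimal sketch — the 128 MiB split size is just an illustrative value, and note that this setting only mitigates fragmentation; it cannot make a model that needs ~14 GiB fit in 8 GiB of VRAM:

```python
import os

# Must be set BEFORE torch initializes CUDA, i.e. before "import torch".
# max_split_size_mb only reduces fragmentation of already-allocated memory;
# it does not lower the total memory the model itself requires.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# import torch  # import torch only after the variable is in place
```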

fpgaminer (Owner) commented

You can try setting "device_map" to "auto" and possibly using HuggingFace accelerate to offload the model to RAM. Quantization could also help fit the entire model in VRAM, but it currently isn't working (see issue #3). Hopefully I'll have that fixed soon.
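A minimal sketch of that approach, assuming the transformers/accelerate stack and the `fancyfeast/llama-joycaption-alpha-two-hf-llava` repo id (adjust to whichever checkpoint you are using). `max_memory` caps GPU usage below the card's 8 GiB so accelerate spills the remaining layers to CPU RAM; expect offloaded inference to be slow:

```python
# Assumed repo id; substitute the JoyCaption checkpoint you actually use.
MODEL_ID = "fancyfeast/llama-joycaption-alpha-two-hf-llava"

# Leave headroom below the 8 GiB card; the rest of the weights go to CPU RAM.
MAX_MEMORY = {0: "7GiB", "cpu": "14GiB"}


def load_offloaded():
    # Lazy imports so the sketch stays importable without torch/transformers.
    import torch
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = LlavaForConditionalGeneration.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,
        device_map="auto",      # let accelerate split layers across GPU and CPU
        max_memory=MAX_MEMORY,
    )
    return processor, model
```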

John6666cat commented

Alpha Two has errors in bitsandbytes, but a quantized repo in GPTQ format was released today. I was wondering if it could be used to load the model with less VRAM. Also, I think GPTQ is compatible with CPU. It is an easy-to-use format, although the disadvantage is that it does not allow on-the-fly quantization.
https://huggingface.co/OPEA/llama-joycaption-alpha-two-hf-llava-int4-sym-inc
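If that checkpoint loads through transformers' LLaVA class (an assumption — it also needs a GPTQ backend such as the auto-gptq package installed), usage would look roughly like the sketch below. Since the weights are pre-quantized to int4, an ~8B-parameter model should need far less memory than the fp16 version, with no bitsandbytes involved:

```python
# Assumed to load via transformers' LLaVA class with a GPTQ backend installed.
QUANT_ID = "OPEA/llama-joycaption-alpha-two-hf-llava-int4-sym-inc"


def load_quantized():
    # Lazy import so the sketch stays importable without transformers.
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(QUANT_ID)
    model = LlavaForConditionalGeneration.from_pretrained(
        QUANT_ID,
        device_map="auto",  # weights are already int4; no on-the-fly quantization
    )
    return processor, model
```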


3 participants