Mixtral 8x7b models using more memory while loading #6652
This may be related to #6387. If I'm understanding this correctly, a solution would be to re-convert the model to GGUF from the original model files (@slaren might want to clarify). I haven't yet found a recent re-conversion of dolphin-mixtral-8x7b on HuggingFace, but someone might do it eventually.
I can confirm this seems to happen for me too. I'm getting OOM with configurations that previously worked fine.
Yes, that's why #6387 is a breaking change. You need to convert to GGUF again to get merged expert tensors per layer, or disable mmap. It is clearly stated here:
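For anyone hitting this, a re-conversion would look roughly like the following, assuming the usual llama.cpp convert.py + quantize workflow; the model directory and output file names are illustrative, not taken from this thread:

python convert.py C:\models\dolphin-2.7-mixtral-8x7b --outtype f16 --outfile dolphin-2.7-mixtral-8x7b.f16.gguf
.\quantize.exe dolphin-2.7-mixtral-8x7b.f16.gguf dolphin-2.7-mixtral-8x7b.Q5_0.gguf Q5_0

The re-converted GGUF stores the experts as merged per-layer tensors, which is what the post-#6387 loader expects for mmap to work without the extra memory usage.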
@slaren, @phymbert, could you please take a look?
There appears to be a regression between release versions b2586 and b2589. When attempting to load Mixtral 8x7b models with any version greater than b2586, the system utilizes an abnormal amount of memory compared to previous versions. Manually disabling mmap does resolve the issue.

Platform:
Windows 11 Pro
64GB RAM
Nvidia 3080
Example command:
.\main.exe -m 'C:\models\dolphin-2.7-mixtral-8x7b.Q5_0.gguf' -p "<|im_start|>user\nHello!\n<|im_end|>\n<|im_start|>assistant\n"
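For reference, the mmap workaround mentioned above amounts to passing --no-mmap; this variant is my reconstruction of the command, not one taken from the original report:

.\main.exe -m 'C:\models\dolphin-2.7-mixtral-8x7b.Q5_0.gguf' --no-mmap -p "<|im_start|>user\nHello!\n<|im_end|>\n<|im_start|>assistant\n"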
Versions I tested:
b2586 cuda cu12.2.0 & openblas
b2589 cuda cu12.2.0 & openblas
b2589 avx512
Diffing log output from b2586 cuda cu12.2.0 and b2589 cuda cu12.2.0 shows the following:
b2586:
llm_load_tensors: CPU buffer size = 30735.50 MiB
b2589:
llm_load_tensors: CUDA_Host buffer size = 30735.50 MiB