Replies: 2 comments 3 replies
-
Hi, yeah this is something important that we wanted to add, but it got lost in the backlog. Technically it's very simple to do: you need to override these functions. For the moment, you can do it via monkey patching like this:

##########################################################################
# Do this at program start
import math
import os

import torch

from hqq.models.base import BaseHQQModel

# Get the number of available weight chunks in save_dir
def get_num_weight_chunks(cls, save_dir: str) -> int:
    # Chunk files are named <name>_<chunk_id><ext>, e.g. qmodel_0.pt, qmodel_1.pt, ...
    name, ext = os.path.splitext(os.path.basename(cls.get_weight_file(save_dir)))
    num_chunks = 0
    for file in os.listdir(save_dir):
        f_name, f_ext = os.path.splitext(file)
        if f_ext == ext and f_name.startswith(name + "_"):
            num_chunks += 1
    return num_chunks

# Get the full path of chunk file <chunk_id>
def get_weight_file_chunk(cls, save_dir: str, chunk_id: int) -> str:
    name, ext = os.path.splitext(cls.get_weight_file(save_dir))
    return name + "_" + str(chunk_id) + ext

# Split the weights dictionary into a list of smaller dictionaries.
# Simple example logic: split by number of keys; for strict file-size limits,
# splitting by tensor byte size would be more precise.
def split_weights_into_chunks(cls, weights: dict, num_chunks: int = 4) -> list:
    keys = list(weights.keys())
    chunk_size = max(1, math.ceil(len(keys) / num_chunks))
    weights_chunks = [
        {k: weights[k] for k in keys[i : i + chunk_size]}
        for i in range(0, len(keys), chunk_size)
    ]
    return weights_chunks

# Save weights to disk as multiple chunk files
def save_weights_chunked(cls, weights: dict, save_dir: str) -> None:
    # weights is just a dictionary; split it into weights_chunks: list
    weights_chunks = cls.split_weights_into_chunks(weights)
    for i, chunk in enumerate(weights_chunks):
        torch.save(chunk, cls.get_weight_file_chunk(save_dir, i))

# Load weights from disk by merging all chunk files
def load_weights_chunked(cls, save_dir: str, map_location=None) -> dict:
    weights = {}
    for i in range(cls.get_num_weight_chunks(save_dir)):
        weights.update(torch.load(cls.get_weight_file_chunk(save_dir, i), map_location=map_location))
    return weights

# Patch BaseHQQModel; wrap with classmethod() so the replacements behave like
# the classmethods they override
BaseHQQModel.get_num_weight_chunks = classmethod(get_num_weight_chunks)
BaseHQQModel.get_weight_file_chunk = classmethod(get_weight_file_chunk)
BaseHQQModel.split_weights_into_chunks = classmethod(split_weights_into_chunks)
BaseHQQModel.save_weights = classmethod(save_weights_chunked)
BaseHQQModel.load_weights = classmethod(load_weights_chunked)
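
# Hypothetical usage sketch (the model object and save path are placeholders):
# with the patch applied, the usual HQQ save/load entry points write and read
# qmodel_0.pt, qmodel_1.pt, ... instead of a single qmodel.pt, e.g.:
#   from hqq.models.hf.base import AutoHQQHFModel
#   AutoHQQHFModel.save_quantized(model, "mixtral-8x22b-hqq")
#   model = AutoHQQHFModel.from_quantized("mixtral-8x22b-hqq")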
##########################################################################

Later, we will do a refactoring to fully support safetensors as well. Let me know if the solution above works!
-
Serialization will be fully supported directly in HF after this PR: huggingface/transformers#33141
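
For context, a rough sketch of what that workflow should look like once the PR lands, assuming the existing HqqConfig API in transformers; the model id, repo name, and quantization settings below are placeholders:

from transformers import AutoModelForCausalLM, HqqConfig

# Quantize on load with HQQ (4-bit, group size 64 are example settings)
quant_config = HqqConfig(nbits=4, group_size=64)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x22B-v0.1",  # placeholder model id
    quantization_config=quant_config,
    device_map="auto",
)

# save_pretrained() shards the checkpoint into safetensors files no larger than
# max_shard_size, keeping each file under the Hub's per-file limit
model.save_pretrained("mixtral-8x22b-hqq", max_shard_size="10GB")
model.push_to_hub("your-username/mixtral-8x22b-hqq")  # placeholder repo id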
-
I quantized Mixtral 8x22B. It produces a .pt model file of approximately 60 GB. I can't push the model to Hugging Face because of the per-file size limit (50 GB max). Is there a way to shard it like we can with safetensors?