Commit

[Misc] add process_weights_after_loading for DummyLoader (vllm-project#8969)

Signed-off-by: Sumit Dubey <sumit.dubey2@ibm.com>
divakar-amd authored and sumitd2 committed Nov 14, 2024
1 parent fb63827 commit cfce256
Showing 1 changed file with 12 additions and 0 deletions.
12 changes: 12 additions & 0 deletions vllm/model_executor/model_loader/loader.py
@@ -441,6 +441,18 @@ def load_model(self, *, model_config: ModelConfig,
             # NOTE(woosuk): For accurate performance evaluation, we assign
             # random values to the weights.
             initialize_dummy_weights(model)
+
+            for _, module in model.named_modules():
+                quant_method = getattr(module, "quant_method", None)
+                if quant_method is not None:
+                    # When quant methods need to process weights after loading
+                    # (for repacking, quantizing, etc), they expect parameters
+                    # to be on the global target device. This scope is for the
+                    # case where cpu offloading is used, where we will move the
+                    # parameters onto device for processing and back off after.
+                    with device_loading_context(
+                            module, torch.device(device_config.device)):
+                        quant_method.process_weights_after_loading(module)
         return model.eval()
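
Note: the control flow added in this hunk can be illustrated without torch or vLLM. The following is a minimal sketch, not the project's implementation. ToyModule, ToyQuantMethod, toy_device_loading_context, and process_all are hypothetical stand-ins, and a module's "device" is reduced to a plain string. It shows only the pattern the comment describes: skip modules without a quant_method, temporarily place the module on the target device, run process_weights_after_loading, then restore the original placement.

```python
from contextlib import contextmanager


class ToyModule:
    """Hypothetical stand-in for a module; tracks only a device name."""

    def __init__(self, name, device="cpu", quant_method=None):
        self.name = name
        self.device = device
        # Only quantized layers carry a quant_method attribute.
        if quant_method is not None:
            self.quant_method = quant_method


class ToyQuantMethod:
    """Hypothetical quant method that records where it saw each module."""

    def __init__(self):
        self.seen = []

    def process_weights_after_loading(self, module):
        self.seen.append((module.name, module.device))


@contextmanager
def toy_device_loading_context(module, target_device):
    # Simplified assumption: move the whole module to the target device,
    # then put it back where it was, even if processing raises.
    original = module.device
    module.device = target_device
    try:
        yield module
    finally:
        module.device = original


def process_all(modules, target_device):
    for module in modules:
        quant_method = getattr(module, "quant_method", None)
        if quant_method is not None:
            with toy_device_loading_context(module, target_device):
                quant_method.process_weights_after_loading(module)


qm = ToyQuantMethod()
modules = [ToyModule("embed"), ToyModule("linear", quant_method=qm)]
process_all(modules, "cuda")
```

In this sketch the quant method sees the "linear" module on the target device, while both modules end up back on "cpu" afterwards, which mirrors the offloading case the in-code comment calls out.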

