[FP8 KV Cache, Mixtral] Avoid KeyError at loading pre-quantized FP8 m…
HaiShaw authored Oct 29, 2024
1 parent d04899d commit 54dd3ea
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions python/sglang/srt/models/mixtral.py
@@ -369,6 +369,9 @@ def load_weights(self, weights: Iterable[Tuple[str, torch.Tensor]]):
                 # Skip loading extra bias for GPTQ models.
                 if name.endswith(".bias") and name not in params_dict:
                     continue
+                # Skip loading kv_scale from checkpoints; it is handled by the new design.
+                if name.endswith(".kv_scale") and name not in params_dict:
+                    continue
                 if name is None:
                     continue
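The guard above matters because indexing `params_dict` with a checkpoint-only key raises `KeyError`. Pre-quantized FP8 checkpoints can carry per-layer `kv_scale` tensors that have no matching model parameter, so they must be filtered out before the lookup. A minimal standalone sketch (hypothetical names, not the actual sglang loader) of the pattern:

```python
# Sketch of the skip-unmatched-keys pattern used in load_weights.
# params_dict maps parameter names to parameter objects; checkpoint
# weights may contain extra keys (e.g. ".kv_scale") that would raise
# KeyError on params_dict[name] without the guard.

def load_weights(params_dict, checkpoint_weights):
    loaded = []
    for name, tensor in checkpoint_weights:
        # Skip checkpoint tensors with no matching model parameter,
        # e.g. per-layer kv_scale entries in pre-quantized FP8 checkpoints.
        if name.endswith(".kv_scale") and name not in params_dict:
            continue
        param = params_dict[name]  # would raise KeyError without the guard
        param["data"] = tensor
        loaded.append(name)
    return loaded


model_params = {"layers.0.qkv.weight": {"data": None}}
ckpt = [
    ("layers.0.qkv.weight", [1.0, 2.0]),
    ("layers.0.attn.kv_scale", 0.5),  # extra key from an FP8 checkpoint
]
print(load_weights(model_params, ckpt))  # → ['layers.0.qkv.weight']
```

The same `name not in params_dict` pattern already guards GPTQ `.bias` keys a few lines earlier; this commit extends it to `.kv_scale`.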
