
Support configuring precision and quantization in HuggingFaceClient #1912

Merged 3 commits into main on Dec 11, 2023

Conversation

yifanmai (Collaborator):

Support passing the torch_dtype, load_in_8bit, and load_in_4bit parameters to HuggingFaceClient from the ModelDeployment configuration.

Example usage:

prod_env/model_deployments.yaml:

model_deployments:
  # Example model with precision set to bfloat16
  - name: yifanmai/gpt2-bfloat16
    tokenizer_name: "huggingface/gpt2"
    max_sequence_length: 1024
    window_service_spec:
      class_name: "helm.benchmark.window_services.huggingface_window_service.HuggingFaceWindowService"
      args:
        pretrained_model_name_or_path: gpt2
    client_spec:
      class_name: "helm.proxy.clients.huggingface_client.HuggingFaceClient"
      args:
        pretrained_model_name_or_path: gpt2
        torch_dtype: torch.bfloat16
  # Example model with 8-bit quantization
  - name: yifanmai/gpt2-8bit
    tokenizer_name: "huggingface/gpt2"
    max_sequence_length: 1024
    window_service_spec:
      class_name: "helm.benchmark.window_services.huggingface_window_service.HuggingFaceWindowService"
      args:
        pretrained_model_name_or_path: gpt2
    client_spec:
      class_name: "helm.proxy.clients.huggingface_client.HuggingFaceClient"
      args:
        pretrained_model_name_or_path: gpt2
        load_in_8bit: true

Also clean up revision handling: treat it as just another kwarg rather than treating it specially.
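To illustrate the idea, here is a minimal sketch of how the client_spec args could be separated into client-level settings and model-loading kwargs that get forwarded to AutoModelForCausalLM.from_pretrained. The split_client_args helper is hypothetical, for illustration only; HELM's actual wiring may differ.

```python
# Hypothetical sketch: split the client_spec args into the client's own
# parameters and the kwargs forwarded to AutoModelForCausalLM.from_pretrained.
# `split_client_args` is an illustrative helper, not part of HELM.

def split_client_args(args: dict) -> tuple:
    client_keys = {"pretrained_model_name_or_path"}
    client_args = {k: v for k, v in args.items() if k in client_keys}
    model_kwargs = {k: v for k, v in args.items() if k not in client_keys}
    return client_args, model_kwargs

# Args as they would appear for the 8-bit quantization example above.
args = {
    "pretrained_model_name_or_path": "gpt2",
    "load_in_8bit": True,
}
client_args, model_kwargs = split_client_args(args)
# The model would then be loaded roughly as:
# AutoModelForCausalLM.from_pretrained(
#     client_args["pretrained_model_name_or_path"], **model_kwargs)
```

Under this scheme, any new loading parameter (torch_dtype, load_in_4bit, revision, ...) passes through without a code change to the client.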

yifanmai (Collaborator, Author):

Also, eventually we might require trust_remote_code=True to be set explicitly in the same way.

JosselinSomervilleRoberts (Contributor) left a comment:


LGTM

@@ -56,23 +56,20 @@ def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwa
 class HuggingFaceServer:
     """A thin wrapper around a Hugging Face AutoModelForCausalLM for HuggingFaceClient to call."""

-    def __init__(self, pretrained_model_name_or_path: str, revision: Optional[str] = None):
+    def __init__(self, pretrained_model_name_or_path: str, **kwargs):
JosselinSomervilleRoberts (Contributor):

Should we add a comment here describing common kwargs that should be specified such as revision, precision, ... ?

yifanmai (Collaborator, Author):

I would prefer to do this in the docs instead.
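For context, the shape of the revised constructor can be sketched as follows. This is a simplified illustration, not the real HuggingFaceServer (which does more); the from_pretrained call is commented out to keep the sketch self-contained.

```python
# Simplified sketch of the revised signature: revision, torch_dtype,
# load_in_8bit, etc. all arrive as plain kwargs and would be passed through
# unchanged to AutoModelForCausalLM.from_pretrained.

class HuggingFaceServer:
    def __init__(self, pretrained_model_name_or_path: str, **kwargs):
        self.pretrained_model_name_or_path = pretrained_model_name_or_path
        self.model_kwargs = kwargs
        # In the real class, roughly:
        # self.model = AutoModelForCausalLM.from_pretrained(
        #     pretrained_model_name_or_path, **kwargs)

# revision is now just another entry in the kwargs dict.
server = HuggingFaceServer("gpt2", revision="main", load_in_8bit=True)
```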

yifanmai force-pushed the yifanmai/fix-huggingface-precision branch from ee803fa to 21898b8 on November 2, 2023 at 16:54
JosselinSomervilleRoberts (Contributor):

I think this is now handled by #1903 and we can close this

yifanmai force-pushed the yifanmai/fix-huggingface-precision branch from 21898b8 to c882746 on December 2, 2023 at 01:20
yifanmai force-pushed the yifanmai/fix-huggingface-precision branch from be458e3 to 982cdf9 on December 11, 2023 at 22:56
yifanmai merged commit c9b4a7e into main on Dec 11, 2023
6 checks passed
yifanmai deleted the yifanmai/fix-huggingface-precision branch on December 11, 2023 at 23:08
2 participants