smoothquant on starcoder2 #1886

tonylek · 2024-07-03T19:09:37Z

Hi,

I'm having an issue when trying to convert starcoder2-3b to TensorRT-LLM with SmoothQuant.
I'm running on a100-40gi.

This is my command:
python tensorrt_llm/examples/gpt/convert_checkpoint.py --model_dir /model/starcoder2-3b --output_dir salmon_output --tp_size 1 --smoothquant 0.5

This is the error I'm receiving:

Generating validation split: 100%|███████████████████████████████████| 4869/4869 [00:00<00:00, 572495.69 examples/s]
calibrating model: 100%|██████████████████████████████████████████████████████████| 512/512 [00:44<00:00, 11.49it/s]
Traceback (most recent call last):
  File "/workspace/tensorrt_llm/examples/gpt/convert_checkpoint.py", line 2022, in <module>
    convert_and_save(rank)
  File "/workspace/tensorrt_llm/examples/gpt/convert_checkpoint.py", line 1984, in convert_and_save
    weights = convert_hf_gpt_legacy(
  File "/workspace/tensorrt_llm/examples/gpt/convert_checkpoint.py", line 1049, in convert_hf_gpt_legacy
    qkv_out_dim = qkv_w.shape[0]
AttributeError: 'NoneType' object has no attribute 'shape'
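
The traceback shows that `qkv_w` was `None` by the time `.shape` was read, which suggests the converter did not find the weight it was looking for. The converter's internals are not shown in this thread, so the following is only a generic sketch of that failure mode. The `find_weight` helper, the `FakeTensor` stand-in, and the key names are all hypothetical, not the actual `convert_checkpoint.py` code:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class FakeTensor:
    """Stand-in for a real tensor; only the shape matters in this sketch."""
    shape: Tuple[int, ...]


def find_weight(weights: Dict[str, FakeTensor], name: str) -> Optional[FakeTensor]:
    """Hypothetical lookup that returns None when the tensor name is absent."""
    return weights.get(name)


def qkv_out_dim(weights: Dict[str, FakeTensor], name: str) -> int:
    qkv_w = find_weight(weights, name)
    if qkv_w is None:
        # Raise a descriptive error instead of hitting
        # "'NoneType' object has no attribute 'shape'" one line later.
        raise KeyError(f"weight {name!r} not found; available: {sorted(weights)}")
    return qkv_w.shape[0]
```

With a guard like this, a checkpoint whose layer names differ from what the converter expects would fail with the missing key name rather than an `AttributeError`.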


QiJune · 2024-07-04T05:12:56Z

@Tracin Could you please take a look? Thanks

Tracin · 2024-07-05T06:02:56Z

@tonylek
For the Starcoder2 model, please use ModelOpt to do the calibration.

python3 examples/quantization/quantize.py --model_dir starcoder2 \
        --dtype float16 \
        --qformat int8_sq \
        --output_dir starcoder2/trt_ckpt/sq/1-gpu

trtllm-build --checkpoint_dir starcoder2/trt_ckpt/sq/1-gpu \
        --output_dir starcoder2/trt_engines/sq/1-gpu --builder_opt=4

I will update this usage in the doc.
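
The two steps above can also be chained from a script so that the engine build only runs if quantization succeeds. This is a minimal sketch that reuses only the flags quoted in this comment; the function names and the idea of passing the script path as a parameter are assumptions for illustration:

```python
import subprocess
from typing import List


def build_commands(quantize_script: str, model_dir: str,
                   ckpt_dir: str, engine_dir: str) -> List[List[str]]:
    """Assemble the quantize and build command lines as argument lists."""
    quantize = [
        "python3", quantize_script,
        "--model_dir", model_dir,
        "--dtype", "float16",
        "--qformat", "int8_sq",
        "--output_dir", ckpt_dir,
    ]
    build = [
        "trtllm-build",
        "--checkpoint_dir", ckpt_dir,  # must match the quantize output dir
        "--output_dir", engine_dir,
        "--builder_opt=4",
    ]
    return [quantize, build]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order; check=True stops at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_all(build_commands(...))` raises `subprocess.CalledProcessError` if the quantization step exits with a non-zero status, so `trtllm-build` is never started on a missing checkpoint.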

tonylek · 2024-07-08T14:18:27Z

Hi, thanks, I'm still getting this error:

[TensorRT-LLM][WARNING] The manually set model data type is torch.float16, but the data type of the HuggingFace model is torch.float32.
Initializing tokenizer from /model/starcoder2-3b
No quantization applied, export float16 model
Unknown model type Starcoder2ForCausalLM. Continue exporting...
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
current rank: 0, tp rank: 0, pp rank: 0
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
Cannot export model to the model_config. The modelopt-optimized model state_dict (including the quantization factors) is saved to salmon_output/modelopt_model.0.pth using torch.save for further inspection.
Detailed export error: 'unknown:Starcoder2ForCausalLM'
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/export/model_config_export.py", line 364, in export_tensorrt_llm_checkpoint
    for tensorrt_llm_config, weights in torch_to_tensorrt_llm_checkpoint(
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/export/model_config_export.py", line 312, in torch_to_tensorrt_llm_checkpoint
    tensorrt_llm_config = convert_to_tensorrt_llm_config(model_config, tp_size_overwrite)
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/export/tensorrt_llm_utils.py", line 84, in convert_to_tensorrt_llm_config
    "architecture": MODEL_NAME_TO_HF_ARCH_MAP[decoder_type],
KeyError: 'unknown:Starcoder2ForCausalLM'
Traceback (most recent call last):
  File "/workspace/tensorrt_llm/examples/quantization/quantize.py", line 90, in <module>
    quantize_and_export(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/quantization/quantize_by_modelopt.py", line 340, in quantize_and_export
    with open(f"{export_path}/config.json", "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'starcoder2_output/config.json'

This happens when I run:

python3 tensorrt_llm/examples/quantization/quantize.py --model_dir /model/starcoder2-3b --dtype float16 --qformat int8_sq --output_dir starcoder2_output
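
The final `FileNotFoundError` looks like a follow-on effect of the earlier failure: the log reports "Cannot export model to the model_config", so no `config.json` was written before `quantize_and_export` tried to open it. A minimal sketch of inspecting the output directory before reading it, assuming only the file names that appear in the log above (the helper name `describe_export` is hypothetical):

```python
import json
from pathlib import Path


def describe_export(export_dir: str) -> str:
    """Report whether an export directory holds a checkpoint config."""
    out = Path(export_dir)
    config = out / "config.json"
    if config.is_file():
        with config.open("r") as f:
            cfg = json.load(f)
        return f"export ok, architecture={cfg.get('architecture')}"
    # Per the log, a failed export leaves modelopt_model.<rank>.pth behind.
    fallbacks = []
    if out.is_dir():
        fallbacks = sorted(p.name for p in out.glob("modelopt_model.*.pth"))
    if fallbacks:
        return f"export failed, fallback state_dict found: {fallbacks}"
    return "no config.json and no fallback state_dict found"
```

A result of "export failed" points back at the `KeyError: 'unknown:Starcoder2ForCausalLM'` as the error to fix, not the missing file.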

Tracin · 2024-07-09T02:43:27Z

@tonylek Can you try upgrading ModelOpt?
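
Before upgrading, it can help to record which versions are currently installed. A small sketch using the standard library; the distribution names `nvidia-modelopt` and `tensorrt_llm` are assumptions about how the packages are named in your environment, so adjust them if `pip list` shows something different:

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed distribution names; not confirmed anywhere in this thread.
    for name in ("nvidia-modelopt", "tensorrt_llm"):
        print(name, installed_version(name) or "not installed")
```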

QiJune added the bug (Something isn't working) label Jul 4, 2024

QiJune assigned Tracin Jul 4, 2024

kaiyux mentioned this issue Jul 9, 2024

Update TensorRT-LLM #1918

Merged

QiJune added the functionality issue label Aug 5, 2024

Shixiaowei02 mentioned this issue Aug 29, 2024

TensorRT-LLM v0.12 Update #2164

Merged

hello-11 closed this as completed Nov 14, 2024
