
smoothquant on starcoder2 #1886

Closed
tonylek opened this issue Jul 3, 2024 · 4 comments
Labels: bug (Something isn't working), functionality issue

tonylek (Contributor) commented Jul 3, 2024

Hi,

I'm having an issue when trying to convert starcoder2-3b with SmoothQuant to TensorRT-LLM.
I'm running on an A100-40GB.

This is my command:
python tensorrt_llm/examples/gpt/convert_checkpoint.py --model_dir /model/starcoder2-3b --output_dir salmon_output --tp_size 1 --smoothquant 0.5

This is the error I'm receiving:

Generating validation split: 100%|███████████████████████████████████| 4869/4869 [00:00<00:00, 572495.69 examples/s]
calibrating model: 100%|██████████████████████████████████████████████████████████| 512/512 [00:44<00:00, 11.49it/s]
Traceback (most recent call last):
  File "/workspace/tensorrt_llm/examples/gpt/convert_checkpoint.py", line 2022, in <module>
    convert_and_save(rank)
  File "/workspace/tensorrt_llm/examples/gpt/convert_checkpoint.py", line 1984, in convert_and_save
    weights = convert_hf_gpt_legacy(
  File "/workspace/tensorrt_llm/examples/gpt/convert_checkpoint.py", line 1049, in convert_hf_gpt_legacy
    qkv_out_dim = qkv_w.shape[0]
AttributeError: 'NoneType' object has no attribute 'shape'
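
For context, a hedged reading of this failure: the legacy GPT converter looks up a fused QKV weight that the Starcoder2 checkpoint layout does not appear to provide, so the lookup returns None. A defensive guard at the line named in the traceback might look like the following (qkv_w and its surroundings are taken from the traceback; the guard itself is hypothetical):

# Hypothetical guard at examples/gpt/convert_checkpoint.py:1049; qkv_w is
# the fused QKV weight the legacy converter looks up, per the traceback.
if qkv_w is None:
    raise ValueError(
        "Fused QKV weight not found: this architecture is likely not "
        "supported by the legacy GPT SmoothQuant conversion path.")
qkv_out_dim = qkv_w.shape[0]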
QiJune added the bug (Something isn't working) label on Jul 4, 2024
QiJune (Collaborator) commented Jul 4, 2024

@Tracin Could you please take a look? Thanks

Tracin (Collaborator) commented Jul 5, 2024

@tonylek For the Starcoder2 model, please use ModelOpt to do the calibration:

python3 examples/quantization/quantize.py --model_dir starcoder2 \
        --dtype float16 \
        --qformat int8_sq \
        --output_dir starcoder2/trt_ckpt/sq/1-gpu

trtllm-build --checkpoint_dir starcoder2/trt_ckpt/sq/1-gpu \
        --output_dir starcoder2/trt_engines/sq/1-gpu --builder_opt=4

I will update this usage in the doc.
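
For reference, a minimal sketch of the calibrate-and-export flow that quantize.py drives with --qformat int8_sq, assuming a recent nvidia-modelopt. The model path, calibration prompts, and decoder_type below are placeholders/assumptions, not the exact script; the real script calibrates on a proper dataset rather than toy prompts:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_tensorrt_llm_checkpoint

model_dir = "starcoder2"  # placeholder: path to the HF checkpoint
model = AutoModelForCausalLM.from_pretrained(
    model_dir, torch_dtype=torch.float16).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_dir)

def forward_loop(model):
    # Run a few batches through the model so SmoothQuant can collect
    # activation statistics; these prompts stand in for the script's
    # real calibration dataset.
    for prompt in ["def fibonacci(n):", "import torch"]:
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        with torch.no_grad():
            model(**inputs)

# Apply int8 SmoothQuant calibration.
model = mtq.quantize(model, mtq.INT8_SMOOTHQUANT_CFG, forward_loop)

# Export a TensorRT-LLM checkpoint. This export is the step that raises
# KeyError: 'unknown:Starcoder2ForCausalLM' further down in the thread,
# when the installed ModelOpt cannot map the architecture to a known
# decoder type.
export_tensorrt_llm_checkpoint(
    model,
    decoder_type="gpt",  # assumption: decoder family used for export
    dtype=torch.float16,
    export_dir="starcoder2/trt_ckpt/sq/1-gpu",
    inference_tensor_parallel=1,
)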

tonylek (Contributor, Author) commented Jul 8, 2024

Hi, thanks. I'm still getting this error:

[TensorRT-LLM][WARNING] The manually set model data type is torch.float16, but the data type of the HuggingFace model is torch.float32.
Initializing tokenizer from /model/starcoder2-3b
No quantization applied, export float16 model
Unknown model type Starcoder2ForCausalLM. Continue exporting...
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
current rank: 0, tp rank: 0, pp rank: 0
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
Cannot export model to the model_config. The modelopt-optimized model state_dict (including the quantization factors) is saved to salmon_output/modelopt_model.0.pth using torch.save for further inspection.
Detailed export error: 'unknown:Starcoder2ForCausalLM'
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/export/model_config_export.py", line 364, in export_tensorrt_llm_checkpoint
    for tensorrt_llm_config, weights in torch_to_tensorrt_llm_checkpoint(
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/export/model_config_export.py", line 312, in torch_to_tensorrt_llm_checkpoint
    tensorrt_llm_config = convert_to_tensorrt_llm_config(model_config, tp_size_overwrite)
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/export/tensorrt_llm_utils.py", line 84, in convert_to_tensorrt_llm_config
    "architecture": MODEL_NAME_TO_HF_ARCH_MAP[decoder_type],
KeyError: 'unknown:Starcoder2ForCausalLM'
Traceback (most recent call last):
  File "/workspace/tensorrt_llm/examples/quantization/quantize.py", line 90, in <module>
    quantize_and_export(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/quantization/quantize_by_modelopt.py", line 340, in quantize_and_export
    with open(f"{export_path}/config.json", "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'starcoder2_output/config.json'

when I run:

python3 tensorrt_llm/examples/quantization/quantize.py --model_dir /model/starcoder2-3b \
        --dtype float16 \
        --qformat int8_sq \
        --output_dir starcoder2_output
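
One way to check whether the installed ModelOpt knows the architecture is to inspect the map named in the traceback above (the module path and map name come straight from that traceback):

# If Starcoder2 has no entry in this map, the export cannot succeed on
# the installed ModelOpt version.
from modelopt.torch.export.tensorrt_llm_utils import MODEL_NAME_TO_HF_ARCH_MAP
print(sorted(MODEL_NAME_TO_HF_ARCH_MAP))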

Tracin (Collaborator) commented Jul 9, 2024

@tonylek Can you try upgrading ModelOpt? The KeyError: 'unknown:Starcoder2ForCausalLM' indicates the installed version does not yet map the Starcoder2 architecture to a supported decoder type.
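
For reference, the PyPI package is nvidia-modelopt, so the upgrade and a version check would look something like the following (the thread does not state which version first supports Starcoder2):

pip install -U nvidia-modelopt
python -c "import modelopt; print(modelopt.__version__)"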
