[Bug]: vLLM 0.5.5 Ignores VLLM_USE_MODELSCOPE=True and Accesses huggingface.co #7986

Closed
1 task done
NaiveYan opened this issue Aug 29, 2024 · 2 comments · Fixed by #8037
Labels: bug (Something isn't working)

Comments

@NaiveYan

Your current environment

The output of `python collect_env.py`
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31

Python version: 3.10.14 (main, Apr  6 2024, 18:45:05) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-4.15.0-39-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: NVIDIA TITAN V
GPU 1: NVIDIA TITAN V
GPU 2: NVIDIA TITAN V
GPU 3: NVIDIA TITAN V

Nvidia driver version: 560.35.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          56
On-line CPU(s) list:             0-55
Thread(s) per core:              2
Core(s) per socket:              14
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           79
Model name:                      Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
Stepping:                        1
Frequency boost:                 enabled
CPU MHz:                         1472.911
CPU max MHz:                     2401.0000
CPU min MHz:                     1200.0000
BogoMIPS:                        4799.79
Virtualization:                  VT-x
L1d cache:                       896 KiB
L1i cache:                       896 KiB
L2 cache:                        7 MiB
L3 cache:                        70 MiB
NUMA node0 CPU(s):               0-13,28-41
NUMA node1 CPU(s):               14-27,42-55
Vulnerability L1tf:              Mitigation; PTE Inversion
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, IBPB, IBRS_FW
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts flush_l1d

Versions of relevant libraries:
[pip3] flashinfer==0.1.4+cu121torch2.4
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.1.3.1
[pip3] nvidia-cuda-cupti-cu12==12.1.105
[pip3] nvidia-cuda-nvrtc-cu12==12.1.105
[pip3] nvidia-cuda-runtime-cu12==12.1.105
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.0.2.54
[pip3] nvidia-curand-cu12==10.3.2.106
[pip3] nvidia-cusolver-cu12==11.4.5.107
[pip3] nvidia-cusparse-cu12==12.1.0.106
[pip3] nvidia-ml-py==12.560.30
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] nvidia-nvjitlink-cu12==12.6.20
[pip3] nvidia-nvtx-cu12==12.1.105
[pip3] pyzmq==26.2.0
[pip3] torch==2.4.0
[pip3] torchvision==0.19.0
[pip3] transformers==4.44.2
[pip3] triton==3.0.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.5.5@
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0	GPU1	GPU2	GPU3	CPU Affinity	NUMA Affinity	GPU NUMA ID
GPU0	 X 	PHB	PHB	PHB	0-13,28-41	0		N/A
GPU1	PHB	 X 	PIX	PIX	0-13,28-41	0		N/A
GPU2	PHB	PIX	 X 	PIX	0-13,28-41	0		N/A
GPU3	PHB	PIX	PIX	 X 	0-13,28-41	0		N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

🐛 Describe the bug

After running

docker run --runtime nvidia --gpus all -v cache/modelscope:/root/.cache/modelscope --env "VLLM_USE_MODELSCOPE=True" -p 8000:8000 --ipc host -d --name vllm vllm/vllm-openai:v0.5.5 --model LLM-Research/Meta-Llama-3.1-8B-Instruct --trust-remote-code -tp 4

the container exits shortly after starting. The container logs show the following traceback:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/LLM-Research/Meta-Llama-3.1-8B-Instruct/resolve/main/preprocessor_config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 402, in cached_file
    resolved_file = hf_hub_download(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_deprecation.py", line 101, in inner_f
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1240, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1347, in _hf_hub_download_to_cache_dir
    _raise_on_head_call_error(head_call_error, force_download, local_files_only)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1854, in _raise_on_head_call_error
    raise head_call_error
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1751, in _get_metadata_or_catch_error
    metadata = get_hf_file_metadata(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1673, in get_hf_file_metadata
    r = _request_wrapper(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 376, in _request_wrapper
    response = _request_wrapper(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 400, in _request_wrapper
    hf_raise_for_status(response)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py", line 352, in hf_raise_for_status
    raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-66d02289-540ea52d0c39b1ec55f41068;868250ab-8da1-434d-a9fc-27aafbd9c261)

Repository Not Found for url: https://huggingface.co/LLM-Research/Meta-Llama-3.1-8B-Instruct/resolve/main/preprocessor_config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 476, in <module>
    asyncio.run(run_server(args))
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 443, in run_server
    async with build_async_engine_client(args) as async_engine_client:
  File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 117, in build_async_engine_client
    if (model_is_embedding(args.model, args.trust_remote_code,
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 71, in model_is_embedding
    return ModelConfig(model=model_name,
  File "/usr/local/lib/python3.10/dist-packages/vllm/config.py", line 172, in __init__
    self.hf_image_processor_config = get_hf_image_processor_config(
  File "/usr/local/lib/python3.10/dist-packages/vllm/transformers_utils/config.py", line 113, in get_hf_image_processor_config
    return get_image_processor_config(model, revision=revision, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/image_processing_auto.py", line 274, in get_image_processor_config
    resolved_config_file = get_file_from_repo(
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 554, in get_file_from_repo
    return cached_file(
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 425, in cached_file
    raise EnvironmentError(
OSError: LLM-Research/Meta-Llama-3.1-8B-Instruct is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`
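
For reference, the failing lookup can be reproduced outside the container with transformers alone (an illustrative snippet, assuming only transformers 4.44 is installed; the repo id below exists only on ModelScope):

    # The same call vLLM 0.5.5 ends up making in get_hf_image_processor_config.
    # Because LLM-Research/Meta-Llama-3.1-8B-Instruct is not on huggingface.co,
    # this fails with the same OSError shown in the log above.
    from transformers.models.auto.image_processing_auto import (
        get_image_processor_config)

    get_image_processor_config("LLM-Research/Meta-Llama-3.1-8B-Instruct")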

The likely cause is that in version 0.5.5,
https://github.com/vllm-project/vllm/blob/main/vllm/transformers_utils/config.py#L113

    return get_image_processor_config(model, revision=revision, **kwargs)

always goes through transformers, which resolves the model on huggingface.co and never consults ModelScope, even when VLLM_USE_MODELSCOPE=True.
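
For illustration, a minimal sketch of how that call could honor the flag (an assumption, not the actual change in #8037; the flow and names are hypothetical): resolve the repository through ModelScope first, then let transformers read the image-processor config from the local snapshot so no request ever reaches huggingface.co.

    # Hypothetical sketch only -- not the real vLLM patch.
    import os

    from transformers.models.auto.image_processing_auto import (
        get_image_processor_config)


    def get_hf_image_processor_config(model: str, revision=None, **kwargs):
        if os.environ.get("VLLM_USE_MODELSCOPE", "False").lower() == "true":
            # Assumes the modelscope package is available in the image.
            from modelscope import snapshot_download
            # Download (or reuse the cached copy of) the ModelScope repo and
            # point transformers at the resulting local directory.
            model = snapshot_download(model_id=model, revision=revision)
        return get_image_processor_config(model, revision=revision, **kwargs)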

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
NaiveYan added the bug label on Aug 29, 2024
@simon-mo
Collaborator

Fixes welcomed!

@NickLucche
Contributor

I can look into this
