[BugFix] Fix test breakages from transformers 4.45 upgrade #8829

Merged 24 commits on Sep 26, 2024
Commits
4f53397  [BugFix] Fix test breakages from transformers 4.45 upgrade (njhill, Sep 26, 2024)
e2ae1bb  Also fix llava OOM from @ywang96 (njhill, Sep 26, 2024)
66c0c19  Fix next failures (njhill, Sep 26, 2024)
a5b289c  Catch any Exception when attempting to load lora-specific tokenizer (njhill, Sep 26, 2024)
ce1d477  Change "default" rope scaling type back to "mrope" in HF config (njhill, Sep 26, 2024)
4eaa8e1  raise gpu mem (ywang96, Sep 26, 2024)
899003b  Merge branch 'main' into transformers-fixes (DarkLight1337, Sep 26, 2024)
562f816  Remove unnecessary overwrite (DarkLight1337, Sep 26, 2024)
51b9abc  Remove unnecessary version guards (DarkLight1337, Sep 26, 2024)
8e7f2b6  Update A100 distributed test with new file location (missed in #7820) (DarkLight1337, Sep 26, 2024)
57b7328  Replace legacy `tmpdir` with modern `tmp_path` fixture (DarkLight1337, Sep 26, 2024)
0ebd4fb  Reduce max_model_len in LLaVA-OneVision test to avoid OOM (DarkLight1337, Sep 26, 2024)
4a924c8  Patch `ChatGLMTokenizer._pad` (DarkLight1337, Sep 26, 2024)
0c30e87  Run OOT test in a clean process to solve OOM in AMD (DarkLight1337, Sep 26, 2024)
9f2fac8  Fix insufficient `max_model_len` (DarkLight1337, Sep 26, 2024)
2b6948c  Fix wrong test being updated (DarkLight1337, Sep 26, 2024)
45e2b54  Cleanup (DarkLight1337, Sep 26, 2024)
f0584fa  raise mem (ywang96, Sep 26, 2024)
27b96c1  format (ywang96, Sep 26, 2024)
cd105be  Merge remote-tracking branch 'upstream/main' into transformers-fixes (ywang96, Sep 26, 2024)
315ff90  remove comment (ywang96, Sep 26, 2024)
8fdad1c  skip test (ywang96, Sep 26, 2024)
6decd70  revert soft fail (ywang96, Sep 26, 2024)
59bc78d  Update tokenizer patch (DarkLight1337, Sep 26, 2024)
10 changes: 7 additions & 3 deletions tests/samplers/test_sampler.py
@@ -596,8 +596,12 @@ def test_sampler_top_k_top_p(seed: int, device: str):
     generation_config = GenerationConfig(top_k=top_k,
                                          top_p=top_p,
                                          do_sample=True)
-    warpers = generation_model._get_logits_warper(generation_config, device)
-    assert len(warpers) == 2  # top_p and top_k
+    processors = generation_model._get_logits_processor(generation_config,
+                                                        None,
+                                                        None,
+                                                        None, [],
+                                                        device=device)
+    assert len(processors) == 2  # top_p and top_k
 
     seq_group_metadata_list: List[SequenceGroupMetadata] = []
     seq_lens: List[int] = []

njhill (Member, Author) commented: _get_logits_warper was rolled into _get_logits_processor.

@@ -639,7 +643,7 @@ def mock_sample(probs, *args, **kwargs):

     assert sample_probs is not None
 
-    hf_probs = warpers(torch.zeros_like(fake_logits), fake_logits.clone())
+    hf_probs = processors(torch.zeros_like(fake_logits), fake_logits.clone())
     hf_probs = torch.softmax(hf_probs, dim=-1, dtype=torch.float)
     torch.testing.assert_close(hf_probs, sample_probs, rtol=0.0, atol=1e-5)
     assert torch.equal(hf_probs.eq(0), sample_probs.eq(0))
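For reference, a minimal standalone sketch of the same top-k/top-p filtering that the merged _get_logits_processor sets up, built from transformers' public warper classes. This is not the vLLM test itself; the top_k/top_p values and tensor shapes below are made up for illustration.

```python
# Illustrative sketch, not vLLM test code: since _get_logits_warper is gone in
# transformers 4.45, the equivalent top-k/top-p filtering can also be built
# directly from the public warper classes.
import torch
from transformers import LogitsProcessorList, TopKLogitsWarper, TopPLogitsWarper

processors = LogitsProcessorList(
    [TopKLogitsWarper(top_k=10), TopPLogitsWarper(top_p=0.9)])
assert len(processors) == 2  # top_p and top_k, mirroring the test's assertion

fake_logits = torch.randn(2, 32000)                     # (batch, vocab) dummy logits
dummy_input_ids = torch.zeros(2, 1, dtype=torch.long)   # ignored by the warpers
filtered = processors(dummy_input_ids, fake_logits)     # non-selected tokens set to -inf
hf_probs = torch.softmax(filtered, dim=-1)
```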
2 changes: 1 addition & 1 deletion vllm/config.py
@@ -1740,7 +1740,7 @@ def _get_and_verify_max_len(
                    "with rope_scaling. Please raise an issue so we can "
                    "investigate.")
 
-            if rope_type == "mrope":
+            if rope_type in ("mrope", "default"):
njhill (Member, Author) commented: "mrope" gets renamed to "default" in the Qwen2-VL config class.

DarkLight1337 (Member) commented on Sep 26, 2024: Qwen2-VL cannot be run in transformers>=4.45 even with this change.

[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/cyrusleung/vllm/examples/offline_inference_vision_language_multi_image.py", line 286, in <module>
[rank0]:     main(args)
[rank0]:   File "/home/cyrusleung/vllm/examples/offline_inference_vision_language_multi_image.py", line 262, in main
[rank0]:     run_generate(model, QUESTION, IMAGE_URLS)
[rank0]:   File "/home/cyrusleung/vllm/examples/offline_inference_vision_language_multi_image.py", line 205, in run_generate
[rank0]:     req_data = model_example_map[model](question, image_urls)
[rank0]:   File "/home/cyrusleung/vllm/examples/offline_inference_vision_language_multi_image.py", line 151, in load_qwen2_vl
[rank0]:     llm = LLM(
[rank0]:   File "/home/cyrusleung/vllm/vllm/entrypoints/llm.py", line 214, in __init__
[rank0]:     self.llm_engine = LLMEngine.from_engine_args(
[rank0]:   File "/home/cyrusleung/vllm/vllm/engine/llm_engine.py", line 564, in from_engine_args
[rank0]:     engine = cls(
[rank0]:   File "/home/cyrusleung/vllm/vllm/engine/llm_engine.py", line 325, in __init__
[rank0]:     self.model_executor = executor_class(
[rank0]:   File "/home/cyrusleung/vllm/vllm/executor/executor_base.py", line 47, in __init__
[rank0]:     self._init_executor()
[rank0]:   File "/home/cyrusleung/vllm/vllm/executor/gpu_executor.py", line 40, in _init_executor
[rank0]:     self.driver_worker.load_model()
[rank0]:   File "/home/cyrusleung/vllm/vllm/worker/worker.py", line 183, in load_model
[rank0]:     self.model_runner.load_model()
[rank0]:   File "/home/cyrusleung/vllm/vllm/worker/model_runner.py", line 1016, in load_model
[rank0]:     self.model = get_model(model_config=self.model_config,
[rank0]:   File "/home/cyrusleung/vllm/vllm/model_executor/model_loader/__init__.py", line 19, in get_model
[rank0]:     return loader.load_model(model_config=model_config,
[rank0]:   File "/home/cyrusleung/vllm/vllm/model_executor/model_loader/loader.py", line 399, in load_model
[rank0]:     model = _initialize_model(model_config, self.load_config,
[rank0]:   File "/home/cyrusleung/vllm/vllm/model_executor/model_loader/loader.py", line 176, in _initialize_model
[rank0]:     return build_model(
[rank0]:   File "/home/cyrusleung/vllm/vllm/model_executor/model_loader/loader.py", line 161, in build_model
[rank0]:     return model_class(config=hf_config,
[rank0]:   File "/home/cyrusleung/vllm/vllm/model_executor/models/qwen2_vl.py", line 876, in __init__
[rank0]:     self.model = Qwen2Model(config, cache_config, quant_config)
[rank0]:   File "/home/cyrusleung/vllm/vllm/model_executor/models/qwen2.py", line 248, in __init__
[rank0]:     self.start_layer, self.end_layer, self.layers = make_layers(
[rank0]:   File "/home/cyrusleung/vllm/vllm/model_executor/models/utils.py", line 282, in make_layers
[rank0]:     [PPMissingLayer() for _ in range(start_layer)] + [
[rank0]:   File "/home/cyrusleung/vllm/vllm/model_executor/models/utils.py", line 283, in <listcomp>
[rank0]:     maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
[rank0]:   File "/home/cyrusleung/vllm/vllm/model_executor/models/qwen2.py", line 250, in <lambda>
[rank0]:     lambda prefix: Qwen2DecoderLayer(config=config,
[rank0]:   File "/home/cyrusleung/vllm/vllm/model_executor/models/qwen2.py", line 175, in __init__
[rank0]:     self.self_attn = Qwen2Attention(
[rank0]:   File "/home/cyrusleung/vllm/vllm/model_executor/models/qwen2.py", line 133, in __init__
[rank0]:     self.rotary_emb = get_rope(
[rank0]:   File "/home/cyrusleung/vllm/vllm/model_executor/layers/rotary_embedding.py", line 1003, in get_rope
[rank0]:     raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
[rank0]: ValueError: Unknown RoPE scaling type default

But if we install the older version of transformers mentioned in the docs (git+https://github.com/huggingface/transformers.git@21fac7abba2a37fae86106f87fcf9974fd1e3830), then vLLM cannot be run because it imports MLlamaConfig from the top level. We'll open a separate PR to patch in Qwen2-VL support for transformers v4.45.

njhill (Member, Author) commented: Thanks @DarkLight1337, yeah, I was just making a change to update the scaling type in the config back to "mrope" if it's "default". @ywang96 found this open issue: huggingface/transformers#33401

njhill (Member, Author) commented: Change made in ce1d477.

                scaling_factor = 1
            else:
                assert "factor" in rope_scaling
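A hedged sketch of the workaround discussed above and made in ce1d477: map the rope scaling type that newer Qwen2-VL configs report as "default" back to "mrope" before vLLM builds the rotary embedding. This is not the exact vLLM code, and the key names checked here ("rope_type", "type", "mrope_section") are assumptions about the Qwen2-VL config layout.

```python
# Hypothetical helper, not vLLM's actual implementation: normalize the
# rope_scaling type so vLLM's rotary embedding factory recognizes it again.
def normalize_qwen2_vl_rope_scaling(hf_config) -> None:
    rope_scaling = getattr(hf_config, "rope_scaling", None)
    if not rope_scaling:
        return
    # transformers may store the type under "rope_type" or the legacy "type" key
    key = "rope_type" if "rope_type" in rope_scaling else "type"
    if rope_scaling.get(key) == "default" and "mrope_section" in rope_scaling:
        # Only multimodal rotary embeddings carry "mrope_section"; restore the
        # "mrope" name that get_rope() expects.
        rope_scaling[key] = "mrope"
```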
2 changes: 1 addition & 1 deletion vllm/transformers_utils/tokenizer.py
@@ -167,7 +167,7 @@ def get_lora_tokenizer(lora_request: LoRARequest, *args,
         return None
     try:
         tokenizer = get_tokenizer(lora_request.lora_path, *args, **kwargs)
-    except OSError as e:
+    except (OSError, ValueError) as e:
njhill (Member, Author) commented: ValueError rather than OSError is now thrown when there's no config.json present.

         # No tokenizer was found in the LoRA folder,
         # use base model tokenizer
         logger.warning(
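A simplified sketch of the fallback this change preserves, assuming a plain AutoTokenizer load rather than vLLM's get_tokenizer wrapper; the function name is hypothetical.

```python
# Simplified sketch, not vLLM's get_lora_tokenizer: shows why both OSError and
# ValueError must be caught when a LoRA adapter directory has no tokenizer files.
from typing import Optional

from transformers import AutoTokenizer, PreTrainedTokenizerBase


def load_adapter_tokenizer(lora_path: str) -> Optional[PreTrainedTokenizerBase]:
    try:
        return AutoTokenizer.from_pretrained(lora_path)
    except (OSError, ValueError):
        # Older transformers raised OSError here; 4.45 raises ValueError when
        # no config.json or tokenizer files are found. Returning None lets the
        # caller fall back to the base model tokenizer.
        return None
```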