[beam search] add output for manually checking the correctness #8684

Merged 2 commits on Sep 21, 2024
13 changes: 10 additions & 3 deletions tests/samplers/test_beam_search.py
@@ -11,7 +11,7 @@
 # 3. Use the model "huggyllama/llama-7b".
 MAX_TOKENS = [128]
 BEAM_WIDTHS = [4]
-MODELS = ["facebook/opt-125m"]
Member Author:
I find that facebook/opt-125m only repeats itself. The output of TinyLlama/TinyLlama-1.1B-Chat-v1.0 is better, although it is still not very sensible.

[2024-09-21T01:57:27Z] >>>0-th hf output:
[2024-09-21T01:57:27Z] vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.
[2024-09-21T01:57:27Z]
[2024-09-21T01:57:27Z] 2. OpenAI GPT-3: OpenAI GPT-3 is a language model pre-trained on a large corpus of text. It can generate human-like text and has been used in various applications such as chatbots, text generation, and translation.
[2024-09-21T01:57:27Z]
[2024-09-21T01:57:27Z] 3. GPT-Neo: GPT-Neo is an improved version of GPT-3 that has been trained on a larger corpus of text. It has been used in various applications such as chatbots, text generation, and translation.
[2024-09-21T01:57:27Z]
[2024-09-21T01:57:27Z] 4. GPT-J: GPT-J is a
[2024-09-21T01:57:27Z] >>>0-th vllm output:
[2024-09-21T01:57:27Z] vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.
[2024-09-21T01:57:27Z]
[2024-09-21T01:57:27Z] 2. OpenAI GPT-3: OpenAI GPT-3 is a language model pre-trained on a large corpus of text. It can generate human-like text and has been used in various applications such as chatbots, text generation, and translation.
[2024-09-21T01:57:27Z]
[2024-09-21T01:57:27Z] 3. GPT-Neo: GPT-Neo is an improved version of GPT-3 that has been trained on a larger corpus of text. It has been used in various applications such as chatbots, text generation, and translation.
[2024-09-21T01:57:27Z]
[2024-09-21T01:57:27Z] 4. GPT-J: GPT-J is a
[2024-09-21T01:57:27Z] >>>1-th hf output:
[2024-09-21T01:57:27Z] vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.
[2024-09-21T01:57:27Z]
[2024-09-21T01:57:27Z] 2. OpenAI GPT-3: OpenAI GPT-3 is a language model pre-trained on a large corpus of text. It can generate human-like text and has been used in various applications such as chatbots, text generation, and translation.
[2024-09-21T01:57:27Z]
[2024-09-21T01:57:27Z] 3. GPT-Neo: GPT-Neo is an improved version of GPT-3 that has been trained on a larger corpus of text. It has been used in various applications such as chatbots, text generation, and translation.
[2024-09-21T01:57:27Z]
[2024-09-21T01:57:27Z] 4. GPT-J: GPT-J is an
[2024-09-21T01:57:27Z] >>>1-th vllm output:
[2024-09-21T01:57:27Z] vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.
[2024-09-21T01:57:27Z]
[2024-09-21T01:57:27Z] 2. OpenAI GPT-3: OpenAI GPT-3 is a language model pre-trained on a large corpus of text. It can generate human-like text and has been used in various applications such as chatbots, text generation, and translation.
[2024-09-21T01:57:27Z]
[2024-09-21T01:57:27Z] 3. GPT-Neo: GPT-Neo is an improved version of GPT-3 that has been trained on a larger corpus of text. It has been used in various applications such as chatbots, text generation, and translation.
[2024-09-21T01:57:27Z]
[2024-09-21T01:57:27Z] 4. GPT-J: GPT-J is an
[2024-09-21T01:57:27Z] >>>2-th hf output:
[2024-09-21T01:57:27Z] vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.
[2024-09-21T01:57:27Z]
[2024-09-21T01:57:27Z] 2. OpenAI GPT-3: OpenAI GPT-3 is a language model pre-trained on a large corpus of text. It can generate human-like text and has been used in various applications, such as chatbots, language translation, and text generation.
[2024-09-21T01:57:27Z]
[2024-09-21T01:57:27Z] 3. GPT-Neo: GPT-Neo is an improved version of GPT-3 that has been trained on a larger corpus of text. It has been used in various applications, such as chatbots, language translation, and text generation.
[2024-09-21T01:57:27Z]
[2024-09-21T01:57:27Z] 4. GPT-J: GPT
[2024-09-21T01:57:27Z] >>>2-th vllm output:
[2024-09-21T01:57:27Z] vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.
[2024-09-21T01:57:27Z]
[2024-09-21T01:57:27Z] 2. OpenAI GPT-3: OpenAI GPT-3 is a language model pre-trained on a large corpus of text. It can generate human-like text and has been used in various applications, such as chatbots, language translation, and text generation.
[2024-09-21T01:57:27Z]
[2024-09-21T01:57:27Z] 3. GPT-Neo: GPT-Neo is an improved version of GPT-3 that has been trained on a larger corpus of text. It has been used in various applications, such as chatbots, language translation, and text generation.
[2024-09-21T01:57:27Z]
[2024-09-21T01:57:27Z] 4. GPT-J: GPT
[2024-09-21T01:57:27Z] >>>3-th hf output:
[2024-09-21T01:57:27Z] vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.
[2024-09-21T01:57:27Z]
[2024-09-21T01:57:27Z] 2. OpenAI GPT-3: OpenAI GPT-3 is a language model pre-trained on a large corpus of text. It can generate human-like text and has been used in various applications such as chatbots, text generation, and translation.
[2024-09-21T01:57:27Z]
[2024-09-21T01:57:27Z] 3. GPT-Neo: GPT-Neo is an improved version of GPT-3 that has been trained on a larger corpus of text. It has been used in various applications such as chatbots, text generation, and translation.
[2024-09-21T01:57:27Z]
[2024-09-21T01:57:27Z] 4. T5: T5 is a language model pre-
[2024-09-21T01:57:27Z] >>>3-th vllm output:
[2024-09-21T01:57:27Z] vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.
[2024-09-21T01:57:27Z]
[2024-09-21T01:57:27Z] 2. OpenAI GPT-3: OpenAI GPT-3 is a language model pre-trained on a large corpus of text. It can generate human-like text and has been used in various applications such as chatbots, text generation, and translation.
[2024-09-21T01:57:27Z]
[2024-09-21T01:57:27Z] 3. GPT-Neo: GPT-Neo is an improved version of GPT-3 that has been trained on a larger corpus of text. It has been used in various applications such as chatbots, text generation, and translation.
[2024-09-21T01:57:27Z]
[2024-09-21T01:57:27Z] 4. T5: T5 is a language model pre-
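For reference, here is a minimal standalone sketch (not the test's actual harness, which drives both engines through the repository's test fixtures) of how the HF side of such output can be reproduced for manual inspection with the plain transformers API. The prompt and decoding settings below are assumptions for illustration only:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model and prompt for illustration; the test takes its prompts from fixtures.
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs."
inputs = tokenizer(prompt, return_tensors="pt")

# Beam search mirroring BEAM_WIDTHS = [4] and MAX_TOKENS = [128] from the test.
outputs = model.generate(
    **inputs,
    num_beams=4,
    num_return_sequences=4,
    max_new_tokens=128,
    do_sample=False,
)

# Print each beam in the same ">>>i-th hf output:" style used by the test.
for i, text in enumerate(tokenizer.batch_decode(outputs, skip_special_tokens=True)):
    print(f">>>{i}-th hf output:")
    print(text)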


MODELS = ["TinyLlama/TinyLlama-1.1B-Chat-v1.0"]


@pytest.mark.parametrize("model", MODELS)
@@ -37,8 +37,15 @@ def test_beam_search_single_input(
                                                       beam_width, max_tokens)

     for i in range(len(example_prompts)):
-        hf_output_ids, _ = hf_outputs[i]
-        vllm_output_ids, _ = vllm_outputs[i]
+        hf_output_ids, hf_output_texts = hf_outputs[i]
+        vllm_output_ids, vllm_output_texts = vllm_outputs[i]
+        for i, (hf_text,
+                vllm_text) in enumerate(zip(hf_output_texts,
+                                            vllm_output_texts)):
+            print(f">>>{i}-th hf output:")
+            print(hf_text)
+            print(f">>>{i}-th vllm output:")
+            print(vllm_text)
         assert len(hf_output_ids) == len(vllm_output_ids)
         for j in range(len(hf_output_ids)):
             assert hf_output_ids[j] == vllm_output_ids[j], (
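A brief usage note: pytest captures stdout by default, so the new print statements are only visible when capture is disabled, or in the captured output pytest shows for a failing test. For example:

pytest -s tests/samplers/test_beam_search.py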