Add google/gemini-1.5-pro-preview-0409 as VLM #2607

Merged 7 commits on May 6, 2024
1 change: 1 addition & 0 deletions src/helm/benchmark/model_metadata_registry.py
@@ -32,6 +32,7 @@

GOOGLE_PALM_2_MODEL_TAG: str = "GOOGLE_PALM_2_MODEL_TAG"
GOOGLE_GEMINI_MODEL_TAG: str = "GOOGLE_GEMINI_MODEL_TAG"
+ GOOGLE_GEMINI_PRO_VISION_V1_TAG: str = "GOOGLE_GEMINI_PRO_VISION_V1_TAG"
GOOGLE_GEMMA_INSTRUCT_MODEL_TAG: str = "GOOGLE_GEMMA_INSTRUCT_MODEL_TAG"

# Models which emit garbage tokens when temperature=0.
6 changes: 3 additions & 3 deletions src/helm/benchmark/run_spec_factory.py
@@ -16,7 +16,7 @@
ANTHROPIC_CLAUDE_3_MODEL_TAG,
BUGGY_TEMP_0_TAG,
CHATML_MODEL_TAG,
- GOOGLE_GEMINI_MODEL_TAG,
+ GOOGLE_GEMINI_PRO_VISION_V1_TAG,
IDEFICS_INSTRUCT_MODEL_TAG,
LLAVA_MODEL_TAG,
OPEN_FLAMINGO_MODEL_TAG,
@@ -130,10 +130,10 @@ def alter_run_spec(run_spec: RunSpec) -> RunSpec:
if ANTHROPIC_CLAUDE_3_MODEL_TAG in model.tags:
run_spec = singleton(AnthropicClaude3RunExpander().expand(run_spec))

- # Google Gemini Vision returns an empty completion or throws an error if max_tokens is 1
+ # Google Gemini Vision v1.0 returns an empty completion or throws an error if max_tokens is 1
if (
VISION_LANGUAGE_MODEL_TAG in model.tags
- and GOOGLE_GEMINI_MODEL_TAG in model.tags
+ and GOOGLE_GEMINI_PRO_VISION_V1_TAG in model.tags
and run_spec.adapter_spec.max_tokens == 1
):
run_spec = singleton(IncreaseMaxTokensRunExpander(value=1).expand(run_spec))
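
The narrowing of this condition can be illustrated with a minimal, self-contained sketch. It assumes that the removed `GOOGLE_GEMINI_MODEL_TAG` check would also have matched Gemini 1.5 Pro once that model gains `VISION_LANGUAGE_MODEL_TAG` in this PR. `ModelStub`, the two predicate functions, and the `example/gemini-vision-v1` name are hypothetical stand-ins for illustration, not HELM's actual classes or model names; only the tag constants and `google/gemini-1.5-pro-preview-0409` come from the diff.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Tag names copied from the diff; everything else here is illustrative.
VISION_LANGUAGE_MODEL_TAG = "VISION_LANGUAGE_MODEL_TAG"
GOOGLE_GEMINI_MODEL_TAG = "GOOGLE_GEMINI_MODEL_TAG"
GOOGLE_GEMINI_PRO_VISION_V1_TAG = "GOOGLE_GEMINI_PRO_VISION_V1_TAG"


@dataclass(frozen=True)
class ModelStub:
    """Hypothetical stand-in for a model metadata entry (not HELM's class)."""

    name: str
    tags: FrozenSet[str]


def matches_before(model: ModelStub, max_tokens: int) -> bool:
    """Condition as it read before this PR: any Gemini VLM with max_tokens == 1."""
    return (
        VISION_LANGUAGE_MODEL_TAG in model.tags
        and GOOGLE_GEMINI_MODEL_TAG in model.tags
        and max_tokens == 1
    )


def matches_after(model: ModelStub, max_tokens: int) -> bool:
    """Condition after this PR: only models carrying the v1.0 vision tag."""
    return (
        VISION_LANGUAGE_MODEL_TAG in model.tags
        and GOOGLE_GEMINI_PRO_VISION_V1_TAG in model.tags
        and max_tokens == 1
    )


# Hypothetical name for the v1.0 vision entry; its tags follow the updated metadata.
gemini_vision_v1 = ModelStub(
    name="example/gemini-vision-v1",
    tags=frozenset(
        {
            VISION_LANGUAGE_MODEL_TAG,
            GOOGLE_GEMINI_MODEL_TAG,
            GOOGLE_GEMINI_PRO_VISION_V1_TAG,
        }
    ),
)

# Only the tags relevant to the condition are listed for Gemini 1.5 Pro.
gemini_1_5_pro = ModelStub(
    name="google/gemini-1.5-pro-preview-0409",
    tags=frozenset({VISION_LANGUAGE_MODEL_TAG, GOOGLE_GEMINI_MODEL_TAG}),
)

for model in (gemini_vision_v1, gemini_1_5_pro):
    print(model.name, matches_before(model, 1), matches_after(model, 1))
```

Under these assumptions, the old condition matches both models, while the new one matches only the v1.0 vision entry, so Gemini 1.5 Pro runs keep `max_tokens == 1` unchanged.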
9 changes: 9 additions & 0 deletions src/helm/config/model_deployments.yaml
@@ -1357,6 +1357,15 @@ model_deployments:
client_spec:
class_name: "helm.clients.openai_client.OpenAIClient"

+ - name: openai/gpt-4-1106-vision-preview
+ model_name: openai/gpt-4-1106-vision-preview
+ tokenizer_name: openai/cl100k_base
+ max_sequence_length: 128000 # According to https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo
+ max_request_length: 128001
+ max_sequence_and_generated_tokens_length: 132096
+ client_spec:
+ class_name: "helm.clients.openai_client.OpenAIClient"

## Codex Models
# DEPRECATED: Codex models have been shut down on March 23 2023.

17 changes: 13 additions & 4 deletions src/helm/config/model_metadata.yaml
@@ -695,15 +695,15 @@ models:
creator_organization_name: Google
access: limited
release_date: 2023-12-13
- tags: [VISION_LANGUAGE_MODEL_TAG, GOOGLE_GEMINI_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG]
+ tags: [VISION_LANGUAGE_MODEL_TAG, GOOGLE_GEMINI_MODEL_TAG, GOOGLE_GEMINI_PRO_VISION_V1_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG]

- name: google/gemini-1.5-pro-preview-0409
- display_name: Gemini 1.5 Pro
+ display_name: Gemini 1.5 Pro (0409 preview)
description: Gemini 1.5 Pro is a multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from long contexts. ([paper](https://arxiv.org/abs/2403.05530))
creator_organization_name: Google
access: limited
release_date: 2024-04-10
- tags: [TEXT_MODEL_TAG, GOOGLE_GEMINI_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]
+ tags: [TEXT_MODEL_TAG, VISION_LANGUAGE_MODEL_TAG, GOOGLE_GEMINI_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]

- name: google/gemma-2b
display_name: Gemma (2B)
@@ -1765,7 +1765,16 @@ models:
tags: [TEXT_MODEL_TAG, OPENAI_CHATGPT_MODEL_TAG, LIMITED_FUNCTIONALITY_TEXT_MODEL_TAG, INSTRUCTION_FOLLOWING_MODEL_TAG]

- name: openai/gpt-4-vision-preview
- display_name: GPT-4V (preview)
+ # According to https://platform.openai.com/docs/models/gpt-4-turbo-and-gpt-4, this model points to gpt-4-1106-vision-preview.
+ display_name: GPT-4V (1106 preview)

@teetone (Member, Author), May 5, 2024:

Date GPT-4V preview

Collaborator:

This is going to get out of date in the future... "GPT-4V (preview)" is better.

description: GPT-4V is a large multimodal model that accepts both text and images and is optimized for chat ([model card](https://openai.com/research/gpt-4v-system-card)).
creator_organization_name: OpenAI
access: limited
release_date: 2023-11-06
tags: [VISION_LANGUAGE_MODEL_TAG, OPENAI_CHATGPT_MODEL_TAG, FULL_FUNCTIONALITY_VLM_TAG]

+ - name: openai/gpt-4-1106-vision-preview

Member, Author:

Going forward, I will use this checkpoint until it is no longer in preview.

+ display_name: GPT-4V (1106 preview)
+ description: GPT-4V is a large multimodal model that accepts both text and images and is optimized for chat ([model card](https://openai.com/research/gpt-4v-system-card)).
+ creator_organization_name: OpenAI
+ access: limited