
[Bug]: gemini 2.0 Flash GA is erroring due to context caching not being supported #8296

Open
trashhalo opened this issue Feb 6, 2025 · 2 comments
Labels: bug

Comments

trashhalo commented Feb 6, 2025

What happened?

Trying to use the new Gemini 2.0 Flash that launched today, and it's erroring, saying the model doesn't exist or doesn't support createCachedContent.

model_list:
  - model_name: "anthropic/*"
    litellm_params:
      model: "anthropic/*"
      api_key: "os.environ/ANTHROPIC_API_KEY"

  - model_name: "gemini/*"
    litellm_params:
      model: "gemini/*"
      api_key: "os.environ/GEMINI_API_KEY"

general_settings:
  telemetry: False

server:
  environment: development
  release_track: "stable"
  port: 4002
  host: 0.0.0.0

Relevant log output

litellm-1        | Received Model Group=gemini/gemini-2.0-flash-001
litellm-1        | Available Model Group Fallbacks=None
litellm-1        | Traceback (most recent call last):
litellm-1        |   File "/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/context_caching/vertex_ai_context_caching.py", line 396, in async_check_and_create_cache
litellm-1        |     response = await client.post(
litellm-1        |                ^^^^^^^^^^^^^^^^^^
litellm-1        |         url=url, headers=headers, json=cached_content_request_body  # type: ignore
litellm-1        |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
litellm-1        |     )
litellm-1        |     ^
litellm-1        |   File "/usr/lib/python3.13/site-packages/litellm/llms/custom_httpx/http_handler.py", line 219, in post
litellm-1        |     raise e
litellm-1        |   File "/usr/lib/python3.13/site-packages/litellm/llms/custom_httpx/http_handler.py", line 177, in post
litellm-1        |     response.raise_for_status()
litellm-1        |     ~~~~~~~~~~~~~~~~~~~~~~~~~^^
litellm-1        |   File "/usr/lib/python3.13/site-packages/httpx/_models.py", line 761, in raise_for_status
litellm-1        |     raise HTTPStatusError(message, request=request, response=self)
litellm-1        | httpx.HTTPStatusError: Client error '404 Not Found' for url 'https://generativelanguage.googleapis.com/v1beta/cachedContents?key='
litellm-1        | For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404
litellm-1        |
litellm-1        | During handling of the above exception, another exception occurred:
litellm-1        |
litellm-1        | Traceback (most recent call last):
litellm-1        |   File "/usr/lib/python3.13/site-packages/litellm/main.py", line 447, in acompletion
litellm-1        |     response = await init_response
litellm-1        |                ^^^^^^^^^^^^^^^^^^^
litellm-1        |   File "/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py", line 1057, in async_completion
litellm-1        |     request_body = await async_transform_request_body(**data)  # type: ignore
litellm-1        |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
litellm-1        |   File "/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/gemini/transformation.py", line 408, in async_transform_request_body
litellm-1        |     await context_caching_endpoints.async_check_and_create_cache(
litellm-1        |     ...<9 lines>...
litellm-1        |     )
litellm-1        |   File "/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/context_caching/vertex_ai_context_caching.py", line 402, in async_check_and_create_cache
litellm-1        |     raise VertexAIError(status_code=error_code, message=err.response.text)
litellm-1        | litellm.llms.vertex_ai.common_utils.VertexAIError: {
litellm-1        |   "error": {
litellm-1        |     "code": 404,
litellm-1        |     "message": "models/gemini-2.0-flash-001 is not found for API version v1beta, or is not supported for createCachedContent. Call ListModels to see the list of available models and their supported methods.",
litellm-1        |     "status": "NOT_FOUND"
litellm-1        |   }
litellm-1        | }

Are you an ML Ops Team?

No

What LiteLLM version are you on?

v1.57.8

Twitter / LinkedIn details

No response

trashhalo added the bug label on Feb 6, 2025
trashhalo commented Feb 6, 2025

I believe the issue is that 2.0 Flash GA does not support context caching, inferred from the "coming soon" note on cached-input pricing. Is it possible to turn this off and use the model without context caching?
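For reference, here is a rough sketch of what I think triggers the caching path. This is only my reading of the litellm code in the traceback above, so treat the details as assumptions: the createCachedContent call only seems to happen when a message carries a cache_control breakpoint, so a request without those markers should skip the caching code entirely. The prompt text below is just a placeholder, and GEMINI_API_KEY is assumed to be set.

# Hedged sketch: assumes litellm only enters its Gemini context-caching path
# when a message content block carries a `cache_control` breakpoint.
import litellm

# A message marked for caching -- this is what appears to send the request
# down the createCachedContent path that 404s on gemini-2.0-flash-001:
cached_messages = [
    {
        "role": "system",
        "content": [
            {
                "type": "text",
                "text": "Very long system prompt...",  # placeholder
                "cache_control": {"type": "ephemeral"},
            }
        ],
    },
    {"role": "user", "content": "Hello"},
]

# The same request without any cache_control markers should avoid the
# cachedContents call altogether:
plain_messages = [{"role": "user", "content": "Hello"}]

# Assumes GEMINI_API_KEY is set in the environment.
response = litellm.completion(
    model="gemini/gemini-2.0-flash-001",
    messages=plain_messages,
)
print(response.choices[0].message.content)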

trashhalo changed the title to "[Bug]: gemini 2.0 Flash GA is erroring due to context caching not being supported" on Feb 6, 2025
trashhalo commented

The workaround is to access Gemini via vertex_ai/gemini-2.0-flash-001, since that route doesn't seem to trip over the same context caching code.
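For anyone hitting the same thing, here is a rough sketch of that workaround using the litellm SDK directly. The project id and region are placeholders (not values from this issue), and standard Vertex AI credentials are assumed to be configured separately:

# Minimal sketch of the vertex_ai/ workaround; project/location values are
# placeholders, and Vertex AI credentials (e.g. application default
# credentials) are assumed to be set up already.
import litellm

response = litellm.completion(
    model="vertex_ai/gemini-2.0-flash-001",  # Vertex AI route instead of gemini/
    messages=[{"role": "user", "content": "Hello"}],
    vertex_project="my-gcp-project",  # placeholder GCP project id
    vertex_location="us-central1",    # placeholder region
)
print(response.choices[0].message.content)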
