
[Bug]: gemini 2.0 Flash GA is erroring due to context caching not being supported #8296

Open
trashhalo opened this issue Feb 6, 2025 · 2 comments
Labels: bug

Comments

trashhalo commented Feb 6, 2025

What happened?

Trying to use the new Gemini 2.0 Flash that launched today, and it's erroring, saying the model doesn't exist or doesn't support createCachedContent.

model_list:
  - model_name: "anthropic/*"
    litellm_params:
      model: "anthropic/*"
      api_key: "os.environ/ANTHROPIC_API_KEY"

  - model_name: "gemini/*"
    litellm_params:
      model: "gemini/*"
      api_key: "os.environ/GEMINI_API_KEY"

general_settings:
  telemetry: False

server:
  environment: development
  release_track: "stable"
  port: 4002
  host: 0.0.0.0

Relevant log output

litellm-1        | Received Model Group=gemini/gemini-2.0-flash-001
litellm-1        | Available Model Group Fallbacks=None
litellm-1        | Traceback (most recent call last):
litellm-1        |   File "/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/context_caching/vertex_ai_context_caching.py", line 396, in async_check_and_create_cache
litellm-1        |     response = await client.post(
litellm-1        |                ^^^^^^^^^^^^^^^^^^
litellm-1        |         url=url, headers=headers, json=cached_content_request_body  # type: ignore
litellm-1        |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
litellm-1        |     )
litellm-1        |     ^
litellm-1        |   File "/usr/lib/python3.13/site-packages/litellm/llms/custom_httpx/http_handler.py", line 219, in post
litellm-1        |     raise e
litellm-1        |   File "/usr/lib/python3.13/site-packages/litellm/llms/custom_httpx/http_handler.py", line 177, in post
litellm-1        |     response.raise_for_status()
litellm-1        |     ~~~~~~~~~~~~~~~~~~~~~~~~~^^
litellm-1        |   File "/usr/lib/python3.13/site-packages/httpx/_models.py", line 761, in raise_for_status
litellm-1        |     raise HTTPStatusError(message, request=request, response=self)
litellm-1        | httpx.HTTPStatusError: Client error '404 Not Found' for url 'https://generativelanguage.googleapis.com/v1beta/cachedContents?key='
litellm-1        | For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404
litellm-1        |
litellm-1        | During handling of the above exception, another exception occurred:
litellm-1        |
litellm-1        | Traceback (most recent call last):
litellm-1        |   File "/usr/lib/python3.13/site-packages/litellm/main.py", line 447, in acompletion
litellm-1        |     response = await init_response
litellm-1        |                ^^^^^^^^^^^^^^^^^^^
litellm-1        |   File "/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py", line 1057, in async_completion
litellm-1        |     request_body = await async_transform_request_body(**data)  # type: ignore
litellm-1        |                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
litellm-1        |   File "/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/gemini/transformation.py", line 408, in async_transform_request_body
litellm-1        |     await context_caching_endpoints.async_check_and_create_cache(
litellm-1        |     ...<9 lines>...
litellm-1        |     )
litellm-1        |   File "/usr/lib/python3.13/site-packages/litellm/llms/vertex_ai/context_caching/vertex_ai_context_caching.py", line 402, in async_check_and_create_cache
litellm-1        |     raise VertexAIError(status_code=error_code, message=err.response.text)
litellm-1        | litellm.llms.vertex_ai.common_utils.VertexAIError: {
litellm-1        |   "error": {
litellm-1        |     "code": 404,
litellm-1        |     "message": "models/gemini-2.0-flash-001 is not found for API version v1beta, or is not supported for createCachedContent. Call ListModels to see the list of available models and their supported methods.",
litellm-1        |     "status": "NOT_FOUND"
litellm-1        |   }
litellm-1        | }

Are you an ML Ops Team?

No

What LiteLLM version are you on?

v1.57.8

Twitter / LinkedIn details

No response

trashhalo added the bug label on Feb 6, 2025
trashhalo commented Feb 6, 2025

I believe the issue is that 2.0 Flash GA does not support context caching, inferred from the "coming soon" note on cached-input pricing. Is it possible to turn this off and use the model without context caching?
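For reference, here is a rough sketch of what I think triggers the caching path. This is only my reading of the litellm code in the traceback above, so treat the details as assumptions: the createCachedContent call only seems to happen when a message carries a cache_control breakpoint, so a request without those markers should skip the caching code entirely. The prompt text below is just a placeholder, and GEMINI_API_KEY is assumed to be set.

# Hedged sketch: assumes litellm only enters its Gemini context-caching path
# when a message content block carries a `cache_control` breakpoint.
import litellm

# A message marked for caching -- this is what appears to send the request
# down the createCachedContent path that 404s on gemini-2.0-flash-001:
cached_messages = [
    {
        "role": "system",
        "content": [
            {
                "type": "text",
                "text": "Very long system prompt...",  # placeholder
                "cache_control": {"type": "ephemeral"},
            }
        ],
    },
    {"role": "user", "content": "Hello"},
]

# The same request without any cache_control markers should avoid the
# cachedContents call altogether:
plain_messages = [{"role": "user", "content": "Hello"}]

# Assumes GEMINI_API_KEY is set in the environment.
response = litellm.completion(
    model="gemini/gemini-2.0-flash-001",
    messages=plain_messages,
)
print(response.choices[0].message.content)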

trashhalo changed the title to "[Bug]: gemini 2.0 Flash GA is erroring due to context caching not being supported" on Feb 6, 2025
trashhalo commented

The workaround is to access Gemini via vertex_ai/gemini-2.0-flash-001, since that route doesn't seem to trip over the same context caching code.
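For anyone hitting the same thing, here is a rough sketch of that workaround using the litellm SDK directly. The project id and region are placeholders (not values from this issue), and standard Vertex AI credentials are assumed to be configured separately:

# Minimal sketch of the vertex_ai/ workaround; project/location values are
# placeholders, and Vertex AI credentials (e.g. application default
# credentials) are assumed to be set up already.
import litellm

response = litellm.completion(
    model="vertex_ai/gemini-2.0-flash-001",  # Vertex AI route instead of gemini/
    messages=[{"role": "user", "content": "Hello"}],
    vertex_project="my-gcp-project",  # placeholder GCP project id
    vertex_location="us-central1",    # placeholder region
)
print(response.choices[0].message.content)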
