[Bug]: Loop when background_health_checks is set to true #8248

Open

luismarquezgft opened this issue Feb 4, 2025 · 1 comment · May be fixed by #8454

What happened?

When I set background_health_checks to true in this configuration file and run the Docker container, the proxy enters a loop: it runs the per-model health checks back-to-back (roughly one full round per second) instead of waiting health_check_interval (300 s) between rounds. See the relevant log output below.

docker run \
  -v $(pwd)/config.yaml:/app/config.yaml \
  --env-file .env \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml --detailed_debug
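
The container is started with --env-file .env, and the os.environ/-prefixed values in config.yaml are resolved from that environment. For completeness, a minimal sketch of the .env file, using the variable names from the config below; the values here are placeholders only, not the real ones from this report:

AZURE_OPENAI_MODEL=gpt-4o
AZURE_OPENAI_DEPLOYMENT=azure/gpt-4o
AZURE_OPENAI_API_BASE=https://<your-resource>.openai.azure.com/
AZURE_OPENAI_API_KEY=<your-azure-api-key>
AZURE_OPENAI_API_VERSION=2024-10-21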

config.yaml

model_list:
  - model_name: os.environ/AZURE_OPENAI_MODEL
    litellm_params:
      model: os.environ/AZURE_OPENAI_DEPLOYMENT
      api_base: os.environ/AZURE_OPENAI_API_BASE
      api_key: "os.environ/AZURE_OPENAI_API_KEY"
      api_version: "os.environ/AZURE_OPENAI_API_VERSION" # [OPTIONAL] litellm uses the latest azure api_version by default
      rpm: 50
    model_info:
      mode: completion # This setting is used to determine how to check the health of the model. See https://docs.litellm.ai/docs/proxy/health
      input_cost_per_token: 0.000002399240
      output_cost_per_token: 0.000009597000
      max_tokens: 16384
  - model_name: gpt-4o-mini
    litellm_params:
      model: azure/gpt-4o-mini
      api_base: os.environ/AZURE_OPENAI_API_BASE
      api_key: "os.environ/AZURE_OPENAI_API_KEY"
      api_version: "os.environ/AZURE_OPENAI_API_VERSION" # [OPTIONAL] litellm uses the latest azure api_version by default
      rpm: 50
    model_info:
      mode: completion
      input_cost_per_token: 0.000000143960
      output_cost_per_token: 0.000000575900
      max_tokens: 16384      

litellm_settings:
  ssl_verify: true    # [OPTIONAL] Set to false to disable SSL verification (not recommended)
  request_timeout: 30 # (int) LLM request timeout in seconds. Raises a Timeout error if a call takes longer. Sets litellm.request_timeout

general_settings:

  # Parallelism
  max_parallel_requests: 5  # the max parallel requests allowed per deployment 
  global_max_parallel_requests: 100  # the max parallel requests allowed on the proxy all up 

  # Health checks and monitoring. https://docs.litellm.ai/docs/proxy/health
  background_health_checks: true # Uses model_info.mode to determine how to check the health of each model
  health_check_interval: 300 # interval (in seconds) between background health checks
  health_check_details: false # If false, hides health check details (e.g. remaining rate limit)
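
To make the expected behavior explicit: a minimal sketch (my own illustration under these settings, not LiteLLM's actual implementation) of an interval-driven background health check loop, assuming a hypothetical check_model_health helper:

import asyncio

HEALTH_CHECK_INTERVAL = 300  # seconds; mirrors general_settings.health_check_interval

async def check_model_health(model: dict) -> None:
    # Hypothetical stand-in for the real per-model check, which (per the
    # logs below) issues a test request such as
    # litellm.atext_completion(prompt='test from litellm').
    print(f"health check: {model['model_name']}")

async def run_background_health_checks(model_list: list[dict]) -> None:
    # One round of checks per interval: check every model, then sleep
    # for health_check_interval seconds before starting the next round.
    while True:
        for model in model_list:
            await check_model_health(model)
        await asyncio.sleep(HEALTH_CHECK_INTERVAL)

The log output below instead shows complete rounds firing at 15:43:35, 15:43:36 and 15:43:37, back-to-back rather than 300 seconds apart, as if the interval sleep were skipped or each round immediately rescheduled itself.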

Relevant log output

15:43:35 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-mini-2024-07-18', 'max_tokens': 16384, 'max_input_tokens': 128000, 'max_output_tokens': 16384, 'input_cost_per_token': 1.65e-07, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': 7.5e-08, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 6.6e-07, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': True, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:35 - LiteLLM:DEBUG: utils.py:298 - Async success callbacks: Got a complete streaming response
15:43:35 - LiteLLM:DEBUG: cost_calculator.py:563 - completion_response _select_model_name_for_cost_calc: azure/gpt-4o-mini-2024-07-18
15:43:35 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-mini-2024-07-18', 'combined_model_name': 'azure/gpt-4o-mini-2024-07-18', 'stripped_model_name': 'gpt-4o-mini-2024-07-18', 'combined_stripped_model_name': 'azure/gpt-4o-mini-2024-07-18', 'custom_llm_provider': 'azure'}
15:43:35 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-mini-2024-07-18', 'max_tokens': 16384, 'max_input_tokens': 128000, 'max_output_tokens': 16384, 'input_cost_per_token': 1.65e-07, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': 7.5e-08, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 6.6e-07, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': True, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:35 - LiteLLM:DEBUG: litellm_logging.py:846 - response_cost: 2.8380000000000003e-05
15:43:35 - LiteLLM:DEBUG: litellm_logging.py:1566 - Model=gpt-4o-mini; cost=2.8380000000000003e-05
15:43:35 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-mini-2024-07-18', 'combined_model_name': 'azure/gpt-4o-mini-2024-07-18', 'stripped_model_name': 'gpt-4o-mini-2024-07-18', 'combined_stripped_model_name': 'azure/gpt-4o-mini-2024-07-18', 'custom_llm_provider': 'azure'}
15:43:35 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-mini-2024-07-18', 'max_tokens': 16384, 'max_input_tokens': 128000, 'max_output_tokens': 16384, 'input_cost_per_token': 1.65e-07, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': 7.5e-08, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 6.6e-07, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': True, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:35 - LiteLLM Proxy:DEBUG: model_max_budget_limiter.py:151 - in RouterBudgetLimiting.async_log_success_event
15:43:35 - LiteLLM Proxy:DEBUG: model_max_budget_limiter.py:167 - Not running _PROXY_VirtualKeyModelMaxBudgetLimiter.async_log_success_event because user_api_key_model_max_budget is None or empty. `user_api_key_model_max_budget`=None
15:43:35 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - INSIDE parallel request limiter ASYNC SUCCESS LOGGING
15:43:35 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - 'user_api_key'
15:43:36 - LiteLLM:DEBUG: utils.py:298 - RAW RESPONSE:
{"id": "chatcmpl-AxFNHjFAjyjVppwfVIkf9iIhhSh4R", "choices": [{"finish_reason": "stop", "index": 0, "logprobs": null, "message": {"content": "It looks like you are referring to LiteLLM, which is a lightweight and efficient implementation for LLM-related tasks. However, without specific context, it's a bit unclear what exactly you're testing for.\n\nAre you looking to:\n1. Test the performance of LiteLLM on specific tasks?\n2. Check its compatibility with your project?\n3. Benchmark it against other LLM implementations?\n\nPlease provide more details so I can assist you better!", "refusal": null, "role": "assistant", "audio": null, "function_call": null, "tool_calls": null}, "content_filter_results": {"hate": {"filtered": false, "severity": "safe"}, "self_harm": {"filtered": false, "severity": "safe"}, "sexual": {"filtered": false, "severity": "safe"}, "violence": {"filtered": false, "severity": "safe"}}}], "created": 1738683815, "model": "gpt-4o-2024-05-13", "object": "chat.completion", "service_tier": null, "system_fingerprint": "fp_65792305e4", "usage": {"completion_tokens": 88, "prompt_tokens": 12, "total_tokens": 100, "completion_tokens_details": null, "prompt_tokens_details": null}, "prompt_filter_results": [{"prompt_index": 0, "content_filter_results": {"hate": {"filtered": false, "severity": "safe"}, "self_harm": {"filtered": false, "severity": "safe"}, "sexual": {"filtered": false, "severity": "safe"}, "violence": {"filtered": false, "severity": "safe"}}}]}


15:43:36 - LiteLLM:DEBUG: litellm_logging.py:2194 - Filtered callbacks: []
15:43:36 - LiteLLM:DEBUG: cost_calculator.py:563 - completion_response _select_model_name_for_cost_calc: azure/gpt-4o-2024-05-13
15:43:36 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-2024-05-13', 'combined_model_name': 'azure/gpt-4o-2024-05-13', 'stripped_model_name': 'gpt-4o-2024-05-13', 'combined_stripped_model_name': 'azure/gpt-4o-2024-05-13', 'custom_llm_provider': 'azure'}
15:43:36 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-2024-05-13', 'max_tokens': 4096, 'max_input_tokens': 128000, 'max_output_tokens': 4096, 'input_cost_per_token': 5e-06, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': None, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 1.5e-05, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': None, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:36 - LiteLLM:DEBUG: litellm_logging.py:846 - response_cost: 0.00138
15:43:36 - LiteLLM:DEBUG: utils.py:298 - Async Wrapper: Completed Call, calling async_success_handler: <bound method Logging.async_success_handler of <litellm.litellm_core_utils.litellm_logging.Logging object at 0x7f0501360b00>>
15:43:36 - LiteLLM:DEBUG: litellm_logging.py:2194 - Filtered callbacks: []
15:43:36 - LiteLLM:DEBUG: utils.py:298 - Logging Details LiteLLM-Async Success Call, cache_hit=None
15:43:36 - LiteLLM:DEBUG: cost_calculator.py:563 - completion_response _select_model_name_for_cost_calc: azure/gpt-4o-2024-05-13
15:43:36 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-2024-05-13', 'combined_model_name': 'azure/gpt-4o-2024-05-13', 'stripped_model_name': 'gpt-4o-2024-05-13', 'combined_stripped_model_name': 'azure/gpt-4o-2024-05-13', 'custom_llm_provider': 'azure'}
15:43:36 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-2024-05-13', 'max_tokens': 4096, 'max_input_tokens': 128000, 'max_output_tokens': 4096, 'input_cost_per_token': 5e-06, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': None, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 1.5e-05, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': None, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:36 - LiteLLM:DEBUG: litellm_logging.py:846 - response_cost: 0.00138
15:43:36 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-2024-05-13', 'combined_model_name': 'azure/gpt-4o-2024-05-13', 'stripped_model_name': 'gpt-4o-2024-05-13', 'combined_stripped_model_name': 'azure/gpt-4o-2024-05-13', 'custom_llm_provider': 'azure'}
15:43:36 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-2024-05-13', 'max_tokens': 4096, 'max_input_tokens': 128000, 'max_output_tokens': 4096, 'input_cost_per_token': 5e-06, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': None, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 1.5e-05, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': None, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:36 - LiteLLM:DEBUG: utils.py:298 - Async success callbacks: Got a complete streaming response
15:43:36 - LiteLLM:DEBUG: cost_calculator.py:563 - completion_response _select_model_name_for_cost_calc: azure/gpt-4o-2024-05-13
15:43:36 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-2024-05-13', 'combined_model_name': 'azure/gpt-4o-2024-05-13', 'stripped_model_name': 'gpt-4o-2024-05-13', 'combined_stripped_model_name': 'azure/gpt-4o-2024-05-13', 'custom_llm_provider': 'azure'}
15:43:36 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-2024-05-13', 'max_tokens': 4096, 'max_input_tokens': 128000, 'max_output_tokens': 4096, 'input_cost_per_token': 5e-06, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': None, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 1.5e-05, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': None, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:36 - LiteLLM:DEBUG: litellm_logging.py:846 - response_cost: 0.00138
15:43:36 - LiteLLM:DEBUG: litellm_logging.py:1566 - Model=gpt-4o; cost=0.00138
15:43:36 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-2024-05-13', 'combined_model_name': 'azure/gpt-4o-2024-05-13', 'stripped_model_name': 'gpt-4o-2024-05-13', 'combined_stripped_model_name': 'azure/gpt-4o-2024-05-13', 'custom_llm_provider': 'azure'}
15:43:36 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-2024-05-13', 'max_tokens': 4096, 'max_input_tokens': 128000, 'max_output_tokens': 4096, 'input_cost_per_token': 5e-06, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': None, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 1.5e-05, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': None, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:36 - LiteLLM Proxy:DEBUG: model_max_budget_limiter.py:151 - in RouterBudgetLimiting.async_log_success_event
15:43:36 - LiteLLM Proxy:DEBUG: model_max_budget_limiter.py:167 - Not running _PROXY_VirtualKeyModelMaxBudgetLimiter.async_log_success_event because user_api_key_model_max_budget is None or empty. `user_api_key_model_max_budget`=None
15:43:36 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - INSIDE parallel request limiter ASYNC SUCCESS LOGGING
15:43:36 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - 'user_api_key'
15:43:36 - LiteLLM:DEBUG: utils.py:298 - 

15:43:36 - LiteLLM:DEBUG: utils.py:298 - Request to litellm:
15:43:36 - LiteLLM:DEBUG: utils.py:298 - litellm.atext_completion(rpm=50, api_key='c36fcccfb77e4b60bca131212fec5b58', api_base='https://mlopstools-oai-eus1-dev-001.openai.azure.com/', api_version='2024-10-21', use_in_pass_through=False, model='azure/gpt-4o', cache={'no-cache': True}, prompt='test from litellm')
15:43:36 - LiteLLM:DEBUG: utils.py:298 - 

15:43:36 - LiteLLM:DEBUG: utils.py:298 - Initialized litellm callbacks, Async Success Callbacks: [<bound method Router.deployment_callback_on_success of <litellm.router.Router object at 0x7f0503cd1400>>, <litellm.proxy.hooks.model_max_budget_limiter._PROXY_VirtualKeyModelMaxBudgetLimiter object at 0x7f0504a1d7f0>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x7f0504a1de80>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x7f0504a1dfd0>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x7f0504a1e120>, <litellm._service_logger.ServiceLogging object at 0x7f050490c550>]
15:43:36 - LiteLLM:DEBUG: litellm_logging.py:377 - self.optional_params: {}
15:43:36 - LiteLLM:DEBUG: utils.py:298 - ASYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache'): {'no-cache': True}
15:43:36 - LiteLLM:DEBUG: caching_handler.py:212 - CACHE RESULT: None
15:43:36 - LiteLLM:DEBUG: utils.py:298 - 

15:43:36 - LiteLLM:DEBUG: utils.py:298 - Request to litellm:
15:43:36 - LiteLLM:DEBUG: utils.py:298 - litellm.atext_completion(rpm=50, api_key='c36fcccfb77e4b60bca131212fec5b58', api_base='https://mlopstools-oai-eus1-dev-001.openai.azure.com/', api_version='2024-10-21', use_in_pass_through=False, model='azure/gpt-4o-mini', cache={'no-cache': True}, prompt='test from litellm')
15:43:36 - LiteLLM:DEBUG: utils.py:298 - 

15:43:36 - LiteLLM:INFO: utils.py:2894 - 
LiteLLM completion() model= gpt-4o; provider = azure
15:43:36 - LiteLLM:DEBUG: utils.py:298 - Initialized litellm callbacks, Async Success Callbacks: [<bound method Router.deployment_callback_on_success of <litellm.router.Router object at 0x7f0503cd1400>>, <litellm.proxy.hooks.model_max_budget_limiter._PROXY_VirtualKeyModelMaxBudgetLimiter object at 0x7f0504a1d7f0>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x7f0504a1de80>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x7f0504a1dfd0>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x7f0504a1e120>, <litellm._service_logger.ServiceLogging object at 0x7f050490c550>]
15:43:36 - LiteLLM:DEBUG: utils.py:2897 - 
LiteLLM: Params passed to completion() {'model': 'gpt-4o', 'functions': None, 'function_call': None, 'temperature': None, 'top_p': None, 'n': None, 'stream': None, 'stream_options': None, 'stop': None, 'max_tokens': None, 'max_completion_tokens': None, 'modalities': None, 'prediction': None, 'audio': None, 'presence_penalty': None, 'frequency_penalty': None, 'logit_bias': None, 'user': None, 'custom_llm_provider': 'azure', 'response_format': None, 'seed': None, 'tools': None, 'tool_choice': None, 'max_retries': None, 'logprobs': None, 'top_logprobs': None, 'extra_headers': None, 'api_version': '2024-10-21', 'parallel_tool_calls': None, 'drop_params': None, 'additional_drop_params': None, 'messages': [{'role': 'user', 'content': 'test from litellm'}]}
15:43:36 - LiteLLM:DEBUG: litellm_logging.py:377 - self.optional_params: {}
15:43:36 - LiteLLM:DEBUG: utils.py:2900 - 
LiteLLM: Non-Default params passed to completion() {}
15:43:36 - LiteLLM:DEBUG: utils.py:298 - ASYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache'): {'no-cache': True}
15:43:36 - LiteLLM:DEBUG: utils.py:3500 - Azure optional params - api_version: api_version=2024-10-21, litellm.api_version=None, os.environ['AZURE_API_VERSION']=2024-07-01-preview
15:43:36 - LiteLLM:DEBUG: caching_handler.py:212 - CACHE RESULT: None
15:43:36 - LiteLLM:DEBUG: utils.py:298 - Final returned optional params: {'extra_body': {}}
15:43:36 - LiteLLM:INFO: utils.py:2894 - 
LiteLLM completion() model= gpt-4o-mini; provider = azure
15:43:36 - LiteLLM:DEBUG: litellm_logging.py:377 - self.optional_params: {'extra_body': {}}
15:43:36 - LiteLLM:DEBUG: utils.py:2897 - 
LiteLLM: Params passed to completion() {'model': 'gpt-4o-mini', 'functions': None, 'function_call': None, 'temperature': None, 'top_p': None, 'n': None, 'stream': None, 'stream_options': None, 'stop': None, 'max_tokens': None, 'max_completion_tokens': None, 'modalities': None, 'prediction': None, 'audio': None, 'presence_penalty': None, 'frequency_penalty': None, 'logit_bias': None, 'user': None, 'custom_llm_provider': 'azure', 'response_format': None, 'seed': None, 'tools': None, 'tool_choice': None, 'max_retries': None, 'logprobs': None, 'top_logprobs': None, 'extra_headers': None, 'api_version': '2024-10-21', 'parallel_tool_calls': None, 'drop_params': None, 'additional_drop_params': None, 'messages': [{'role': 'user', 'content': 'test from litellm'}]}
15:43:36 - LiteLLM:DEBUG: utils.py:2900 - 
LiteLLM: Non-Default params passed to completion() {}
15:43:36 - LiteLLM:DEBUG: utils.py:3500 - Azure optional params - api_version: api_version=2024-10-21, litellm.api_version=None, os.environ['AZURE_API_VERSION']=2024-07-01-preview
15:43:36 - LiteLLM:DEBUG: utils.py:298 - Final returned optional params: {'extra_body': {}}
15:43:36 - LiteLLM:DEBUG: litellm_logging.py:377 - self.optional_params: {'extra_body': {}}
15:43:36 - LiteLLM:DEBUG: litellm_logging.py:634 - 

POST Request Sent from LiteLLM:
curl -X POST \
https://mlopstools-oai-eus1-dev-001.openai.azure.com//openai/deployments/gpt-4o/ \
-H 'api_key: *****' -H 'azure_ad_token: *****' \
-d '{'model': 'gpt-4o', 'messages': [{'role': 'user', 'content': 'test from litellm'}], 'extra_body': {}}'


15:43:36 - LiteLLM:DEBUG: litellm_logging.py:634 - 

POST Request Sent from LiteLLM:
curl -X POST \
https://mlopstools-oai-eus1-dev-001.openai.azure.com//openai/deployments/gpt-4o-mini/ \
-H 'api_key: *****' -H 'azure_ad_token: *****' \
-d '{'model': 'gpt-4o-mini', 'messages': [{'role': 'user', 'content': 'test from litellm'}], 'extra_body': {}}'


15:43:37 - LiteLLM:DEBUG: utils.py:298 - RAW RESPONSE:
{"id": "chatcmpl-AxFNIe7ozB0AiTI8dXYoFVvwJAzwE", "choices": [{"finish_reason": "stop", "index": 0, "logprobs": null, "message": {"content": "It looks like you might be testing a system or tool related to \"litellm.\" How can I assist you with that? If you have specific questions or need information, feel free to ask!", "refusal": null, "role": "assistant", "audio": null, "function_call": null, "tool_calls": null}, "content_filter_results": {"hate": {"filtered": false, "severity": "safe"}, "self_harm": {"filtered": false, "severity": "safe"}, "sexual": {"filtered": false, "severity": "safe"}, "violence": {"filtered": false, "severity": "safe"}}}], "created": 1738683816, "model": "gpt-4o-mini-2024-07-18", "object": "chat.completion", "service_tier": null, "system_fingerprint": "fp_f3927aa00d", "usage": {"completion_tokens": 40, "prompt_tokens": 12, "total_tokens": 52, "completion_tokens_details": {"accepted_prediction_tokens": 0, "audio_tokens": 0, "reasoning_tokens": 0, "rejected_prediction_tokens": 0}, "prompt_tokens_details": {"audio_tokens": 0, "cached_tokens": 0}}, "prompt_filter_results": [{"prompt_index": 0, "content_filter_results": {"hate": {"filtered": false, "severity": "safe"}, "jailbreak": {"filtered": false, "detected": false}, "self_harm": {"filtered": false, "severity": "safe"}, "sexual": {"filtered": false, "severity": "safe"}, "violence": {"filtered": false, "severity": "safe"}}}]}


15:43:37 - LiteLLM:DEBUG: litellm_logging.py:2194 - Filtered callbacks: []
15:43:37 - LiteLLM:DEBUG: cost_calculator.py:563 - completion_response _select_model_name_for_cost_calc: azure/gpt-4o-mini-2024-07-18
15:43:37 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-mini-2024-07-18', 'combined_model_name': 'azure/gpt-4o-mini-2024-07-18', 'stripped_model_name': 'gpt-4o-mini-2024-07-18', 'combined_stripped_model_name': 'azure/gpt-4o-mini-2024-07-18', 'custom_llm_provider': 'azure'}
15:43:37 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-mini-2024-07-18', 'max_tokens': 16384, 'max_input_tokens': 128000, 'max_output_tokens': 16384, 'input_cost_per_token': 1.65e-07, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': 7.5e-08, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 6.6e-07, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': True, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:846 - response_cost: 2.8380000000000003e-05
15:43:37 - LiteLLM:DEBUG: utils.py:298 - Async Wrapper: Completed Call, calling async_success_handler: <bound method Logging.async_success_handler of <litellm.litellm_core_utils.litellm_logging.Logging object at 0x7f0501363e10>>
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:2194 - Filtered callbacks: []
15:43:37 - LiteLLM:DEBUG: utils.py:298 - Logging Details LiteLLM-Async Success Call, cache_hit=None
15:43:37 - LiteLLM:DEBUG: cost_calculator.py:563 - completion_response _select_model_name_for_cost_calc: azure/gpt-4o-mini-2024-07-18
15:43:37 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-mini-2024-07-18', 'combined_model_name': 'azure/gpt-4o-mini-2024-07-18', 'stripped_model_name': 'gpt-4o-mini-2024-07-18', 'combined_stripped_model_name': 'azure/gpt-4o-mini-2024-07-18', 'custom_llm_provider': 'azure'}
15:43:37 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-mini-2024-07-18', 'max_tokens': 16384, 'max_input_tokens': 128000, 'max_output_tokens': 16384, 'input_cost_per_token': 1.65e-07, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': 7.5e-08, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 6.6e-07, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': True, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:846 - response_cost: 2.8380000000000003e-05
15:43:37 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-mini-2024-07-18', 'combined_model_name': 'azure/gpt-4o-mini-2024-07-18', 'stripped_model_name': 'gpt-4o-mini-2024-07-18', 'combined_stripped_model_name': 'azure/gpt-4o-mini-2024-07-18', 'custom_llm_provider': 'azure'}
15:43:37 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-mini-2024-07-18', 'max_tokens': 16384, 'max_input_tokens': 128000, 'max_output_tokens': 16384, 'input_cost_per_token': 1.65e-07, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': 7.5e-08, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 6.6e-07, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': True, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:37 - LiteLLM:DEBUG: utils.py:298 - Async success callbacks: Got a complete streaming response
15:43:37 - LiteLLM:DEBUG: cost_calculator.py:563 - completion_response _select_model_name_for_cost_calc: azure/gpt-4o-mini-2024-07-18
15:43:37 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-mini-2024-07-18', 'combined_model_name': 'azure/gpt-4o-mini-2024-07-18', 'stripped_model_name': 'gpt-4o-mini-2024-07-18', 'combined_stripped_model_name': 'azure/gpt-4o-mini-2024-07-18', 'custom_llm_provider': 'azure'}
15:43:37 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-mini-2024-07-18', 'max_tokens': 16384, 'max_input_tokens': 128000, 'max_output_tokens': 16384, 'input_cost_per_token': 1.65e-07, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': 7.5e-08, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 6.6e-07, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': True, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:846 - response_cost: 2.8380000000000003e-05
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:1566 - Model=gpt-4o-mini; cost=2.8380000000000003e-05
15:43:37 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-mini-2024-07-18', 'combined_model_name': 'azure/gpt-4o-mini-2024-07-18', 'stripped_model_name': 'gpt-4o-mini-2024-07-18', 'combined_stripped_model_name': 'azure/gpt-4o-mini-2024-07-18', 'custom_llm_provider': 'azure'}
15:43:37 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-mini-2024-07-18', 'max_tokens': 16384, 'max_input_tokens': 128000, 'max_output_tokens': 16384, 'input_cost_per_token': 1.65e-07, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': 7.5e-08, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 6.6e-07, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': True, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:37 - LiteLLM Proxy:DEBUG: model_max_budget_limiter.py:151 - in RouterBudgetLimiting.async_log_success_event
15:43:37 - LiteLLM Proxy:DEBUG: model_max_budget_limiter.py:167 - Not running _PROXY_VirtualKeyModelMaxBudgetLimiter.async_log_success_event because user_api_key_model_max_budget is None or empty. `user_api_key_model_max_budget`=None
15:43:37 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - INSIDE parallel request limiter ASYNC SUCCESS LOGGING
15:43:37 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - 'user_api_key'
15:43:37 - LiteLLM:DEBUG: utils.py:298 - RAW RESPONSE:
{"id": "chatcmpl-AxFNIvr5aOtwHwyD3HvHFBHdxfyXe", "choices": [{"finish_reason": "stop", "index": 0, "logprobs": null, "message": {"content": "It seems like you might want to test a functionality or have a question about \"litellm\". Could you please provide more context or clarify your request? I'd be happy to help with whatever information or assistance you need.", "refusal": null, "role": "assistant", "audio": null, "function_call": null, "tool_calls": null}, "content_filter_results": {"hate": {"filtered": false, "severity": "safe"}, "self_harm": {"filtered": false, "severity": "safe"}, "sexual": {"filtered": false, "severity": "safe"}, "violence": {"filtered": false, "severity": "safe"}}}], "created": 1738683816, "model": "gpt-4o-2024-05-13", "object": "chat.completion", "service_tier": null, "system_fingerprint": "fp_f3927aa00d", "usage": {"completion_tokens": 44, "prompt_tokens": 12, "total_tokens": 56, "completion_tokens_details": {"accepted_prediction_tokens": 0, "audio_tokens": 0, "reasoning_tokens": 0, "rejected_prediction_tokens": 0}, "prompt_tokens_details": {"audio_tokens": 0, "cached_tokens": 0}}, "prompt_filter_results": [{"prompt_index": 0, "content_filter_results": {"hate": {"filtered": false, "severity": "safe"}, "self_harm": {"filtered": false, "severity": "safe"}, "sexual": {"filtered": false, "severity": "safe"}, "violence": {"filtered": false, "severity": "safe"}}}]}


15:43:37 - LiteLLM:DEBUG: litellm_logging.py:2194 - Filtered callbacks: []
15:43:37 - LiteLLM:DEBUG: cost_calculator.py:563 - completion_response _select_model_name_for_cost_calc: azure/gpt-4o-2024-05-13
15:43:37 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-2024-05-13', 'combined_model_name': 'azure/gpt-4o-2024-05-13', 'stripped_model_name': 'gpt-4o-2024-05-13', 'combined_stripped_model_name': 'azure/gpt-4o-2024-05-13', 'custom_llm_provider': 'azure'}
15:43:37 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-2024-05-13', 'max_tokens': 4096, 'max_input_tokens': 128000, 'max_output_tokens': 4096, 'input_cost_per_token': 5e-06, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': None, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 1.5e-05, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': None, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:846 - response_cost: 0.00072
15:43:37 - LiteLLM:DEBUG: utils.py:298 - Async Wrapper: Completed Call, calling async_success_handler: <bound method Logging.async_success_handler of <litellm.litellm_core_utils.litellm_logging.Logging object at 0x7f0503d23bb0>>
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:2194 - Filtered callbacks: []
15:43:37 - LiteLLM:DEBUG: utils.py:298 - Logging Details LiteLLM-Async Success Call, cache_hit=None
15:43:37 - LiteLLM:DEBUG: cost_calculator.py:563 - completion_response _select_model_name_for_cost_calc: azure/gpt-4o-2024-05-13
15:43:37 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-2024-05-13', 'combined_model_name': 'azure/gpt-4o-2024-05-13', 'stripped_model_name': 'gpt-4o-2024-05-13', 'combined_stripped_model_name': 'azure/gpt-4o-2024-05-13', 'custom_llm_provider': 'azure'}
15:43:37 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-2024-05-13', 'max_tokens': 4096, 'max_input_tokens': 128000, 'max_output_tokens': 4096, 'input_cost_per_token': 5e-06, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': None, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 1.5e-05, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': None, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:846 - response_cost: 0.00072
15:43:37 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-2024-05-13', 'combined_model_name': 'azure/gpt-4o-2024-05-13', 'stripped_model_name': 'gpt-4o-2024-05-13', 'combined_stripped_model_name': 'azure/gpt-4o-2024-05-13', 'custom_llm_provider': 'azure'}
15:43:37 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-2024-05-13', 'max_tokens': 4096, 'max_input_tokens': 128000, 'max_output_tokens': 4096, 'input_cost_per_token': 5e-06, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': None, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 1.5e-05, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': None, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:37 - LiteLLM:DEBUG: utils.py:298 - Async success callbacks: Got a complete streaming response
15:43:37 - LiteLLM:DEBUG: cost_calculator.py:563 - completion_response _select_model_name_for_cost_calc: azure/gpt-4o-2024-05-13
15:43:37 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-2024-05-13', 'combined_model_name': 'azure/gpt-4o-2024-05-13', 'stripped_model_name': 'gpt-4o-2024-05-13', 'combined_stripped_model_name': 'azure/gpt-4o-2024-05-13', 'custom_llm_provider': 'azure'}
15:43:37 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-2024-05-13', 'max_tokens': 4096, 'max_input_tokens': 128000, 'max_output_tokens': 4096, 'input_cost_per_token': 5e-06, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': None, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 1.5e-05, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': None, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:846 - response_cost: 0.00072
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:1566 - Model=gpt-4o; cost=0.00072
15:43:37 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-2024-05-13', 'combined_model_name': 'azure/gpt-4o-2024-05-13', 'stripped_model_name': 'gpt-4o-2024-05-13', 'combined_stripped_model_name': 'azure/gpt-4o-2024-05-13', 'custom_llm_provider': 'azure'}
15:43:37 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-2024-05-13', 'max_tokens': 4096, 'max_input_tokens': 128000, 'max_output_tokens': 4096, 'input_cost_per_token': 5e-06, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': None, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 1.5e-05, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': None, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:37 - LiteLLM Proxy:DEBUG: model_max_budget_limiter.py:151 - in RouterBudgetLimiting.async_log_success_event
15:43:37 - LiteLLM Proxy:DEBUG: model_max_budget_limiter.py:167 - Not running _PROXY_VirtualKeyModelMaxBudgetLimiter.async_log_success_event because user_api_key_model_max_budget is None or empty. `user_api_key_model_max_budget`=None
15:43:37 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - INSIDE parallel request limiter ASYNC SUCCESS LOGGING
15:43:37 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - 'user_api_key'
15:43:37 - LiteLLM:DEBUG: utils.py:298 - 

15:43:37 - LiteLLM:DEBUG: utils.py:298 - Request to litellm:
15:43:37 - LiteLLM:DEBUG: utils.py:298 - litellm.atext_completion(rpm=50, api_key='c36fcccfb77e4b60bca131212fec5b58', api_base='https://mlopstools-oai-eus1-dev-001.openai.azure.com/', api_version='2024-10-21', use_in_pass_through=False, model='azure/gpt-4o', cache={'no-cache': True}, prompt='test from litellm')
15:43:37 - LiteLLM:DEBUG: utils.py:298 - 

15:43:37 - LiteLLM:DEBUG: utils.py:298 - Initialized litellm callbacks, Async Success Callbacks: [<bound method Router.deployment_callback_on_success of <litellm.router.Router object at 0x7f0503cd1400>>, <litellm.proxy.hooks.model_max_budget_limiter._PROXY_VirtualKeyModelMaxBudgetLimiter object at 0x7f0504a1d7f0>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x7f0504a1de80>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x7f0504a1dfd0>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x7f0504a1e120>, <litellm._service_logger.ServiceLogging object at 0x7f050490c550>]
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:377 - self.optional_params: {}
15:43:37 - LiteLLM:DEBUG: utils.py:298 - ASYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache'): {'no-cache': True}
15:43:37 - LiteLLM:DEBUG: caching_handler.py:212 - CACHE RESULT: None
15:43:37 - LiteLLM:DEBUG: utils.py:298 - 

15:43:37 - LiteLLM:INFO: utils.py:2894 - 
LiteLLM completion() model= gpt-4o; provider = azure
15:43:37 - LiteLLM:DEBUG: utils.py:298 - Request to litellm:
15:43:37 - LiteLLM:DEBUG: utils.py:2897 - 
LiteLLM: Params passed to completion() {'model': 'gpt-4o', 'functions': None, 'function_call': None, 'temperature': None, 'top_p': None, 'n': None, 'stream': None, 'stream_options': None, 'stop': None, 'max_tokens': None, 'max_completion_tokens': None, 'modalities': None, 'prediction': None, 'audio': None, 'presence_penalty': None, 'frequency_penalty': None, 'logit_bias': None, 'user': None, 'custom_llm_provider': 'azure', 'response_format': None, 'seed': None, 'tools': None, 'tool_choice': None, 'max_retries': None, 'logprobs': None, 'top_logprobs': None, 'extra_headers': None, 'api_version': '2024-10-21', 'parallel_tool_calls': None, 'drop_params': None, 'additional_drop_params': None, 'messages': [{'role': 'user', 'content': 'test from litellm'}]}
15:43:37 - LiteLLM:DEBUG: utils.py:298 - litellm.atext_completion(rpm=50, api_key='c36fcccfb77e4b60bca131212fec5b58', api_base='https://mlopstools-oai-eus1-dev-001.openai.azure.com/', api_version='2024-10-21', use_in_pass_through=False, model='azure/gpt-4o-mini', cache={'no-cache': True}, prompt='test from litellm')
15:43:37 - LiteLLM:DEBUG: utils.py:2900 - 
LiteLLM: Non-Default params passed to completion() {}
15:43:37 - LiteLLM:DEBUG: utils.py:298 - 

15:43:37 - LiteLLM:DEBUG: utils.py:3500 - Azure optional params - api_version: api_version=2024-10-21, litellm.api_version=None, os.environ['AZURE_API_VERSION']=2024-07-01-preview
15:43:37 - LiteLLM:DEBUG: utils.py:298 - Initialized litellm callbacks, Async Success Callbacks: [<bound method Router.deployment_callback_on_success of <litellm.router.Router object at 0x7f0503cd1400>>, <litellm.proxy.hooks.model_max_budget_limiter._PROXY_VirtualKeyModelMaxBudgetLimiter object at 0x7f0504a1d7f0>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x7f0504a1de80>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x7f0504a1dfd0>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x7f0504a1e120>, <litellm._service_logger.ServiceLogging object at 0x7f050490c550>]
15:43:37 - LiteLLM:DEBUG: utils.py:298 - Final returned optional params: {'extra_body': {}}
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:377 - self.optional_params: {}
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:377 - self.optional_params: {'extra_body': {}}
15:43:37 - LiteLLM:DEBUG: utils.py:298 - ASYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache'): {'no-cache': True}
15:43:37 - LiteLLM:DEBUG: caching_handler.py:212 - CACHE RESULT: None
15:43:37 - LiteLLM:INFO: utils.py:2894 - 
LiteLLM completion() model= gpt-4o-mini; provider = azure
15:43:37 - LiteLLM:DEBUG: utils.py:2897 - 
LiteLLM: Params passed to completion() {'model': 'gpt-4o-mini', 'functions': None, 'function_call': None, 'temperature': None, 'top_p': None, 'n': None, 'stream': None, 'stream_options': None, 'stop': None, 'max_tokens': None, 'max_completion_tokens': None, 'modalities': None, 'prediction': None, 'audio': None, 'presence_penalty': None, 'frequency_penalty': None, 'logit_bias': None, 'user': None, 'custom_llm_provider': 'azure', 'response_format': None, 'seed': None, 'tools': None, 'tool_choice': None, 'max_retries': None, 'logprobs': None, 'top_logprobs': None, 'extra_headers': None, 'api_version': '2024-10-21', 'parallel_tool_calls': None, 'drop_params': None, 'additional_drop_params': None, 'messages': [{'role': 'user', 'content': 'test from litellm'}]}
15:43:37 - LiteLLM:DEBUG: utils.py:2900 - 
LiteLLM: Non-Default params passed to completion() {}
15:43:37 - LiteLLM:DEBUG: utils.py:3500 - Azure optional params - api_version: api_version=2024-10-21, litellm.api_version=None, os.environ['AZURE_API_VERSION']=2024-07-01-preview
15:43:37 - LiteLLM:DEBUG: utils.py:298 - Final returned optional params: {'extra_body': {}}
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:377 - self.optional_params: {'extra_body': {}}
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:634 - 

POST Request Sent from LiteLLM:
curl -X POST \
https://mlopstools-oai-eus1-dev-001.openai.azure.com//openai/deployments/gpt-4o/ \
-H 'api_key: *****' -H 'azure_ad_token: *****' \
-d '{'model': 'gpt-4o', 'messages': [{'role': 'user', 'content': 'test from litellm'}], 'extra_body': {}}'


15:43:37 - LiteLLM:DEBUG: litellm_logging.py:634 - 

POST Request Sent from LiteLLM:
curl -X POST \
https://mlopstools-oai-eus1-dev-001.openai.azure.com//openai/deployments/gpt-4o-mini/ \
-H 'api_key: *****' -H 'azure_ad_token: *****' \
-d '{'model': 'gpt-4o-mini', 'messages': [{'role': 'user', 'content': 'test from litellm'}], 'extra_body': {}}'


^CINFO:     Shutting down
15:43:38 - LiteLLM:DEBUG: utils.py:298 - RAW RESPONSE:
{"id": "chatcmpl-AxFNJj0u7pJJPAniL1KX1dS0xaT0S", "choices": [{"finish_reason": "stop", "index": 0, "logprobs": null, "message": {"content": "It seems like you might be referring to testing a model from the `litellm` library or framework. However, I don\u2019t have direct access to external libraries or tests. If you have specific code or concepts you want to discuss or test with `litellm`, feel free to share, and I'll do my best to help you!", "refusal": null, "role": "assistant", "audio": null, "function_call": null, "tool_calls": null}, "content_filter_results": {"hate": {"filtered": false, "severity": "safe"}, "protected_material_code": {"filtered": false, "detected": false}, "protected_material_text": {"filtered": false, "detected": false}, "self_harm": {"filtered": false, "severity": "safe"}, "sexual": {"filtered": false, "severity": "safe"}, "violence": {"filtered": false, "severity": "safe"}}}], "created": 1738683817, "model": "gpt-4o-mini-2024-07-18", "object": "chat.completion", "service_tier": null, "system_fingerprint": "fp_f3927aa00d", "usage": {"completion_tokens": 69, "prompt_tokens": 12, "total_tokens": 81, "completion_tokens_details": {"accepted_prediction_tokens": 0, "audio_tokens": 0, "reasoning_tokens": 0, "rejected_prediction_tokens": 0}, "prompt_tokens_details": {"audio_tokens": 0, "cached_tokens": 0}}, "prompt_filter_results": [{"prompt_index": 0, "content_filter_results": {"hate": {"filtered": false, "severity": "safe"}, "jailbreak": {"filtered": false, "detected": false}, "self_harm": {"filtered": false, "severity": "safe"}, "sexual": {"filtered": false, "severity": "safe"}, "violence": {"filtered": false, "severity": "safe"}}}]}


15:43:38 - LiteLLM:DEBUG: litellm_logging.py:2194 - Filtered callbacks: []
15:43:38 - LiteLLM:DEBUG: cost_calculator.py:563 - completion_response _select_model_name_for_cost_calc: azure/gpt-4o-mini-2024-07-18
15:43:38 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-mini-2024-07-18', 'combined_model_name': 'azure/gpt-4o-mini-2024-07-18', 'stripped_model_name': 'gpt-4o-mini-2024-07-18', 'combined_stripped_model_name': 'azure/gpt-4o-mini-2024-07-18', 'custom_llm_provider': 'azure'}
15:43:38 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-mini-2024-07-18', 'max_tokens': 16384, 'max_input_tokens': 128000, 'max_output_tokens': 16384, 'input_cost_per_token': 1.65e-07, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': 7.5e-08, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 6.6e-07, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': True, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:38 - LiteLLM:DEBUG: litellm_logging.py:846 - response_cost: 4.752e-05
15:43:38 - LiteLLM:DEBUG: utils.py:298 - Async Wrapper: Completed Call, calling async_success_handler: <bound method Logging.async_success_handler of <litellm.litellm_core_utils.litellm_logging.Logging object at 0x7f0501363a80>>
15:43:38 - LiteLLM:DEBUG: litellm_logging.py:2194 - Filtered callbacks: []
15:43:38 - LiteLLM:DEBUG: utils.py:298 - Logging Details LiteLLM-Async Success Call, cache_hit=None
15:43:38 - LiteLLM:DEBUG: cost_calculator.py:563 - completion_response _select_model_name_for_cost_calc: azure/gpt-4o-mini-2024-07-18
15:43:38 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-mini-2024-07-18', 'combined_model_name': 'azure/gpt-4o-mini-2024-07-18', 'stripped_model_name': 'gpt-4o-mini-2024-07-18', 'combined_stripped_model_name': 'azure/gpt-4o-mini-2024-07-18', 'custom_llm_provider': 'azure'}
15:43:38 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-mini-2024-07-18', 'max_tokens': 16384, 'max_input_tokens': 128000, 'max_output_tokens': 16384, 'input_cost_per_token': 1.65e-07, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': 7.5e-08, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 6.6e-07, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': True, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:38 - LiteLLM:DEBUG: litellm_logging.py:846 - response_cost: 4.752e-05
15:43:38 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-mini-2024-07-18', 'combined_model_name': 'azure/gpt-4o-mini-2024-07-18', 'stripped_model_name': 'gpt-4o-mini-2024-07-18', 'combined_stripped_model_name': 'azure/gpt-4o-mini-2024-07-18', 'custom_llm_provider': 'azure'}
15:43:38 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-mini-2024-07-18', 'max_tokens': 16384, 'max_input_tokens': 128000, 'max_output_tokens': 16384, 'input_cost_per_token': 1.65e-07, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': 7.5e-08, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 6.6e-07, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': True, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:38 - LiteLLM:DEBUG: utils.py:298 - Async success callbacks: Got a complete streaming response
15:43:38 - LiteLLM:DEBUG: cost_calculator.py:563 - completion_response _select_model_name_for_cost_calc: azure/gpt-4o-mini-2024-07-18
15:43:38 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-mini-2024-07-18', 'combined_model_name': 'azure/gpt-4o-mini-2024-07-18', 'stripped_model_name': 'gpt-4o-mini-2024-07-18', 'combined_stripped_model_name': 'azure/gpt-4o-mini-2024-07-18', 'custom_llm_provider': 'azure'}
15:43:38 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-mini-2024-07-18', 'max_tokens': 16384, 'max_input_tokens': 128000, 'max_output_tokens': 16384, 'input_cost_per_token': 1.65e-07, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': 7.5e-08, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 6.6e-07, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': True, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:38 - LiteLLM:DEBUG: litellm_logging.py:846 - response_cost: 4.752e-05
15:43:38 - LiteLLM:DEBUG: litellm_logging.py:1566 - Model=gpt-4o-mini; cost=4.752e-05
15:43:38 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-mini-2024-07-18', 'combined_model_name': 'azure/gpt-4o-mini-2024-07-18', 'stripped_model_name': 'gpt-4o-mini-2024-07-18', 'combined_stripped_model_name': 'azure/gpt-4o-mini-2024-07-18', 'custom_llm_provider': 'azure'}
15:43:38 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-mini-2024-07-18', 'max_tokens': 16384, 'max_input_tokens': 128000, 'max_output_tokens': 16384, 'input_cost_per_token': 1.65e-07, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': 7.5e-08, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 6.6e-07, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': True, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:38 - LiteLLM Proxy:DEBUG: model_max_budget_limiter.py:151 - in RouterBudgetLimiting.async_log_success_event
15:43:38 - LiteLLM Proxy:DEBUG: model_max_budget_limiter.py:167 - Not running _PROXY_VirtualKeyModelMaxBudgetLimiter.async_log_success_event because user_api_key_model_max_budget is None or empty. `user_api_key_model_max_budget`=None
15:43:38 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - INSIDE parallel request limiter ASYNC SUCCESS LOGGING
15:43:38 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - 'user_api_key'
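
As a quick sanity check, the `response_cost: 4.752e-05` lines above are consistent with the per-token prices in the logged `model_info` and the token counts in the RAW RESPONSE. A minimal sketch, using only values taken from the log:

```python
# Reproduces the response_cost printed in the log, using only values
# that appear in the log itself (azure/gpt-4o-mini-2024-07-18).
input_cost_per_token = 1.65e-07   # model_info: input_cost_per_token
output_cost_per_token = 6.6e-07   # model_info: output_cost_per_token
prompt_tokens = 12                # RAW RESPONSE: usage.prompt_tokens
completion_tokens = 69            # RAW RESPONSE: usage.completion_tokens

cost = (prompt_tokens * input_cost_per_token
        + completion_tokens * output_cost_per_token)
print(f"{cost:.6g}")  # 4.752e-05, matching "response_cost: 4.752e-05" above
```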

Are you an ML Ops Team?

Yes

What LiteLLM version are you on?

v1.60.0.dev4

Twitter / LinkedIn details

No response
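
For anyone reproducing this manually: the probe request near the top of the relevant log output is printed with Python-style quoting, so it is not copy-pasteable as-is. A shell-safe equivalent might look like the following; `localhost:4000` and the absence of an `Authorization` header are assumptions based on the setup above, so adjust them if your proxy uses a master key.

```bash
# Shell-safe version of the probe request printed in the debug log.
curl -sS http://localhost:4000/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "test from litellm"}]}'
```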

jairajc commented Feb 11, 2025

Hi, I'd like to work on this issue. Assigning it to myself.

Jai Raj Choudhary
Date: 02/10/2025

jairajc added a commit to jairajc/litellm that referenced this issue Feb 11, 2025
jairajc linked a pull request Feb 11, 2025 that will close this issue