[Bug]: Loop when background_health_checks is set to true #8248

Open

luismarquezgft opened this issue Feb 4, 2025 · 1 comment · May be fixed by #8454

What happened?

When I set background_health_checks to true in this configuration file and run the Docker container, the proxy enters a loop: it runs the per-model health checks back-to-back (roughly one full round per second) instead of waiting health_check_interval (300 s) between rounds. See the relevant log output below.

docker run \
  -v $(pwd)/config.yaml:/app/config.yaml \
  --env-file .env \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml --detailed_debug
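
The container is started with --env-file .env, and the os.environ/-prefixed values in config.yaml are resolved from that environment. For completeness, a minimal sketch of the .env file, using the variable names from the config below; the values here are placeholders only, not the real ones from this report:

AZURE_OPENAI_MODEL=gpt-4o
AZURE_OPENAI_DEPLOYMENT=azure/gpt-4o
AZURE_OPENAI_API_BASE=https://<your-resource>.openai.azure.com/
AZURE_OPENAI_API_KEY=<your-azure-api-key>
AZURE_OPENAI_API_VERSION=2024-10-21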

config.yaml

model_list:
  - model_name: os.environ/AZURE_OPENAI_MODEL
    litellm_params:
      model: os.environ/AZURE_OPENAI_DEPLOYMENT
      api_base: os.environ/AZURE_OPENAI_API_BASE
      api_key: "os.environ/AZURE_OPENAI_API_KEY"
      api_version: "os.environ/AZURE_OPENAI_API_VERSION" # [OPTIONAL] litellm uses the latest azure api_version by default
      rpm: 50
    model_info:
      mode: completion # This setting is used to determine how to check the health of the model. See https://docs.litellm.ai/docs/proxy/health
      input_cost_per_token: 0.000002399240
      output_cost_per_token: 0.000009597000
      max_tokens: 16384
  - model_name: gpt-4o-mini
    litellm_params:
      model: azure/gpt-4o-mini
      api_base: os.environ/AZURE_OPENAI_API_BASE
      api_key: "os.environ/AZURE_OPENAI_API_KEY"
      api_version: "os.environ/AZURE_OPENAI_API_VERSION" # [OPTIONAL] litellm uses the latest azure api_version by default
      rpm: 50
    model_info:
      mode: completion
      input_cost_per_token: 0.000000143960
      output_cost_per_token: 0.000000575900
      max_tokens: 16384      

litellm_settings:
  ssl_verify: true    # [OPTIONAL] Set to false to disable SSL verification (not recommended)
  request_timeout: 30 # (int) LLM request timeout in seconds. Raises a Timeout error if a call takes longer. Sets litellm.request_timeout

general_settings:

  # Parallelism
  max_parallel_requests: 5  # the max parallel requests allowed per deployment 
  global_max_parallel_requests: 100  # the max parallel requests allowed on the proxy all up 

  # Health checks and monitoring. https://docs.litellm.ai/docs/proxy/health
  background_health_checks: true # Uses model_info.mode to determine how to check the health of each model
  health_check_interval: 300 # interval (in seconds) between background health checks
  health_check_details: false # If false, hides health check details (e.g. remaining rate limit)
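
To make the expected behavior explicit: a minimal sketch (my own illustration under these settings, not LiteLLM's actual implementation) of an interval-driven background health check loop, assuming a hypothetical check_model_health helper:

import asyncio

HEALTH_CHECK_INTERVAL = 300  # seconds; mirrors general_settings.health_check_interval

async def check_model_health(model: dict) -> None:
    # Hypothetical stand-in for the real per-model check, which (per the
    # logs below) issues a test request such as
    # litellm.atext_completion(prompt='test from litellm').
    print(f"health check: {model['model_name']}")

async def run_background_health_checks(model_list: list[dict]) -> None:
    # One round of checks per interval: check every model, then sleep
    # for health_check_interval seconds before starting the next round.
    while True:
        for model in model_list:
            await check_model_health(model)
        await asyncio.sleep(HEALTH_CHECK_INTERVAL)

The log output below instead shows complete rounds firing at 15:43:35, 15:43:36 and 15:43:37, back-to-back rather than 300 seconds apart, as if the interval sleep were skipped or each round immediately rescheduled itself.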

Relevant log output

15:43:35 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-mini-2024-07-18', 'max_tokens': 16384, 'max_input_tokens': 128000, 'max_output_tokens': 16384, 'input_cost_per_token': 1.65e-07, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': 7.5e-08, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 6.6e-07, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': True, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:35 - LiteLLM:DEBUG: utils.py:298 - Async success callbacks: Got a complete streaming response
15:43:35 - LiteLLM:DEBUG: cost_calculator.py:563 - completion_response _select_model_name_for_cost_calc: azure/gpt-4o-mini-2024-07-18
15:43:35 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-mini-2024-07-18', 'combined_model_name': 'azure/gpt-4o-mini-2024-07-18', 'stripped_model_name': 'gpt-4o-mini-2024-07-18', 'combined_stripped_model_name': 'azure/gpt-4o-mini-2024-07-18', 'custom_llm_provider': 'azure'}
15:43:35 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-mini-2024-07-18', 'max_tokens': 16384, 'max_input_tokens': 128000, 'max_output_tokens': 16384, 'input_cost_per_token': 1.65e-07, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': 7.5e-08, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 6.6e-07, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': True, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:35 - LiteLLM:DEBUG: litellm_logging.py:846 - response_cost: 2.8380000000000003e-05
15:43:35 - LiteLLM:DEBUG: litellm_logging.py:1566 - Model=gpt-4o-mini; cost=2.8380000000000003e-05
15:43:35 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-mini-2024-07-18', 'combined_model_name': 'azure/gpt-4o-mini-2024-07-18', 'stripped_model_name': 'gpt-4o-mini-2024-07-18', 'combined_stripped_model_name': 'azure/gpt-4o-mini-2024-07-18', 'custom_llm_provider': 'azure'}
15:43:35 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-mini-2024-07-18', 'max_tokens': 16384, 'max_input_tokens': 128000, 'max_output_tokens': 16384, 'input_cost_per_token': 1.65e-07, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': 7.5e-08, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 6.6e-07, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': True, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:35 - LiteLLM Proxy:DEBUG: model_max_budget_limiter.py:151 - in RouterBudgetLimiting.async_log_success_event
15:43:35 - LiteLLM Proxy:DEBUG: model_max_budget_limiter.py:167 - Not running _PROXY_VirtualKeyModelMaxBudgetLimiter.async_log_success_event because user_api_key_model_max_budget is None or empty. `user_api_key_model_max_budget`=None
15:43:35 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - INSIDE parallel request limiter ASYNC SUCCESS LOGGING
15:43:35 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - 'user_api_key'
15:43:36 - LiteLLM:DEBUG: utils.py:298 - RAW RESPONSE:
{"id": "chatcmpl-AxFNHjFAjyjVppwfVIkf9iIhhSh4R", "choices": [{"finish_reason": "stop", "index": 0, "logprobs": null, "message": {"content": "It looks like you are referring to LiteLLM, which is a lightweight and efficient implementation for LLM-related tasks. However, without specific context, it's a bit unclear what exactly you're testing for.\n\nAre you looking to:\n1. Test the performance of LiteLLM on specific tasks?\n2. Check its compatibility with your project?\n3. Benchmark it against other LLM implementations?\n\nPlease provide more details so I can assist you better!", "refusal": null, "role": "assistant", "audio": null, "function_call": null, "tool_calls": null}, "content_filter_results": {"hate": {"filtered": false, "severity": "safe"}, "self_harm": {"filtered": false, "severity": "safe"}, "sexual": {"filtered": false, "severity": "safe"}, "violence": {"filtered": false, "severity": "safe"}}}], "created": 1738683815, "model": "gpt-4o-2024-05-13", "object": "chat.completion", "service_tier": null, "system_fingerprint": "fp_65792305e4", "usage": {"completion_tokens": 88, "prompt_tokens": 12, "total_tokens": 100, "completion_tokens_details": null, "prompt_tokens_details": null}, "prompt_filter_results": [{"prompt_index": 0, "content_filter_results": {"hate": {"filtered": false, "severity": "safe"}, "self_harm": {"filtered": false, "severity": "safe"}, "sexual": {"filtered": false, "severity": "safe"}, "violence": {"filtered": false, "severity": "safe"}}}]}


15:43:36 - LiteLLM:DEBUG: litellm_logging.py:2194 - Filtered callbacks: []
15:43:36 - LiteLLM:DEBUG: cost_calculator.py:563 - completion_response _select_model_name_for_cost_calc: azure/gpt-4o-2024-05-13
15:43:36 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-2024-05-13', 'combined_model_name': 'azure/gpt-4o-2024-05-13', 'stripped_model_name': 'gpt-4o-2024-05-13', 'combined_stripped_model_name': 'azure/gpt-4o-2024-05-13', 'custom_llm_provider': 'azure'}
15:43:36 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-2024-05-13', 'max_tokens': 4096, 'max_input_tokens': 128000, 'max_output_tokens': 4096, 'input_cost_per_token': 5e-06, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': None, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 1.5e-05, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': None, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:36 - LiteLLM:DEBUG: litellm_logging.py:846 - response_cost: 0.00138
15:43:36 - LiteLLM:DEBUG: utils.py:298 - Async Wrapper: Completed Call, calling async_success_handler: <bound method Logging.async_success_handler of <litellm.litellm_core_utils.litellm_logging.Logging object at 0x7f0501360b00>>
15:43:36 - LiteLLM:DEBUG: litellm_logging.py:2194 - Filtered callbacks: []
15:43:36 - LiteLLM:DEBUG: utils.py:298 - Logging Details LiteLLM-Async Success Call, cache_hit=None
15:43:36 - LiteLLM:DEBUG: cost_calculator.py:563 - completion_response _select_model_name_for_cost_calc: azure/gpt-4o-2024-05-13
15:43:36 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-2024-05-13', 'combined_model_name': 'azure/gpt-4o-2024-05-13', 'stripped_model_name': 'gpt-4o-2024-05-13', 'combined_stripped_model_name': 'azure/gpt-4o-2024-05-13', 'custom_llm_provider': 'azure'}
15:43:36 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-2024-05-13', 'max_tokens': 4096, 'max_input_tokens': 128000, 'max_output_tokens': 4096, 'input_cost_per_token': 5e-06, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': None, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 1.5e-05, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': None, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:36 - LiteLLM:DEBUG: litellm_logging.py:846 - response_cost: 0.00138
15:43:36 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-2024-05-13', 'combined_model_name': 'azure/gpt-4o-2024-05-13', 'stripped_model_name': 'gpt-4o-2024-05-13', 'combined_stripped_model_name': 'azure/gpt-4o-2024-05-13', 'custom_llm_provider': 'azure'}
15:43:36 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-2024-05-13', 'max_tokens': 4096, 'max_input_tokens': 128000, 'max_output_tokens': 4096, 'input_cost_per_token': 5e-06, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': None, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 1.5e-05, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': None, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:36 - LiteLLM:DEBUG: utils.py:298 - Async success callbacks: Got a complete streaming response
15:43:36 - LiteLLM:DEBUG: cost_calculator.py:563 - completion_response _select_model_name_for_cost_calc: azure/gpt-4o-2024-05-13
15:43:36 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-2024-05-13', 'combined_model_name': 'azure/gpt-4o-2024-05-13', 'stripped_model_name': 'gpt-4o-2024-05-13', 'combined_stripped_model_name': 'azure/gpt-4o-2024-05-13', 'custom_llm_provider': 'azure'}
15:43:36 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-2024-05-13', 'max_tokens': 4096, 'max_input_tokens': 128000, 'max_output_tokens': 4096, 'input_cost_per_token': 5e-06, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': None, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 1.5e-05, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': None, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:36 - LiteLLM:DEBUG: litellm_logging.py:846 - response_cost: 0.00138
15:43:36 - LiteLLM:DEBUG: litellm_logging.py:1566 - Model=gpt-4o; cost=0.00138
15:43:36 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-2024-05-13', 'combined_model_name': 'azure/gpt-4o-2024-05-13', 'stripped_model_name': 'gpt-4o-2024-05-13', 'combined_stripped_model_name': 'azure/gpt-4o-2024-05-13', 'custom_llm_provider': 'azure'}
15:43:36 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-2024-05-13', 'max_tokens': 4096, 'max_input_tokens': 128000, 'max_output_tokens': 4096, 'input_cost_per_token': 5e-06, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': None, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 1.5e-05, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': None, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:36 - LiteLLM Proxy:DEBUG: model_max_budget_limiter.py:151 - in RouterBudgetLimiting.async_log_success_event
15:43:36 - LiteLLM Proxy:DEBUG: model_max_budget_limiter.py:167 - Not running _PROXY_VirtualKeyModelMaxBudgetLimiter.async_log_success_event because user_api_key_model_max_budget is None or empty. `user_api_key_model_max_budget`=None
15:43:36 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - INSIDE parallel request limiter ASYNC SUCCESS LOGGING
15:43:36 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - 'user_api_key'
15:43:36 - LiteLLM:DEBUG: utils.py:298 - 

15:43:36 - LiteLLM:DEBUG: utils.py:298 - Request to litellm:
15:43:36 - LiteLLM:DEBUG: utils.py:298 - litellm.atext_completion(rpm=50, api_key='c36fcccfb77e4b60bca131212fec5b58', api_base='https://mlopstools-oai-eus1-dev-001.openai.azure.com/', api_version='2024-10-21', use_in_pass_through=False, model='azure/gpt-4o', cache={'no-cache': True}, prompt='test from litellm')
15:43:36 - LiteLLM:DEBUG: utils.py:298 - 

15:43:36 - LiteLLM:DEBUG: utils.py:298 - Initialized litellm callbacks, Async Success Callbacks: [<bound method Router.deployment_callback_on_success of <litellm.router.Router object at 0x7f0503cd1400>>, <litellm.proxy.hooks.model_max_budget_limiter._PROXY_VirtualKeyModelMaxBudgetLimiter object at 0x7f0504a1d7f0>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x7f0504a1de80>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x7f0504a1dfd0>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x7f0504a1e120>, <litellm._service_logger.ServiceLogging object at 0x7f050490c550>]
15:43:36 - LiteLLM:DEBUG: litellm_logging.py:377 - self.optional_params: {}
15:43:36 - LiteLLM:DEBUG: utils.py:298 - ASYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache'): {'no-cache': True}
15:43:36 - LiteLLM:DEBUG: caching_handler.py:212 - CACHE RESULT: None
15:43:36 - LiteLLM:DEBUG: utils.py:298 - 

15:43:36 - LiteLLM:DEBUG: utils.py:298 - Request to litellm:
15:43:36 - LiteLLM:DEBUG: utils.py:298 - litellm.atext_completion(rpm=50, api_key='c36fcccfb77e4b60bca131212fec5b58', api_base='https://mlopstools-oai-eus1-dev-001.openai.azure.com/', api_version='2024-10-21', use_in_pass_through=False, model='azure/gpt-4o-mini', cache={'no-cache': True}, prompt='test from litellm')
15:43:36 - LiteLLM:DEBUG: utils.py:298 - 

15:43:36 - LiteLLM:INFO: utils.py:2894 - 
LiteLLM completion() model= gpt-4o; provider = azure
15:43:36 - LiteLLM:DEBUG: utils.py:298 - Initialized litellm callbacks, Async Success Callbacks: [<bound method Router.deployment_callback_on_success of <litellm.router.Router object at 0x7f0503cd1400>>, <litellm.proxy.hooks.model_max_budget_limiter._PROXY_VirtualKeyModelMaxBudgetLimiter object at 0x7f0504a1d7f0>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x7f0504a1de80>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x7f0504a1dfd0>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x7f0504a1e120>, <litellm._service_logger.ServiceLogging object at 0x7f050490c550>]
15:43:36 - LiteLLM:DEBUG: utils.py:2897 - 
LiteLLM: Params passed to completion() {'model': 'gpt-4o', 'functions': None, 'function_call': None, 'temperature': None, 'top_p': None, 'n': None, 'stream': None, 'stream_options': None, 'stop': None, 'max_tokens': None, 'max_completion_tokens': None, 'modalities': None, 'prediction': None, 'audio': None, 'presence_penalty': None, 'frequency_penalty': None, 'logit_bias': None, 'user': None, 'custom_llm_provider': 'azure', 'response_format': None, 'seed': None, 'tools': None, 'tool_choice': None, 'max_retries': None, 'logprobs': None, 'top_logprobs': None, 'extra_headers': None, 'api_version': '2024-10-21', 'parallel_tool_calls': None, 'drop_params': None, 'additional_drop_params': None, 'messages': [{'role': 'user', 'content': 'test from litellm'}]}
15:43:36 - LiteLLM:DEBUG: litellm_logging.py:377 - self.optional_params: {}
15:43:36 - LiteLLM:DEBUG: utils.py:2900 - 
LiteLLM: Non-Default params passed to completion() {}
15:43:36 - LiteLLM:DEBUG: utils.py:298 - ASYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache'): {'no-cache': True}
15:43:36 - LiteLLM:DEBUG: utils.py:3500 - Azure optional params - api_version: api_version=2024-10-21, litellm.api_version=None, os.environ['AZURE_API_VERSION']=2024-07-01-preview
15:43:36 - LiteLLM:DEBUG: caching_handler.py:212 - CACHE RESULT: None
15:43:36 - LiteLLM:DEBUG: utils.py:298 - Final returned optional params: {'extra_body': {}}
15:43:36 - LiteLLM:INFO: utils.py:2894 - 
LiteLLM completion() model= gpt-4o-mini; provider = azure
15:43:36 - LiteLLM:DEBUG: litellm_logging.py:377 - self.optional_params: {'extra_body': {}}
15:43:36 - LiteLLM:DEBUG: utils.py:2897 - 
LiteLLM: Params passed to completion() {'model': 'gpt-4o-mini', 'functions': None, 'function_call': None, 'temperature': None, 'top_p': None, 'n': None, 'stream': None, 'stream_options': None, 'stop': None, 'max_tokens': None, 'max_completion_tokens': None, 'modalities': None, 'prediction': None, 'audio': None, 'presence_penalty': None, 'frequency_penalty': None, 'logit_bias': None, 'user': None, 'custom_llm_provider': 'azure', 'response_format': None, 'seed': None, 'tools': None, 'tool_choice': None, 'max_retries': None, 'logprobs': None, 'top_logprobs': None, 'extra_headers': None, 'api_version': '2024-10-21', 'parallel_tool_calls': None, 'drop_params': None, 'additional_drop_params': None, 'messages': [{'role': 'user', 'content': 'test from litellm'}]}
15:43:36 - LiteLLM:DEBUG: utils.py:2900 - 
LiteLLM: Non-Default params passed to completion() {}
15:43:36 - LiteLLM:DEBUG: utils.py:3500 - Azure optional params - api_version: api_version=2024-10-21, litellm.api_version=None, os.environ['AZURE_API_VERSION']=2024-07-01-preview
15:43:36 - LiteLLM:DEBUG: utils.py:298 - Final returned optional params: {'extra_body': {}}
15:43:36 - LiteLLM:DEBUG: litellm_logging.py:377 - self.optional_params: {'extra_body': {}}
15:43:36 - LiteLLM:DEBUG: litellm_logging.py:634 - 

POST Request Sent from LiteLLM:
curl -X POST \
https://mlopstools-oai-eus1-dev-001.openai.azure.com//openai/deployments/gpt-4o/ \
-H 'api_key: *****' -H 'azure_ad_token: *****' \
-d '{'model': 'gpt-4o', 'messages': [{'role': 'user', 'content': 'test from litellm'}], 'extra_body': {}}'


15:43:36 - LiteLLM:DEBUG: litellm_logging.py:634 - 

POST Request Sent from LiteLLM:
curl -X POST \
https://mlopstools-oai-eus1-dev-001.openai.azure.com//openai/deployments/gpt-4o-mini/ \
-H 'api_key: *****' -H 'azure_ad_token: *****' \
-d '{'model': 'gpt-4o-mini', 'messages': [{'role': 'user', 'content': 'test from litellm'}], 'extra_body': {}}'


15:43:37 - LiteLLM:DEBUG: utils.py:298 - RAW RESPONSE:
{"id": "chatcmpl-AxFNIe7ozB0AiTI8dXYoFVvwJAzwE", "choices": [{"finish_reason": "stop", "index": 0, "logprobs": null, "message": {"content": "It looks like you might be testing a system or tool related to \"litellm.\" How can I assist you with that? If you have specific questions or need information, feel free to ask!", "refusal": null, "role": "assistant", "audio": null, "function_call": null, "tool_calls": null}, "content_filter_results": {"hate": {"filtered": false, "severity": "safe"}, "self_harm": {"filtered": false, "severity": "safe"}, "sexual": {"filtered": false, "severity": "safe"}, "violence": {"filtered": false, "severity": "safe"}}}], "created": 1738683816, "model": "gpt-4o-mini-2024-07-18", "object": "chat.completion", "service_tier": null, "system_fingerprint": "fp_f3927aa00d", "usage": {"completion_tokens": 40, "prompt_tokens": 12, "total_tokens": 52, "completion_tokens_details": {"accepted_prediction_tokens": 0, "audio_tokens": 0, "reasoning_tokens": 0, "rejected_prediction_tokens": 0}, "prompt_tokens_details": {"audio_tokens": 0, "cached_tokens": 0}}, "prompt_filter_results": [{"prompt_index": 0, "content_filter_results": {"hate": {"filtered": false, "severity": "safe"}, "jailbreak": {"filtered": false, "detected": false}, "self_harm": {"filtered": false, "severity": "safe"}, "sexual": {"filtered": false, "severity": "safe"}, "violence": {"filtered": false, "severity": "safe"}}}]}


15:43:37 - LiteLLM:DEBUG: litellm_logging.py:2194 - Filtered callbacks: []
15:43:37 - LiteLLM:DEBUG: cost_calculator.py:563 - completion_response _select_model_name_for_cost_calc: azure/gpt-4o-mini-2024-07-18
15:43:37 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-mini-2024-07-18', 'combined_model_name': 'azure/gpt-4o-mini-2024-07-18', 'stripped_model_name': 'gpt-4o-mini-2024-07-18', 'combined_stripped_model_name': 'azure/gpt-4o-mini-2024-07-18', 'custom_llm_provider': 'azure'}
15:43:37 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-mini-2024-07-18', 'max_tokens': 16384, 'max_input_tokens': 128000, 'max_output_tokens': 16384, 'input_cost_per_token': 1.65e-07, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': 7.5e-08, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 6.6e-07, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': True, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:846 - response_cost: 2.8380000000000003e-05
15:43:37 - LiteLLM:DEBUG: utils.py:298 - Async Wrapper: Completed Call, calling async_success_handler: <bound method Logging.async_success_handler of <litellm.litellm_core_utils.litellm_logging.Logging object at 0x7f0501363e10>>
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:2194 - Filtered callbacks: []
15:43:37 - LiteLLM:DEBUG: utils.py:298 - Logging Details LiteLLM-Async Success Call, cache_hit=None
15:43:37 - LiteLLM:DEBUG: cost_calculator.py:563 - completion_response _select_model_name_for_cost_calc: azure/gpt-4o-mini-2024-07-18
15:43:37 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-mini-2024-07-18', 'combined_model_name': 'azure/gpt-4o-mini-2024-07-18', 'stripped_model_name': 'gpt-4o-mini-2024-07-18', 'combined_stripped_model_name': 'azure/gpt-4o-mini-2024-07-18', 'custom_llm_provider': 'azure'}
15:43:37 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-mini-2024-07-18', 'max_tokens': 16384, 'max_input_tokens': 128000, 'max_output_tokens': 16384, 'input_cost_per_token': 1.65e-07, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': 7.5e-08, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 6.6e-07, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': True, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:846 - response_cost: 2.8380000000000003e-05
15:43:37 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-mini-2024-07-18', 'combined_model_name': 'azure/gpt-4o-mini-2024-07-18', 'stripped_model_name': 'gpt-4o-mini-2024-07-18', 'combined_stripped_model_name': 'azure/gpt-4o-mini-2024-07-18', 'custom_llm_provider': 'azure'}
15:43:37 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-mini-2024-07-18', 'max_tokens': 16384, 'max_input_tokens': 128000, 'max_output_tokens': 16384, 'input_cost_per_token': 1.65e-07, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': 7.5e-08, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 6.6e-07, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': True, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:37 - LiteLLM:DEBUG: utils.py:298 - Async success callbacks: Got a complete streaming response
15:43:37 - LiteLLM:DEBUG: cost_calculator.py:563 - completion_response _select_model_name_for_cost_calc: azure/gpt-4o-mini-2024-07-18
15:43:37 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-mini-2024-07-18', 'combined_model_name': 'azure/gpt-4o-mini-2024-07-18', 'stripped_model_name': 'gpt-4o-mini-2024-07-18', 'combined_stripped_model_name': 'azure/gpt-4o-mini-2024-07-18', 'custom_llm_provider': 'azure'}
15:43:37 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-mini-2024-07-18', 'max_tokens': 16384, 'max_input_tokens': 128000, 'max_output_tokens': 16384, 'input_cost_per_token': 1.65e-07, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': 7.5e-08, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 6.6e-07, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': True, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:846 - response_cost: 2.8380000000000003e-05
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:1566 - Model=gpt-4o-mini; cost=2.8380000000000003e-05
15:43:37 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-mini-2024-07-18', 'combined_model_name': 'azure/gpt-4o-mini-2024-07-18', 'stripped_model_name': 'gpt-4o-mini-2024-07-18', 'combined_stripped_model_name': 'azure/gpt-4o-mini-2024-07-18', 'custom_llm_provider': 'azure'}
15:43:37 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-mini-2024-07-18', 'max_tokens': 16384, 'max_input_tokens': 128000, 'max_output_tokens': 16384, 'input_cost_per_token': 1.65e-07, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': 7.5e-08, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 6.6e-07, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': True, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:37 - LiteLLM Proxy:DEBUG: model_max_budget_limiter.py:151 - in RouterBudgetLimiting.async_log_success_event
15:43:37 - LiteLLM Proxy:DEBUG: model_max_budget_limiter.py:167 - Not running _PROXY_VirtualKeyModelMaxBudgetLimiter.async_log_success_event because user_api_key_model_max_budget is None or empty. `user_api_key_model_max_budget`=None
15:43:37 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - INSIDE parallel request limiter ASYNC SUCCESS LOGGING
15:43:37 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - 'user_api_key'
15:43:37 - LiteLLM:DEBUG: utils.py:298 - RAW RESPONSE:
{"id": "chatcmpl-AxFNIvr5aOtwHwyD3HvHFBHdxfyXe", "choices": [{"finish_reason": "stop", "index": 0, "logprobs": null, "message": {"content": "It seems like you might want to test a functionality or have a question about \"litellm\". Could you please provide more context or clarify your request? I'd be happy to help with whatever information or assistance you need.", "refusal": null, "role": "assistant", "audio": null, "function_call": null, "tool_calls": null}, "content_filter_results": {"hate": {"filtered": false, "severity": "safe"}, "self_harm": {"filtered": false, "severity": "safe"}, "sexual": {"filtered": false, "severity": "safe"}, "violence": {"filtered": false, "severity": "safe"}}}], "created": 1738683816, "model": "gpt-4o-2024-05-13", "object": "chat.completion", "service_tier": null, "system_fingerprint": "fp_f3927aa00d", "usage": {"completion_tokens": 44, "prompt_tokens": 12, "total_tokens": 56, "completion_tokens_details": {"accepted_prediction_tokens": 0, "audio_tokens": 0, "reasoning_tokens": 0, "rejected_prediction_tokens": 0}, "prompt_tokens_details": {"audio_tokens": 0, "cached_tokens": 0}}, "prompt_filter_results": [{"prompt_index": 0, "content_filter_results": {"hate": {"filtered": false, "severity": "safe"}, "self_harm": {"filtered": false, "severity": "safe"}, "sexual": {"filtered": false, "severity": "safe"}, "violence": {"filtered": false, "severity": "safe"}}}]}


15:43:37 - LiteLLM:DEBUG: litellm_logging.py:2194 - Filtered callbacks: []
15:43:37 - LiteLLM:DEBUG: cost_calculator.py:563 - completion_response _select_model_name_for_cost_calc: azure/gpt-4o-2024-05-13
15:43:37 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-2024-05-13', 'combined_model_name': 'azure/gpt-4o-2024-05-13', 'stripped_model_name': 'gpt-4o-2024-05-13', 'combined_stripped_model_name': 'azure/gpt-4o-2024-05-13', 'custom_llm_provider': 'azure'}
15:43:37 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-2024-05-13', 'max_tokens': 4096, 'max_input_tokens': 128000, 'max_output_tokens': 4096, 'input_cost_per_token': 5e-06, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': None, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 1.5e-05, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': None, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:846 - response_cost: 0.00072
15:43:37 - LiteLLM:DEBUG: utils.py:298 - Async Wrapper: Completed Call, calling async_success_handler: <bound method Logging.async_success_handler of <litellm.litellm_core_utils.litellm_logging.Logging object at 0x7f0503d23bb0>>
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:2194 - Filtered callbacks: []
15:43:37 - LiteLLM:DEBUG: utils.py:298 - Logging Details LiteLLM-Async Success Call, cache_hit=None
15:43:37 - LiteLLM:DEBUG: cost_calculator.py:563 - completion_response _select_model_name_for_cost_calc: azure/gpt-4o-2024-05-13
15:43:37 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-2024-05-13', 'combined_model_name': 'azure/gpt-4o-2024-05-13', 'stripped_model_name': 'gpt-4o-2024-05-13', 'combined_stripped_model_name': 'azure/gpt-4o-2024-05-13', 'custom_llm_provider': 'azure'}
15:43:37 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-2024-05-13', 'max_tokens': 4096, 'max_input_tokens': 128000, 'max_output_tokens': 4096, 'input_cost_per_token': 5e-06, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': None, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 1.5e-05, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': None, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:846 - response_cost: 0.00072
15:43:37 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-2024-05-13', 'combined_model_name': 'azure/gpt-4o-2024-05-13', 'stripped_model_name': 'gpt-4o-2024-05-13', 'combined_stripped_model_name': 'azure/gpt-4o-2024-05-13', 'custom_llm_provider': 'azure'}
15:43:37 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-2024-05-13', 'max_tokens': 4096, 'max_input_tokens': 128000, 'max_output_tokens': 4096, 'input_cost_per_token': 5e-06, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': None, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 1.5e-05, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': None, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:37 - LiteLLM:DEBUG: utils.py:298 - Async success callbacks: Got a complete streaming response
15:43:37 - LiteLLM:DEBUG: cost_calculator.py:563 - completion_response _select_model_name_for_cost_calc: azure/gpt-4o-2024-05-13
15:43:37 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-2024-05-13', 'combined_model_name': 'azure/gpt-4o-2024-05-13', 'stripped_model_name': 'gpt-4o-2024-05-13', 'combined_stripped_model_name': 'azure/gpt-4o-2024-05-13', 'custom_llm_provider': 'azure'}
15:43:37 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-2024-05-13', 'max_tokens': 4096, 'max_input_tokens': 128000, 'max_output_tokens': 4096, 'input_cost_per_token': 5e-06, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': None, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 1.5e-05, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': None, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:846 - response_cost: 0.00072
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:1566 - Model=gpt-4o; cost=0.00072
15:43:37 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-2024-05-13', 'combined_model_name': 'azure/gpt-4o-2024-05-13', 'stripped_model_name': 'gpt-4o-2024-05-13', 'combined_stripped_model_name': 'azure/gpt-4o-2024-05-13', 'custom_llm_provider': 'azure'}
15:43:37 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-2024-05-13', 'max_tokens': 4096, 'max_input_tokens': 128000, 'max_output_tokens': 4096, 'input_cost_per_token': 5e-06, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': None, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 1.5e-05, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': None, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:37 - LiteLLM Proxy:DEBUG: model_max_budget_limiter.py:151 - in RouterBudgetLimiting.async_log_success_event
15:43:37 - LiteLLM Proxy:DEBUG: model_max_budget_limiter.py:167 - Not running _PROXY_VirtualKeyModelMaxBudgetLimiter.async_log_success_event because user_api_key_model_max_budget is None or empty. `user_api_key_model_max_budget`=None
15:43:37 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - INSIDE parallel request limiter ASYNC SUCCESS LOGGING
15:43:37 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - 'user_api_key'
15:43:37 - LiteLLM:DEBUG: utils.py:298 - 

15:43:37 - LiteLLM:DEBUG: utils.py:298 - Request to litellm:
15:43:37 - LiteLLM:DEBUG: utils.py:298 - litellm.atext_completion(rpm=50, api_key='c36fcccfb77e4b60bca131212fec5b58', api_base='https://mlopstools-oai-eus1-dev-001.openai.azure.com/', api_version='2024-10-21', use_in_pass_through=False, model='azure/gpt-4o', cache={'no-cache': True}, prompt='test from litellm')
15:43:37 - LiteLLM:DEBUG: utils.py:298 - 

15:43:37 - LiteLLM:DEBUG: utils.py:298 - Initialized litellm callbacks, Async Success Callbacks: [<bound method Router.deployment_callback_on_success of <litellm.router.Router object at 0x7f0503cd1400>>, <litellm.proxy.hooks.model_max_budget_limiter._PROXY_VirtualKeyModelMaxBudgetLimiter object at 0x7f0504a1d7f0>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x7f0504a1de80>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x7f0504a1dfd0>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x7f0504a1e120>, <litellm._service_logger.ServiceLogging object at 0x7f050490c550>]
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:377 - self.optional_params: {}
15:43:37 - LiteLLM:DEBUG: utils.py:298 - ASYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache'): {'no-cache': True}
15:43:37 - LiteLLM:DEBUG: caching_handler.py:212 - CACHE RESULT: None
15:43:37 - LiteLLM:DEBUG: utils.py:298 - 

15:43:37 - LiteLLM:INFO: utils.py:2894 - 
LiteLLM completion() model= gpt-4o; provider = azure
15:43:37 - LiteLLM:DEBUG: utils.py:298 - Request to litellm:
15:43:37 - LiteLLM:DEBUG: utils.py:2897 - 
LiteLLM: Params passed to completion() {'model': 'gpt-4o', 'functions': None, 'function_call': None, 'temperature': None, 'top_p': None, 'n': None, 'stream': None, 'stream_options': None, 'stop': None, 'max_tokens': None, 'max_completion_tokens': None, 'modalities': None, 'prediction': None, 'audio': None, 'presence_penalty': None, 'frequency_penalty': None, 'logit_bias': None, 'user': None, 'custom_llm_provider': 'azure', 'response_format': None, 'seed': None, 'tools': None, 'tool_choice': None, 'max_retries': None, 'logprobs': None, 'top_logprobs': None, 'extra_headers': None, 'api_version': '2024-10-21', 'parallel_tool_calls': None, 'drop_params': None, 'additional_drop_params': None, 'messages': [{'role': 'user', 'content': 'test from litellm'}]}
15:43:37 - LiteLLM:DEBUG: utils.py:298 - litellm.atext_completion(rpm=50, api_key='c36fcccfb77e4b60bca131212fec5b58', api_base='https://mlopstools-oai-eus1-dev-001.openai.azure.com/', api_version='2024-10-21', use_in_pass_through=False, model='azure/gpt-4o-mini', cache={'no-cache': True}, prompt='test from litellm')
15:43:37 - LiteLLM:DEBUG: utils.py:2900 - 
LiteLLM: Non-Default params passed to completion() {}
15:43:37 - LiteLLM:DEBUG: utils.py:298 - 

15:43:37 - LiteLLM:DEBUG: utils.py:3500 - Azure optional params - api_version: api_version=2024-10-21, litellm.api_version=None, os.environ['AZURE_API_VERSION']=2024-07-01-preview
15:43:37 - LiteLLM:DEBUG: utils.py:298 - Initialized litellm callbacks, Async Success Callbacks: [<bound method Router.deployment_callback_on_success of <litellm.router.Router object at 0x7f0503cd1400>>, <litellm.proxy.hooks.model_max_budget_limiter._PROXY_VirtualKeyModelMaxBudgetLimiter object at 0x7f0504a1d7f0>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x7f0504a1de80>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x7f0504a1dfd0>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x7f0504a1e120>, <litellm._service_logger.ServiceLogging object at 0x7f050490c550>]
15:43:37 - LiteLLM:DEBUG: utils.py:298 - Final returned optional params: {'extra_body': {}}
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:377 - self.optional_params: {}
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:377 - self.optional_params: {'extra_body': {}}
15:43:37 - LiteLLM:DEBUG: utils.py:298 - ASYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache'): {'no-cache': True}
15:43:37 - LiteLLM:DEBUG: caching_handler.py:212 - CACHE RESULT: None
15:43:37 - LiteLLM:INFO: utils.py:2894 - 
LiteLLM completion() model= gpt-4o-mini; provider = azure
15:43:37 - LiteLLM:DEBUG: utils.py:2897 - 
LiteLLM: Params passed to completion() {'model': 'gpt-4o-mini', 'functions': None, 'function_call': None, 'temperature': None, 'top_p': None, 'n': None, 'stream': None, 'stream_options': None, 'stop': None, 'max_tokens': None, 'max_completion_tokens': None, 'modalities': None, 'prediction': None, 'audio': None, 'presence_penalty': None, 'frequency_penalty': None, 'logit_bias': None, 'user': None, 'custom_llm_provider': 'azure', 'response_format': None, 'seed': None, 'tools': None, 'tool_choice': None, 'max_retries': None, 'logprobs': None, 'top_logprobs': None, 'extra_headers': None, 'api_version': '2024-10-21', 'parallel_tool_calls': None, 'drop_params': None, 'additional_drop_params': None, 'messages': [{'role': 'user', 'content': 'test from litellm'}]}
15:43:37 - LiteLLM:DEBUG: utils.py:2900 - 
LiteLLM: Non-Default params passed to completion() {}
15:43:37 - LiteLLM:DEBUG: utils.py:3500 - Azure optional params - api_version: api_version=2024-10-21, litellm.api_version=None, os.environ['AZURE_API_VERSION']=2024-07-01-preview
15:43:37 - LiteLLM:DEBUG: utils.py:298 - Final returned optional params: {'extra_body': {}}
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:377 - self.optional_params: {'extra_body': {}}
15:43:37 - LiteLLM:DEBUG: litellm_logging.py:634 - 

POST Request Sent from LiteLLM:
curl -X POST \
https://mlopstools-oai-eus1-dev-001.openai.azure.com//openai/deployments/gpt-4o/ \
-H 'api_key: *****' -H 'azure_ad_token: *****' \
-d '{'model': 'gpt-4o', 'messages': [{'role': 'user', 'content': 'test from litellm'}], 'extra_body': {}}'


15:43:37 - LiteLLM:DEBUG: litellm_logging.py:634 - 

POST Request Sent from LiteLLM:
curl -X POST \
https://mlopstools-oai-eus1-dev-001.openai.azure.com//openai/deployments/gpt-4o-mini/ \
-H 'api_key: *****' -H 'azure_ad_token: *****' \
-d '{'model': 'gpt-4o-mini', 'messages': [{'role': 'user', 'content': 'test from litellm'}], 'extra_body': {}}'


^CINFO:     Shutting down
15:43:38 - LiteLLM:DEBUG: utils.py:298 - RAW RESPONSE:
{"id": "chatcmpl-AxFNJj0u7pJJPAniL1KX1dS0xaT0S", "choices": [{"finish_reason": "stop", "index": 0, "logprobs": null, "message": {"content": "It seems like you might be referring to testing a model from the `litellm` library or framework. However, I don\u2019t have direct access to external libraries or tests. If you have specific code or concepts you want to discuss or test with `litellm`, feel free to share, and I'll do my best to help you!", "refusal": null, "role": "assistant", "audio": null, "function_call": null, "tool_calls": null}, "content_filter_results": {"hate": {"filtered": false, "severity": "safe"}, "protected_material_code": {"filtered": false, "detected": false}, "protected_material_text": {"filtered": false, "detected": false}, "self_harm": {"filtered": false, "severity": "safe"}, "sexual": {"filtered": false, "severity": "safe"}, "violence": {"filtered": false, "severity": "safe"}}}], "created": 1738683817, "model": "gpt-4o-mini-2024-07-18", "object": "chat.completion", "service_tier": null, "system_fingerprint": "fp_f3927aa00d", "usage": {"completion_tokens": 69, "prompt_tokens": 12, "total_tokens": 81, "completion_tokens_details": {"accepted_prediction_tokens": 0, "audio_tokens": 0, "reasoning_tokens": 0, "rejected_prediction_tokens": 0}, "prompt_tokens_details": {"audio_tokens": 0, "cached_tokens": 0}}, "prompt_filter_results": [{"prompt_index": 0, "content_filter_results": {"hate": {"filtered": false, "severity": "safe"}, "jailbreak": {"filtered": false, "detected": false}, "self_harm": {"filtered": false, "severity": "safe"}, "sexual": {"filtered": false, "severity": "safe"}, "violence": {"filtered": false, "severity": "safe"}}}]}


15:43:38 - LiteLLM:DEBUG: litellm_logging.py:2194 - Filtered callbacks: []
15:43:38 - LiteLLM:DEBUG: cost_calculator.py:563 - completion_response _select_model_name_for_cost_calc: azure/gpt-4o-mini-2024-07-18
15:43:38 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-mini-2024-07-18', 'combined_model_name': 'azure/gpt-4o-mini-2024-07-18', 'stripped_model_name': 'gpt-4o-mini-2024-07-18', 'combined_stripped_model_name': 'azure/gpt-4o-mini-2024-07-18', 'custom_llm_provider': 'azure'}
15:43:38 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-mini-2024-07-18', 'max_tokens': 16384, 'max_input_tokens': 128000, 'max_output_tokens': 16384, 'input_cost_per_token': 1.65e-07, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': 7.5e-08, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 6.6e-07, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': True, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:38 - LiteLLM:DEBUG: litellm_logging.py:846 - response_cost: 4.752e-05
15:43:38 - LiteLLM:DEBUG: utils.py:298 - Async Wrapper: Completed Call, calling async_success_handler: <bound method Logging.async_success_handler of <litellm.litellm_core_utils.litellm_logging.Logging object at 0x7f0501363a80>>
15:43:38 - LiteLLM:DEBUG: litellm_logging.py:2194 - Filtered callbacks: []
15:43:38 - LiteLLM:DEBUG: utils.py:298 - Logging Details LiteLLM-Async Success Call, cache_hit=None
15:43:38 - LiteLLM:DEBUG: cost_calculator.py:563 - completion_response _select_model_name_for_cost_calc: azure/gpt-4o-mini-2024-07-18
15:43:38 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-mini-2024-07-18', 'combined_model_name': 'azure/gpt-4o-mini-2024-07-18', 'stripped_model_name': 'gpt-4o-mini-2024-07-18', 'combined_stripped_model_name': 'azure/gpt-4o-mini-2024-07-18', 'custom_llm_provider': 'azure'}
15:43:38 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-mini-2024-07-18', 'max_tokens': 16384, 'max_input_tokens': 128000, 'max_output_tokens': 16384, 'input_cost_per_token': 1.65e-07, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': 7.5e-08, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 6.6e-07, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': True, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:38 - LiteLLM:DEBUG: litellm_logging.py:846 - response_cost: 4.752e-05
15:43:38 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-mini-2024-07-18', 'combined_model_name': 'azure/gpt-4o-mini-2024-07-18', 'stripped_model_name': 'gpt-4o-mini-2024-07-18', 'combined_stripped_model_name': 'azure/gpt-4o-mini-2024-07-18', 'custom_llm_provider': 'azure'}
15:43:38 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-mini-2024-07-18', 'max_tokens': 16384, 'max_input_tokens': 128000, 'max_output_tokens': 16384, 'input_cost_per_token': 1.65e-07, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': 7.5e-08, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 6.6e-07, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': True, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:38 - LiteLLM:DEBUG: utils.py:298 - Async success callbacks: Got a complete streaming response
15:43:38 - LiteLLM:DEBUG: cost_calculator.py:563 - completion_response _select_model_name_for_cost_calc: azure/gpt-4o-mini-2024-07-18
15:43:38 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-mini-2024-07-18', 'combined_model_name': 'azure/gpt-4o-mini-2024-07-18', 'stripped_model_name': 'gpt-4o-mini-2024-07-18', 'combined_stripped_model_name': 'azure/gpt-4o-mini-2024-07-18', 'custom_llm_provider': 'azure'}
15:43:38 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-mini-2024-07-18', 'max_tokens': 16384, 'max_input_tokens': 128000, 'max_output_tokens': 16384, 'input_cost_per_token': 1.65e-07, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': 7.5e-08, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 6.6e-07, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': True, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:38 - LiteLLM:DEBUG: litellm_logging.py:846 - response_cost: 4.752e-05
15:43:38 - LiteLLM:DEBUG: litellm_logging.py:1566 - Model=gpt-4o-mini; cost=4.752e-05
15:43:38 - LiteLLM:DEBUG: utils.py:4164 - checking potential_model_names in litellm.model_cost: {'split_model': 'gpt-4o-mini-2024-07-18', 'combined_model_name': 'azure/gpt-4o-mini-2024-07-18', 'stripped_model_name': 'gpt-4o-mini-2024-07-18', 'combined_stripped_model_name': 'azure/gpt-4o-mini-2024-07-18', 'custom_llm_provider': 'azure'}
15:43:38 - LiteLLM:DEBUG: utils.py:4439 - model_info: {'key': 'azure/gpt-4o-mini-2024-07-18', 'max_tokens': 16384, 'max_input_tokens': 128000, 'max_output_tokens': 16384, 'input_cost_per_token': 1.65e-07, 'cache_creation_input_token_cost': None, 'cache_read_input_token_cost': 7.5e-08, 'input_cost_per_character': None, 'input_cost_per_token_above_128k_tokens': None, 'input_cost_per_query': None, 'input_cost_per_second': None, 'input_cost_per_audio_token': None, 'output_cost_per_token': 6.6e-07, 'output_cost_per_audio_token': None, 'output_cost_per_character': None, 'output_cost_per_token_above_128k_tokens': None, 'output_cost_per_character_above_128k_tokens': None, 'output_cost_per_second': None, 'output_cost_per_image': None, 'output_vector_size': None, 'litellm_provider': 'azure', 'mode': 'chat', 'supports_system_messages': None, 'supports_response_schema': True, 'supports_vision': True, 'supports_function_calling': True, 'supports_assistant_prefill': False, 'supports_prompt_caching': True, 'supports_audio_input': False, 'supports_audio_output': False, 'supports_pdf_input': False, 'supports_embedding_image_input': False, 'supports_native_streaming': None, 'tpm': None, 'rpm': None}
15:43:38 - LiteLLM Proxy:DEBUG: model_max_budget_limiter.py:151 - in RouterBudgetLimiting.async_log_success_event
15:43:38 - LiteLLM Proxy:DEBUG: model_max_budget_limiter.py:167 - Not running _PROXY_VirtualKeyModelMaxBudgetLimiter.async_log_success_event because user_api_key_model_max_budget is None or empty. `user_api_key_model_max_budget`=None
15:43:38 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - INSIDE parallel request limiter ASYNC SUCCESS LOGGING
15:43:38 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:48 - 'user_api_key'
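
As a quick sanity check, the `response_cost: 4.752e-05` lines above are consistent with the per-token prices in the logged `model_info` and the token counts in the RAW RESPONSE. A minimal sketch, using only values taken from the log:

```python
# Reproduces the response_cost printed in the log, using only values
# that appear in the log itself (azure/gpt-4o-mini-2024-07-18).
input_cost_per_token = 1.65e-07   # model_info: input_cost_per_token
output_cost_per_token = 6.6e-07   # model_info: output_cost_per_token
prompt_tokens = 12                # RAW RESPONSE: usage.prompt_tokens
completion_tokens = 69            # RAW RESPONSE: usage.completion_tokens

cost = (prompt_tokens * input_cost_per_token
        + completion_tokens * output_cost_per_token)
print(f"{cost:.6g}")  # 4.752e-05, matching "response_cost: 4.752e-05" above
```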

Are you an ML Ops Team?

Yes

What LiteLLM version are you on?

v1.60.0.dev4

Twitter / LinkedIn details

No response
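
For anyone reproducing this manually: the probe request near the top of the relevant log output is printed with Python-style quoting, so it is not copy-pasteable as-is. A shell-safe equivalent might look like the following; `localhost:4000` and the absence of an `Authorization` header are assumptions based on the setup above, so adjust them if your proxy uses a master key.

```bash
# Shell-safe version of the probe request printed in the debug log.
curl -sS http://localhost:4000/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "test from litellm"}]}'
```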

jairajc commented Feb 11, 2025

Hi, I'd like to work on this issue. Assigning it to myself.

Jai Raj Choudhary
Date: 02/10/2025

jairajc added a commit to jairajc/litellm that referenced this issue Feb 11, 2025
jairajc linked a pull request Feb 11, 2025 that will close this issue