Commit: Merge branch 'main' into main

vincelwt authored Mar 21, 2024
2 parents 1cbfd31 + bcd6203 · commit 29e8c14

Showing 68 changed files with 2,239 additions and 765 deletions.
2 changes: 2 additions & 0 deletions .circleci/config.yml
@@ -46,6 +46,7 @@ jobs:
pip install "apscheduler==3.10.4"
pip install "PyGithub==1.59.1"
pip install argon2-cffi
pip install "pytest-mock==3.12.0"
pip install python-multipart
- save_cache:
paths:
@@ -148,6 +149,7 @@ jobs:
python -m pip install --upgrade pip
python -m pip install -r .circleci/requirements.txt
pip install "pytest==7.3.1"
pip install "pytest-mock==3.12.0"
pip install "pytest-asyncio==0.21.1"
pip install mypy
pip install "google-generativeai>=0.3.2"
5 changes: 5 additions & 0 deletions Dockerfile
@@ -38,6 +38,11 @@ RUN pip wheel --no-cache-dir --wheel-dir=/wheels/ -r requirements.txt
# install semantic-cache [Experimental]- we need this here and not in requirements.txt because redisvl pins to pydantic 1.0
RUN pip install redisvl==0.0.7 --no-deps

# ensure pyjwt is used, not jwt
RUN pip uninstall jwt -y
RUN pip uninstall PyJWT -y
RUN pip install PyJWT --no-cache-dir

# Build Admin UI
RUN chmod +x build_admin_ui.sh && ./build_admin_ui.sh
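The uninstall/reinstall above guards against the unrelated `jwt` package, which shares the same import name as PyJWT. A quick sanity check (an illustrative sketch, not part of the Dockerfile) that the reinstalled PyJWT is the module actually being imported:

```python
# With PyJWT installed, the "jwt" module exposes module-level encode/decode
import jwt

token = jwt.encode({"sub": "litellm"}, "secret", algorithm="HS256")
claims = jwt.decode(token, "secret", algorithms=["HS256"])
print(claims)  # {'sub': 'litellm'} when PyJWT is the installed package
```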

5 changes: 5 additions & 0 deletions Dockerfile.database
@@ -53,6 +53,11 @@ RUN pip install *.whl /wheels/* --no-index --find-links=/wheels/ && rm -f *.whl
# install semantic-cache [Experimental]- we need this here and not in requirements.txt because redisvl pins to pydantic 1.0
RUN pip install redisvl==0.0.7 --no-deps

# ensure pyjwt is used, not jwt
RUN pip uninstall jwt -y
RUN pip uninstall PyJWT -y
RUN pip install PyJWT --no-cache-dir

# Build Admin UI
RUN chmod +x build_admin_ui.sh && ./build_admin_ui.sh

75 changes: 74 additions & 1 deletion docs/my-website/docs/providers/openai_compatible.md
@@ -1,3 +1,6 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# OpenAI-Compatible Endpoints

To call models hosted behind an openai proxy, make 2 changes:
@@ -39,4 +42,74 @@ response = litellm.embedding(
input=["good morning from litellm"]
)
print(response)
```



## Usage with LiteLLM Proxy Server

Here's how to call an OpenAI-Compatible Endpoint with the LiteLLM Proxy Server

1. Modify the config.yaml

```yaml
model_list:
  - model_name: my-model
    litellm_params:
      model: openai/<your-model-name>  # add openai/ prefix to route as OpenAI provider
      api_base: <model-api-base>       # add api base for OpenAI compatible provider
      api_key: api-key                 # api key for your OpenAI-compatible endpoint
```
2. Start the proxy
```bash
$ litellm --config /path/to/config.yaml
```

3. Send Request to LiteLLM Proxy Server

<Tabs>

<TabItem value="openai" label="OpenAI Python v1.0.0+">

```python
import openai
client = openai.OpenAI(
    api_key="sk-1234",              # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000"  # litellm-proxy-base url
)
response = client.chat.completions.create(
    model="my-model",
    messages=[
        {
            "role": "user",
            "content": "what llm are you"
        }
    ],
)
print(response)
```
</TabItem>

<TabItem value="curl" label="curl">

```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
    "model": "my-model",
    "messages": [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ]
}'
```
</TabItem>

</Tabs>
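The same endpoint can also be called directly through the litellm Python SDK, without the proxy (a minimal sketch; the model name, `api_base`, and key below are placeholders):

```python
import litellm

response = litellm.completion(
    model="openai/my-model",                        # "openai/" prefix routes to the OpenAI-compatible provider
    api_base="https://my-endpoint.example.com/v1",  # placeholder api base for your endpoint
    api_key="my-api-key",                           # placeholder key
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response)
```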
119 changes: 83 additions & 36 deletions docs/my-website/docs/providers/vertex.md
@@ -23,58 +23,105 @@ litellm.vertex_location = "us-central1" # proj location
response = litellm.completion(model="gemini-pro", messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}])
```

## Usage with LiteLLM Proxy Server

Here's how to use Vertex AI with the LiteLLM Proxy Server

1. Modify the config.yaml

<Tabs>

<TabItem value="completion_param" label="Different location per model">

Use this when you need to set a different location for each vertex model

```yaml
model_list:
  - model_name: gemini-vision
    litellm_params:
      model: vertex_ai/gemini-1.0-pro-vision-001
      vertex_project: "project-id"
      vertex_location: "us-central1"
  - model_name: gemini-vision
    litellm_params:
      model: vertex_ai/gemini-1.0-pro-vision-001
      vertex_project: "project-id2"
      vertex_location: "us-east"
```
</TabItem>

<TabItem value="litellm_param" label="One location for all vertex models">

Use this when you have one vertex location for all models

```yaml
litellm_settings:
  vertex_project: "hardy-device-38811" # Your Project ID
  vertex_location: "us-central1" # proj location

model_list:
  - model_name: team1-gemini-pro
    litellm_params:
      model: gemini-pro
```
</TabItem>

</Tabs>

2. Start the proxy

```bash
$ litellm --config /path/to/config.yaml
```

3. Send Request to LiteLLM Proxy Server

<Tabs>

<TabItem value="openai" label="OpenAI Python v1.0.0+">

```python
import openai
client = openai.OpenAI(
    api_key="sk-1234",              # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000"  # litellm-proxy-base url
)
response = client.chat.completions.create(
    model="team1-gemini-pro",
    messages=[
        {
            "role": "user",
            "content": "what llm are you"
        }
    ],
)
print(response)
```
</TabItem>

<TabItem value="curl" label="curl">

```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
    "model": "team1-gemini-pro",
    "messages": [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ]
}'
```
</TabItem>

</Tabs>

## Set Vertex Project & Vertex Location
All calls using Vertex AI require the following parameters:
29 changes: 29 additions & 0 deletions docs/my-website/docs/proxy/caching.md
@@ -201,6 +201,35 @@ curl --location 'http://0.0.0.0:4000/embeddings' \
</TabItem>
</Tabs>
## Debugging Caching - `/cache/ping`
LiteLLM Proxy exposes a `/cache/ping` endpoint to verify that the cache is working as expected.
**Usage**
```shell
curl --location 'http://0.0.0.0:4000/cache/ping' -H "Authorization: Bearer sk-1234"
```
**Expected Response - when the cache is healthy**
```shell
{
    "status": "healthy",
    "cache_type": "redis",
    "ping_response": true,
    "set_cache_response": "success",
    "litellm_cache_params": {
        "supported_call_types": "['completion', 'acompletion', 'embedding', 'aembedding', 'atranscription', 'transcription']",
        "type": "redis",
        "namespace": "None"
    },
    "redis_cache_params": {
        "redis_client": "Redis<ConnectionPool<Connection<host=redis-16337.c322.us-east-1-2.ec2.cloud.redislabs.com,port=16337,db=0>>>",
        "redis_kwargs": "{'url': 'redis://:******@redis-16337.c322.us-east-1-2.ec2.cloud.redislabs.com:16337'}",
        "async_redis_conn_pool": "BlockingConnectionPool<Connection<host=redis-16337.c322.us-east-1-2.ec2.cloud.redislabs.com,port=16337,db=0>>",
        "redis_version": "7.2.0"
    }
}
```
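The same health check can be scripted; a minimal sketch using `requests`, with the host and key being the placeholder values from the curl example above:

```python
import requests

# ping the proxy's cache endpoint and fail loudly on a non-2xx response
resp = requests.get(
    "http://0.0.0.0:4000/cache/ping",
    headers={"Authorization": "Bearer sk-1234"},
)
resp.raise_for_status()
print(resp.json()["status"])  # "healthy" when the cache is reachable
```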
## Advanced
### Set Cache Params on config.yaml
```yaml
26 changes: 7 additions & 19 deletions docs/my-website/docs/proxy/configs.md
@@ -246,6 +246,10 @@ $ litellm --config /path/to/config.yaml

## Load Balancing

:::info
For more on this, go to [this page](./load_balancing.md)
:::

Use this to call multiple instances of the same model and configure things like [routing strategy](../routing.md#advanced).

For optimal performance:
@@ -306,25 +310,6 @@ router_settings: # router_settings are optional
redis_port: 1992
```
## Set Azure `base_model` for cost tracking

**Problem**: Azure returns `gpt-4` in the response when `azure/gpt-4-1106-preview` is used. This leads to inaccurate cost tracking.

**Solution** ✅: Set `base_model` on your config so litellm uses the correct model for calculating Azure costs.

Example config with `base_model`
```yaml
model_list:
  - model_name: azure-gpt-3.5
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"
    model_info:
      base_model: azure/gpt-4-1106-preview
```

You can view your cost once you set up [Virtual keys](https://docs.litellm.ai/docs/proxy/virtual_keys) or [custom_callbacks](https://docs.litellm.ai/docs/proxy/logging)
## Load API Keys
@@ -605,6 +590,9 @@ general_settings:
"litellm_settings": {}, # ALL (https://github.com/BerriAI/litellm/blob/main/litellm/__init__.py)
"general_settings": {
"completion_model": "string",
"disable_spend_logs": "boolean", # turn off writing each transaction to the db
"disable_reset_budget": "boolean", # turn off reset budget scheduled task
"enable_jwt_auth": "boolean", # allow proxy admin to auth in via jwt tokens with 'litellm_proxy_admin' in claims
"key_management_system": "google_kms", # either google_kms or azure_kms
"master_key": "string",
"database_url": "string",
21 changes: 21 additions & 0 deletions docs/my-website/docs/proxy/cost_tracking.md
@@ -15,4 +15,25 @@ model_list:
base_model: dall-e-3 # 👈 set dall-e-3 as base model
model_info:
mode: image_generation
```
## Chat Completions / Embeddings
**Problem**: Azure returns `gpt-4` in the response when `azure/gpt-4-1106-preview` is used. This leads to inaccurate cost tracking.

**Solution** ✅: Set `base_model` on your config so litellm uses the correct model for calculating Azure costs.

Get the base model name from [here](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json)

Example config with `base_model`
```yaml
model_list:
  - model_name: azure-gpt-3.5
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"
    model_info:
      base_model: azure/gpt-4-1106-preview
```
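For reference, litellm's SDK exposes a cost helper backed by the same price map linked above; a minimal sketch that prices a hypothetical prompt/completion pair against the base model used for cost tracking:

```python
from litellm import completion_cost

# hypothetical strings; in production the proxy computes this from the actual usage
cost = completion_cost(
    model="azure/gpt-4-1106-preview",
    prompt="what llm are you",
    completion="I'm gpt-4-1106-preview",
)
print(f"estimated cost: ${cost:.6f}")
```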
